Archived Content
This website is an archive of the Spring 2019 semester of STAT 107: Data Science Discovery.
▶ Click here for the Fall 2019 webpage.
Course Schedule
Date | Event | Links |
---|---|---|
2019-01-14 | Welcome to Data Science Discovery | |
2019-01-16 |
Ideal Experimental Design
Does the death penalty have a deterrent effect? Is chocolate good for you? What causes breast cancer? All of these questions attempt to assign a cause to an effect. A careful examination of data can help shed light on questions like these.
|
|
2019-01-18 |
Confounders and Observational Studies
Observational studies are done out of necessity. Whenever possible, it’s better to do a randomized controlled experiment. Why?
|
|
2019-01-21 | MLK Day |
|
2019-01-23 |
Observational Studies & Simpson’s Paradox
For years, observational studies have shown that people who carry lighters are more likely to get lung cancer. However, this does not mean that carrying lighters causes cancer. Smoking is an obvious confounder! If we weren’t sure about this, how could we determine whether it’s the lighters, the confounders, or maybe some combination of both that is causing the lung cancer?
|
|
2019-01-25 |
Data Science Tools
"Data", "Science", and "Tools" all have meaning on their own. Explore how each relates to the others and how they all relate to Data Science DISCOVERY!
|
|
2019-01-28 |
Introduction to Pandas
Time to focus on data by learning the primary tool we will be using all semester!
|
|
2019-01-30 | Arctic Vortex |
|
2019-02-01 | Pandas - Creating Columns and Groups | |
2019-02-04 | Algorithms for Complex Problems | |
2019-02-06 | Functions | |
2019-02-08 | Data Cleaning | |
2019-02-11 |
Bar Graphs and Histograms
Large tables of numbers can be difficult to interpret, no matter how organized they are. Sometimes it is much easier to interpret graphs than numbers.
|
|
2019-02-13 |
Center and Spread
Parameters are numerical facts about the population. In this lecture, we will look at parameters such as the average (µ) and standard deviation (σ) of a list of numbers. Later, we will start talking about statistics. Statistics are estimates of parameters computed from a sample.
|
|
2019-02-15 |
Boxplots
Just like histograms, box plots are used as a way to visually represent numerical data. They do this through selected percentiles which are given special names.
|
|
2019-02-18 | Scatter Plots | |
2019-02-20 | Correlation and Regression | |
2019-02-22 | Correlation and Regression II | |
2019-02-25 | Descriptive Statistics and Probability | |
2019-02-27 | Probability | |
2019-03-01 | Probability II | |
2019-03-04 | Midterm Exam (CBTF) |
|
2019-03-06 |
Simulation
Simulation is an imitation of a real-world event within a computer program. We can use millions of simulations and observe the distribution of outcomes to help us understand the answer to a problem that may be difficult to model mathematically.
|
|
2019-03-08 |
Binary Event Simulation
As we work towards simulating events using Python, we first need to understand the different types of events we can simulate. The first type is an event with exactly two outcomes, also called a binary outcome event.
|
|
2019-03-11 | Simulation and Analysis | |
2019-03-13 |
Control Flow in Python - Conditionals and Loops
In nearly every programming language, a program runs from top to bottom, one line at a time. Beyond this default order, Python has three control flow commands that allow us to control the flow of a program.
|
|
2019-03-15 |
Control Flow in Python - Loops and Functions
In nearly every programming language, a program runs from top to bottom, one line at a time. Beyond this default order, Python has three control flow commands that allow us to control the flow of a program.
|
|
2019-03-18 | Spring Break |
|
2019-03-20 | Spring Break |
|
2019-03-22 | Spring Break |
|
2019-03-25 | Random Variables, EV, SE |
|
2019-03-27 | Discrete Random Variables, Bernoulli, Binomial |
|
2019-03-29 | Continuous Random Variables and the Normal Distribution |
|
2019-04-01 | The Central Limit Theorem |
|
2019-04-03 | Confidence Intervals for means and proportions |
|
2019-04-05 | Choosing a Sample Size |
|
2019-04-08 | Hypothesis Testing - One Sample Z Test for means and proportions |
|
2019-04-10 | Hypothesis Testing - Two Sample Z Test for means and proportions |
|
2019-04-12 | Hypothesis Testing - One and Two Sample t Tests |
|
2019-04-15 | Hypothesis Testing - Chi Square Test for Goodness of Fit |
|
2019-04-17 | Regression Inference |
|
2019-04-19 | Decisions and Type I & Type II Errors |
|
2019-04-22 | Bootstrapping/Resampling |
|
2019-04-24 | A/B Testing |
|
2019-04-26 | Classifiers |
|
2019-04-29 | Case Studies |
|
2019-05-01 | Final Exam Review |
|
2019-05-02 | Reading Day and Final Exam |
|
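
The Simulation and Binary Event Simulation lectures describe repeating a two-outcome event many times and looking at the distribution of outcomes. A minimal sketch of that idea is shown here. It is not taken from the course materials: the function name `simulate_coin_flips`, the default seed, and the choice of Python's built-in `random` module are all assumptions made for illustration.

```python
import random


def simulate_coin_flips(n_flips, p_heads=0.5, seed=107):
    """Simulate n_flips binary events and return the fraction of heads.

    Hypothetical helper for illustration; the seed only makes runs repeatable.
    """
    rng = random.Random(seed)
    # rng.random() returns a float in [0, 1), so the comparison below
    # is True with probability p_heads.
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads / n_flips


# With many simulated flips, the observed fraction should land near p_heads.
proportion = simulate_coin_flips(100_000)
print(proportion)
```

Increasing `n_flips` should pull the printed fraction closer to `p_heads`, which is the pattern the later lectures on the Central Limit Theorem and confidence intervals build on.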