Archived Content
This website is an archive of the Spring 2019 semester of STAT 107: Data Science Discovery.
▶ Click here for the Fall 2019 webpage.
Course Schedule
Date  Event  Links 

20190114  Welcome to Data Science Discovery  
20190116 
Ideal Experimental Design
Does the death penalty have a deterrent effect? Is chocolate good for you? What causes breast cancer? All of these questions attempt to assign a cause to an effect. A careful examination of data can help shed light on questions like these.


20190118 
Confounders and Observational Studies
Observational studies are done out of necessity. Whenever possible, it’s better to do a randomized controlled experiment. Why?


20190121  MLK Day 

20190123 
Observational Studies & Simpson’s Paradox
For years, observational studies have shown that people who carry lighters are more likely to get lung cancer. However, this does not mean that carrying a lighter causes cancer. Smoking is an obvious confounder! If we weren’t sure about this, how could we determine whether it is the lighters, the confounders, or some combination of both that is causing the lung cancer?


20190125 
Data Science Tools
"Data", "Science", and "Tools" each have meaning on their own. Explore how each relates to the others and how they all relate to Data Science DISCOVERY!


20190128 
Introduction to Pandas
Time to focus in on data, learning the primary tool we will be using all semester!


20190130  Arctic Vortex 

20190201  Pandas - Creating Columns and Groups  
20190204  Algorithms for Complex Problems  
20190206  Functions  
20190208  Data Cleaning  
20190211 
Bar Graphs and Histograms
Large tables of numbers can be difficult to interpret, no matter how organized they are. Sometimes it is much easier to interpret graphs than numbers.


20190213 
Center and Spread
Parameters are numerical facts about the population. In this lecture, we will look at parameters such as the average (µ) and standard deviation (σ) of a list of numbers. Later, we will start talking about statistics. Statistics are estimates of parameters computed from a sample.
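As a minimal sketch of the two parameters named above, the following computes the average (µ) and standard deviation (σ) of a list of numbers by hand. The function names and the `scores` list are made up for illustration and are not taken from the lecture.

```python
import math

def population_mean(values):
    """Average (mu): the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def population_sd(values):
    """Standard deviation (sigma): root of the average squared deviation from mu."""
    mu = population_mean(values)
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Hypothetical list of numbers, treated as the whole population.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mu = population_mean(scores)
sigma = population_sd(scores)
print(mu, sigma)  # 5.0 2.0
```

Note that this divides by the number of values, which matches the population standard deviation; a sample estimate would typically divide by one less.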


20190215 
Boxplots
Just like histograms, box plots are used as a way to visually represent numerical data. They do this through selected percentiles which are given special names.
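The percentiles behind a box plot can be sketched as follows. This assumes the box plot is drawn from the minimum, first quartile, median, third quartile, and maximum, and it uses the `inclusive` method of Python's `statistics.quantiles`; other quartile conventions exist and can give slightly different values. The `data` list is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot typically displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical data for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(data)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```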


20190218  Scatter Plots  
20190220  Correlation and Regression  
20190222  Correlation and Regression II  
20190225  Descriptive Statistics and Probability  
20190227  Probability  
20190301  Probability II  
20190304  Midterm Exam (CBTF) 

20190306 
Simulation
Simulation is an imitation of a real-world event within a computer program. We can run millions of simulations and observe the distribution of outcomes to help us answer a problem that may be difficult to model mathematically.
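As one possible illustration of this idea, the sketch below simulates rolling two dice many times and estimates how often the sum is 7. The dice example, the function name, and the seed are assumptions chosen for illustration, not material from the lecture.

```python
import random

def estimate_probability_of_seven(num_trials, seed=107):
    """Estimate P(sum of two dice == 7) by repeating the experiment num_trials times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)  # one simulated roll of two dice
        if total == 7:
            hits += 1
    return hits / num_trials

estimate = estimate_probability_of_seven(100_000)
print(f"Estimated P(sum == 7): {estimate:.4f} (exact value is 1/6, about 0.1667)")
```

With more trials, the estimate tends to settle closer to the exact value, which is what makes simulation useful when the exact value is hard to derive.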


20190308 
Binary Event Simulation
As we work towards simulating events using Python, we first need to understand the different types of events we can simulate. The first type is an event with exactly two outcomes, also called a binary outcome event.
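A binary outcome event can be sketched in a few lines. Here a fair coin flip is assumed as the example, and `simulate_binary_event` is a hypothetical helper name; the success probability can be changed to model other two-outcome events.

```python
import random

def simulate_binary_event(p_success, rng):
    """Return True with probability p_success, otherwise False."""
    return rng.random() < p_success

rng = random.Random(2019)  # fixed seed so the run is repeatable
outcomes = [simulate_binary_event(0.5, rng) for _ in range(10_000)]
proportion = sum(outcomes) / len(outcomes)  # True counts as 1, False as 0
print(f"Proportion of successes: {proportion:.3f}")
```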


20190311  Simulation and Analysis  
20190313 
Control Flow in Python - Conditionals and Loops
In nearly every programming language, a program runs from top to bottom, one line at a time. Beyond this top-to-bottom order, Python has three control flow commands that allow us to control the flow of a program.
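The description does not name the three commands. Judging only from the lecture titles, the sketch below assumes they are conditionals, loops, and functions, and shows all three working together. The temperature thresholds and names are made up for illustration.

```python
def describe_temperatures(readings_f):
    """Label each Fahrenheit reading as 'freezing', 'cool', or 'warm'."""
    labels = []
    for reading in readings_f:      # loop: repeat the body once per reading
        if reading < 32:            # conditional: choose one branch per reading
            labels.append("freezing")
        elif reading < 70:
            labels.append("cool")
        else:
            labels.append("warm")
    return labels

# Calling the function jumps into its body, then returns here with the result.
labels = describe_temperatures([-10, 45, 85])
print(labels)  # ['freezing', 'cool', 'warm']
```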


20190315 
Control Flow in Python - Loops and Functions
In nearly every programming language, a program runs from top to bottom, one line at a time. Beyond this top-to-bottom order, Python has three control flow commands that allow us to control the flow of a program.


20190318  Spring Break 

20190320  Spring Break 

20190322  Spring Break 

20190325  Random Variables, EV, SE 

20190327  Discrete Random Variables, Bernoulli, Binomial 

20190329  Continuous Random Variables and the Normal Distribution 

20190401  The Central Limit Theorem 

20190403  Confidence Intervals for means and proportions 

20190405  Choosing a Sample Size 

20190408  Hypothesis Testing - One-Sample Z Test for means and proportions 

20190410  Hypothesis Testing - Two-Sample Z Test for means and proportions 

20190412  Hypothesis Testing - One- and Two-Sample t Tests 

20190415  Hypothesis Testing - Chi-Square Test for Goodness of Fit 

20190417  Regression Inference 

20190419  Decisions and Type I & Type II Errors 

20190422  Bootstrapping/Resampling 

20190424  A/B Testing 

20190426  Classifiers 

20190429  Case Studies 

20190501  Final Exam Review 

20190502  Reading Day and Final Exam 
