Upcoming Deadlines
Online Transition:
 Online transition details
 Updates via Piazza and Compass 2g
 Stay healthy, stay safe.
Week 15 Content

1/8: Hypothesis Tests: Critical Values and CIs (Karle) —
Handout

2/8 and 3/8: The t-test (Karle) —
Handout

4/8: t-test in Python (Wade) —
Colab Notebook

5/8: df.apply (Wade) —
Colab Notebook

6/8: A/B Testing (Wade) —
Handout

7/8: Distance Metrics (Wade) —
Handout

8/8: Normalizing Data (Wade) —
Handout
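If you want to try this week's t-test yourself before opening the notebooks, here's a minimal sketch using scipy. The scores below are made-up data, not from the course:

```python
import numpy as np
from scipy import stats

# Hypothetical exam scores for two lecture sections (made-up data):
section_a = np.array([82, 91, 77, 85, 88, 79, 94, 86])
section_b = np.array([75, 83, 71, 80, 78, 69, 84, 76])

# Two-sample t-test: are the section means significantly different?
t_stat, p_value = stats.ttest_ind(section_a, section_b)

# One-sample t-test: does section A's mean differ from 80?
t_one, p_one = stats.ttest_1samp(section_a, 80)
```

A small p-value suggests the difference in means is unlikely under the null hypothesis.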
Lecture videos available on Compass 2g.
Assignments:
 Homework: No more homeworks! Work on the final project this week. :)
 Lab: lab_similarity
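Distance metrics and normalization (7/8 and 8/8) come together in lab_similarity. Here's a small sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

# Two hypothetical feature vectors (made-up points):
a = np.array([3.0, 4.0])
b = np.array([0.0, 0.0])

# Euclidean distance between a and b:
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Min-max normalization scales a DataFrame column into [0, 1]:
df = pd.DataFrame({"price": [10, 20, 30, 40]})
df["price_norm"] = (df["price"] - df["price"].min()) / (
    df["price"].max() - df["price"].min()
)
```

Normalizing first keeps one large-scale column (like price) from dominating the distance.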
Week 14 Content
This week focuses on hypothesis testing. You'll use the z-test to check whether an unknown source of data conforms to an expected distribution (e.g., is a new six-sided die you just bought actually fair?). Here are the notes:

1/7 and 2/7: One-Sample Z-Test (Karle) —
Handout

3/7: One-sample z-test in Python (Wade) —
Colab Notebook

4/7: One Tailed vs. Two Tailed Hypothesis Tests (Karle) —
Handout

5/7: Two-Sample Z-Test (Karle) —
Handout

6/7: Two-sample z-test in Python (Wade) —
Colab Notebook

7/7: Human Impact of Probabilities (Wade) —
Handout
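The one-sample z-test for a proportion takes only a few lines by hand. This sketch uses made-up die-roll counts (not the course data) to ask the "is this die fair?" question from above:

```python
import numpy as np
from scipy import stats

# Hypothetical: 720 rolls of a die; a fair die shows a six with p = 1/6.
n, p0 = 720, 1 / 6
sixes = 140                      # made-up observed count of sixes

p_hat = sixes / n
se = np.sqrt(p0 * (1 - p0) / n)  # standard error under the null
z = (p_hat - p0) / se

# Two-tailed p-value from the standard normal distribution:
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
```

Here z comes out to 2.0, so at the usual 5% cutoff this made-up die would look suspicious.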
Lecture videos available on Compass 2g.
Assignments:
 Homework: Homework 19 and 20 on PL
 Lab: lab_hypothesistests
Week 13 Content
This week you will dive into machine learning and begin building models to do amazing things! For the lecture Python notebooks, I used Google Colab, a Python environment that runs online. In a Colab notebook, you can run the cells right in your web browser – let me know if that's easier to follow along with! :)

1/7: Residuals and RMSE (Karle) —
Handout

2/7 and 3/7: Linear Regression in Python (Wade + Karle) —
Handout

Colab Notebook

4/7: RMSE in Python (Wade) —
Colab Notebook

5/7: Machine Learning Overview (Wade) —
Handout

6/7: k-means Clustering Overview (Wade) —
Handout
7/7: k-means in Python (Wade) —
Colab Notebook
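Fitting a line and computing RMSE (1/7 through 4/7) can be sketched in a few lines of numpy. The x/y values below are made up for illustration:

```python
import numpy as np

# Hypothetical data with a roughly linear relationship (made-up values):
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a line y = m*x + b with least squares:
m, b = np.polyfit(x, y, 1)

# Residuals measure how far each point falls from the line;
# RMSE summarizes them in one number.
predictions = m * x + b
residuals = y - predictions
rmse = np.sqrt(np.mean(residuals ** 2))
```

A smaller RMSE means the line's predictions sit closer to the actual data.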
Lecture videos available on Compass 2g.
Assignments:
 Homework: Homework 18 on PL
 Lab: lab_kmeans
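Before lab_kmeans, it can help to see the whole k-means idea in one short loop. This is a bare-bones sketch in numpy (not the course notebook's code), on two made-up blobs of points:

```python
import numpy as np

# Two well-separated blobs of made-up 2-D points:
points = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])

# A minimal k-means loop (k = 2): assign each point to its nearest
# center, then move each center to the mean of its assigned points.
centers = points[[0, 3]].copy()   # start from one point in each blob
for _ in range(10):
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])
```

After a few iterations the centers settle in the middle of each blob and the labels stop changing.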
Midterm 2:
 Midterm 2 practice exam available online in PL. Details on Midterm 2 on Piazza.
Week 12 Content
We have six videos this week and are exploring a brand new dataset! Here’s the overview:

1/6: Confidence Intervals for means and percents (Karle) —
Handout

2/6: Data Science with Confidence Intervals (Wade) —
Handout

3/6: Scatterplots, Correlation, Regression (Karle) —
Handout

4/6: Scatter Plots in Python + Diamond Dataset (Wade) —
Handout

Diamond Dataset (Google Drive)

5/6: Correlation in Python (Wade) —
Handout
6/6: Linear Regression in Python (Wade) —
Handout
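Correlation (5/6) in Python is a single line of pandas. The carat/price numbers below are made up for illustration, not the actual diamond dataset:

```python
import pandas as pd

# Hypothetical stand-in for the diamond dataset (made-up values):
df = pd.DataFrame({"carat": [0.3, 0.5, 0.7, 1.0, 1.2],
                   "price": [400, 900, 1500, 3000, 4200]})

# Correlation coefficient r between the two columns:
r = df["carat"].corr(df["price"])
```

An r near +1 means the scatterplot hugs an upward-sloping line, which is exactly what you'd expect for carat vs. price.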
Lecture videos available on Compass 2g.
Assignments:
 Homework: Homework 16 and Homework 17 on PL
 Lab: lab_regression
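A 95% confidence interval for a mean (1/6) can be sketched like this. The sample values are made up, and using the normal approximation with z = 1.96 is an assumption for illustration:

```python
import numpy as np

# Hypothetical sample of measurements (made-up data):
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7])

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean

# 95% confidence interval via the normal approximation (z = 1.96):
low, high = mean - 1.96 * se, mean + 1.96 * se
```

The interval (low, high) is the range of plausible values for the true population mean.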
Week 11 Content
Lecture Notes on Sampling and Inference:
 Karle / Blank
Lecture Notes on EV and SE for means and percents:
 Karle / Blank
Python Review including Lists and Loops:
 Wade / Blank
Lecture videos available on Compass 2g.
Assignments:
 Homework: Homework 14 and Homework 15 on PL
 Lab: lab_lists
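As a warm-up for lab_lists, here's the lists-and-loops pattern from the review in its simplest form (the numbers are made up):

```python
# A quick lists-and-loops review: build a list, then total it with a loop.
rolls = [4, 2, 6, 3, 5, 1, 6, 2]

total = 0
for roll in rolls:
    total += roll

average = total / len(rolls)
```

This accumulate-in-a-loop pattern shows up everywhere once we start simulating samples.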
Week 10 Content
Lecture Notes on Normal Distribution:
Lecture Notes on Central Limit Theorem (CLT):
Lecture videos available on Compass 2g.
Assignments:
 Homework: Homework 12 and Homework 13 on PL
 Lab: lab_clt
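The CLT can be seen in a few lines of simulation: averages of many die rolls look normal even though a single roll is uniform. The sample sizes below are arbitrary choices for illustration:

```python
import numpy as np

# Simulate 2,000 samples of 50 die rolls each and take each sample's mean.
rng = np.random.default_rng(42)
sample_means = [rng.integers(1, 7, size=50).mean() for _ in range(2000)]

center = np.mean(sample_means)  # sits near the die's expected value of 3.5
spread = np.std(sample_means)   # shrinks as the sample size grows
```

Plot a histogram of `sample_means` and you'll see the familiar bell shape emerge.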
Lecture: Discrete Random Variables, Bernoulli, and Binomial
Any experiment that has exactly two outcomes, each with a fixed probability, follows a Bernoulli distribution. The Binomial Distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments. For a single trial (n=1), the binomial distribution is a Bernoulli distribution.
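Both distributions are available in scipy. A sketch with the classic coin-flip setup:

```python
from scipy import stats

# Binomial(n=10, p=0.5): number of heads in 10 fair coin flips.
coin = stats.binom(n=10, p=0.5)

p_five = coin.pmf(5)        # P(exactly 5 heads)
p_at_most_2 = coin.cdf(2)   # P(2 or fewer heads)

# With n=1, the binomial reduces to a single Bernoulli trial:
bernoulli = stats.binom(n=1, p=0.5)
```

Note that `p_five` matches the by-hand count C(10, 5) / 2^10 = 252/1024.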
Lecture: Simulation Analysis + Images
Simulation allows us to understand the outcomes of uncertain events. We will begin with basic simulations and build up to more complex simulations throughout this semester.
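A basic simulation in this spirit: estimate the chance of rolling doubles with two dice (the true answer is 1/6, so we can check our estimate):

```python
import random

# Estimate P(doubles) with two dice by simulating many rolls.
random.seed(42)           # fixed seed so the run is reproducible
trials = 10_000
doubles = 0
for _ in range(trials):
    if random.randint(1, 6) == random.randint(1, 6):
        doubles += 1

estimate = doubles / trials   # true probability is 1/6 ≈ 0.167
```

More trials shrink the gap between the estimate and the true probability.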
Lecture: Bayes Rule
Bayes Rule allows us to express a conditional probability as the inverse, often making the problem easier to solve.
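Here's Bayes' Rule worked through with made-up numbers, the classic medical-test setup:

```python
# Bayes' Rule with made-up numbers: a test that is 99% sensitive and
# 95% specific, for a condition with 1% prevalence.
p_condition = 0.01
p_pos_given_condition = 0.99
p_pos_given_healthy = 0.05

# Total probability of a positive result:
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_healthy * (1 - p_condition))

# P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
```

Even with an accurate test, the posterior here is only about 1/6 — low prevalence dominates.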
Lecture: Addition Rule + Conditional Probability
The conditional probability of an event B given an event A is the probability that B will occur given that A has already occurred.
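With small sample spaces we can compute conditional probabilities by counting outcomes directly. A sketch with two fair dice:

```python
# Conditional probability with two fair-die rolls, counted exhaustively.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

event_a = [o for o in outcomes if o[0] == 6]                # first roll is 6
event_a_and_b = [o for o in event_a if o[0] + o[1] >= 10]   # ...and sum >= 10

# P(B | A) = P(A and B) / P(A)
p_b_given_a = len(event_a_and_b) / len(event_a)
```

Conditioning on A = "first roll is 6" shrinks the sample space from 36 outcomes down to 6.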
Lecture: Midterm 1 (CBTF) happens this week; no class on Friday!
Lecture: Introduction to Probability II
Probability is the likelihood or chance of an event occurring. This continues a multi-week journey discovering probability and how to simulate probabilistic events.
Lecture: Introduction to Probability
Probability is the likelihood or chance of an event occurring. This begins a multi-week journey discovering probability and how to simulate probabilistic events.
Lecture: Algorithms to Solve Complex Problems
An algorithm is a step-by-step, detailed set of instructions to solve a problem. An algorithm can be expressed as English sentences (usually as a numbered list) and is a great way to begin solving complex problems.
Lecture: Quartiles and Box Plots
Just like histograms, box plots are used as a way to visually represent numerical data. They do this through selected percentiles which are given special names.
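Those special percentiles are the quartiles, and numpy computes them directly (the values below are made up):

```python
import numpy as np

values = [1, 3, 5, 7, 9, 11, 13, 15]

# The percentiles a box plot is built from:
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1   # interquartile range: the width of the box
```

The box spans Q1 to Q3, with a line at the median; points far outside the IQR show up as outliers.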
Lecture: Bar Graphs and Histograms
Large tables of numbers can be difficult to interpret, no matter how organized they are. Sometimes it is much easier to interpret graphs than numbers.
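The counting behind a histogram can be done with numpy before anything is drawn; the scores below are made up:

```python
import numpy as np

# Made-up exam scores, binned into a histogram with bins of width 10:
scores = [55, 62, 68, 71, 74, 78, 81, 85, 88, 93]
counts, edges = np.histogram(scores, bins=[50, 60, 70, 80, 90, 100])
```

Each entry of `counts` is the bar height for one bin; plotting libraries just draw these numbers.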
Lecture: Grouping Data (pandas)
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
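The split-apply-combine pattern looks like this in pandas (the table is made up):

```python
import pandas as pd

# A small made-up table: split by "dorm", apply mean, combine the results.
df = pd.DataFrame({"dorm": ["East", "East", "West", "West"],
                   "hours_slept": [7, 8, 6, 8]})

avg_by_dorm = df.groupby("dorm")["hours_slept"].mean()
```

The result is one row per group: the rows are split by dorm, the mean is applied to each split, and the per-group answers are combined into a single Series.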
Lecture: Measures of Center and Spread
Parameters are numerical facts about the population. In this lecture, we will look at parameters such as the average (µ) and standard deviation (σ) of a list of numbers. Later, we will start talking about statistics. Statistics are estimates of parameters computed from a sample.
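Both parameters are one call each in numpy; the list of numbers below is made up:

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mu = values.mean()     # average (µ)
sigma = values.std()   # population standard deviation (σ)
```

Note that `std()` defaults to the population formula; pass `ddof=1` for the sample version used when estimating from a sample.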
Lecture: Simpson's Paradox and Stratification
Stratification is often called the "blocking of observational studies" and lets us dig deeper into observational data by comparing like with like.
Lecture: Confounders and Observational Studies
For years, observational studies have shown that people who carry lighters are more likely to get lung cancer. However, this does not mean that carrying lighters causes you to get cancer. Smoking is an obvious confounder! If we weren't sure about this, how could we determine whether it's the lighters or the confounders (or maybe some combination of both) that is causing the lung cancer?
Lecture: Blocking and Conditionals
Random assignment to treatment and control works best to make the groups as alike as possible. With enough subjects, random differences average out. But what do you do if you have a small sample? Blocking first, then randomizing ensures that the treatment and control group are balanced with regard to the variables blocked on. We can use conditionals in pandas to help us do this!
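Those pandas conditionals look like this; the study table below is made up:

```python
import pandas as pd

# Made-up study table; boolean conditionals select matching rows.
df = pd.DataFrame({"group": ["treatment", "control", "treatment", "control"],
                   "age": [34, 29, 41, 37]})

treatment = df[df["group"] == "treatment"]   # rows in the treatment group
older_controls = df[(df["group"] == "control") & (df["age"] > 30)]
```

Combining conditions with `&` (and parentheses around each one) is how we check that blocked groups really are balanced on the blocking variables.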
Lecture: Experimental Design and Row Selection (pandas)
Does the death penalty have a deterrent effect? Is chocolate good for you? What causes breast cancer? All of these questions attempt to assign a cause to an effect. A careful examination of data can help shed light on questions like these.
Lecture: Data Science Tools
Data, Science, and Tools each have meaning on their own; explore how they relate to one another and how they all come together in Data Science DISCOVERY!
Lecture: Welcome to Data Science Discovery
The next BIG thing at Illinois is Data Science and it starts with Discovery!
 Hello Survey
 Lecture Slides
 Lecture Handout
 Join the course Piazza
 Register your iClicker on Compass 2g
Welcome to Data Science Discovery!
Our first lecture is Wednesday, Jan. 22 at 12:00noon in 1306 Everitt Laboratory. See you there! :)