Course Schedule

Date	Event
Aug. 26	Welcome to Data Science Discovery The next BIG thing at Illinois is Data Science and it starts with Discovery! Lecture Slides Lecture Handout Join the course Piazza Register your iClicker on Compass 2g Day 1 Survey (+1 EC)
Aug. 28	Data Science Tools Data, Science, and Tools all have meaning on their own. Explore how each relates to the others and how they all relate to Data Science DISCOVERY! Lecture Slides Lecture Handout Day 1 (Hello) Dataset
Aug. 30	Experimental Design and Row Selection (pandas) Does the death penalty have a deterrent effect? Is chocolate good for you? What causes breast cancer? All of these questions attempt to assign a cause to an effect. A careful examination of data can help shed light on questions like these. Lecture Handout Extra Credit Notebook (+1) Homework 1
Sep. 2	Labor Day
Sep. 4	Blocking and Conditionals Random assignment to treatment and control works best to make the groups as alike as possible. With enough subjects, random differences average out. But what do you do if you have a small sample? Blocking first, then randomizing, ensures that the treatment and control groups are balanced with regard to the variables blocked on. We can use conditionals in pandas to help us do this! Lecture Handout Extra Credit Notebook (+1) Course Catalog Dataset
Sep. 6	Confounders and Observational Studies For years observational studies have shown that people who carry lighters are more likely to get lung cancer. However, this does not mean that carrying lighters causes you to get cancer. Smoking is an obvious confounder! If we weren’t sure about this, how could we determine whether it’s the lighters, the confounders, or maybe some combination of both that is causing the lung cancer? Lecture Handout Extra Credit Notebook (+1)
Sep. 9	Simpson's Paradox and Stratification Stratification is often called the "blocking of observational studies": it lets us explore observational studies further by making comparisons within groups. UC-Berkeley Graduate Admission Dataset Lecture Handout Extra Credit Notebook (+1)
Sep. 11	Measures of Center and Spread Parameters are numerical facts about the population. In this lecture, we will look at parameters such as the average (µ) and standard deviation (σ) of a list of numbers. Later, we will start talking about statistics. Statistics are estimates of parameters computed from a sample. Lecture Handout Extra Credit Notebook (+1)
Sep. 13	Boolean Logic and Conditionals Lecture Handout Extra Credit Notebook (+1)
Sep. 16	Grouping Data (pandas) A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups. Lecture Handout GPA Dataset Extra Credit Notebook (+1)
Sep. 18	Grouping Data (pandas) II Lecture Handout Extra Credit Notebook (+1)
Sep. 20	Bar Graphs and Histograms Large tables of numbers can be difficult to interpret, no matter how organized they are. Sometimes it is much easier to interpret graphs than numbers. Perception of Probability Words Survey (+1 EC) CBTF Exam Registration Lecture Handout Extra Credit Notebook (+1)
Sep. 23	Quartiles and Box Plots Just like histograms, box plots are used as a way to visually represent numerical data. They do this through selected percentiles which are given special names. Lecture Handout Perception of Probability Words Dataset Extra Credit Notebook (+1)
Sep. 25	Algorithms to Solve Complex Problems An algorithm is a step-by-step, detailed set of instructions to solve a problem. An algorithm can be expressed as English sentences (usually as a numbered list) and is a great way to begin solving complex problems. Lecture Handout Football Dataset Extra Credit Notebook (+1)
Sep. 27	Introduction to Probability + Monty Hall Probability is the likelihood or chance of an event occurring. This begins a multi-week journey discovering probability and how to simulate probabilistic events. Lecture Handout Monty Hall Game Extra Credit Notebook (+1)
Sep. 30	Probability, Birthday Problem, and Control Flow Lecture Handout Extra Credit Notebook (+1)
Oct. 2	Loops in Python + Addition Rule Lecture Handout Extra Credit Notebook (+1)
Oct. 4	Midterm 1 (CBTF) - No Class :)
Oct. 7	Addition Rule + Conditional Probability The conditional probability of an event B is the probability that the event will occur given that an event A has already occurred. Lecture Handout Extra Credit Notebook (+1)
Oct. 9	Functions in Python and Conditional Probability Lecture Handout
Oct. 11	Bayes Rule Lecture Handout Extra Credit Notebook (+1)
Oct. 14	Simulation Analysis + Images Lecture Handout Extra Credit Notebook (+1)
Oct. 16	Images + Random Variables Lecture Handout Extra Credit Notebook (+1)
Oct. 18	Discrete Random Variables, Bernoulli, and Binomial Any random experiment that has exactly two outcomes with fixed probabilities is called a Bernoulli trial, and its outcome follows a Bernoulli distribution. The Binomial Distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each succeeding with probability p. For a single trial (n=1), the binomial distribution is a Bernoulli distribution. Lecture Handout Extra Credit Notebook (+1)
Oct. 21	Normal Approximation The normal curve is a bell-shaped "ideal" histogram that many histograms resemble. For histograms that are close to the normal curve, you can use the normal curve to estimate percentages for the data. Lecture Handout Extra Credit Notebook (+1)
Oct. 23	Central Limit Theorem The normal approximation for random variables amounts to taking advantage of the Central Limit Theorem. We replace the true probability histogram for the sum, average, or percentage of draws by the normal curve before computing areas. Project 1 Released Lecture Handout
Oct. 25	Sampling We take a sample to find out about a larger population. We usually don’t have the resources to gather information on everyone in the whole population, so instead we select a small sample and use it to make inferences about the larger population. Lecture Handout Extra Credit Notebook (+1)
Oct. 28	Confidence Intervals Lecture Handout Extra Credit Notebook (+1)
Oct. 30	Lists and Dictionaries Lecture Handout Extra Credit Notebook (+1)
Nov. 1	CLT + Polling + Scatterplots Lecture Handout Extra Credit Notebook (+1)
Nov. 4	Scatterplots, Correlation, Simple Regression Lecture Handout Diamonds Data Set Extra Credit Notebook (+1)
Nov. 6	Residuals, RMSE, Regression in Python Lecture Handout Extra Credit Notebook (+1)
Nov. 8	Residuals + RMSE Lecture Handout Dataset - Calories in Beer Extra Credit Notebook (+1)
Nov. 11	RMSE and Clustering Lecture Handout Extra Credit Notebook (+1)
Nov. 13	k-means clustering Lecture Handout Extra Credit Notebook (+1)
Nov. 15	Midterm 2 (CBTF) - No Class :)
Nov. 18	Hypothesis Testing Hypothesis tests are statistical tests to see if a difference we observe is due to chance. Many times, we have competing hypotheses about the value of a population parameter. It’s impossible or impractical to examine the whole population to find out which hypothesis is true, so we take a random sample and see which hypothesis is better supported by our sample data. Lecture Handout Extra Credit Notebook (+1)
Nov. 20	Z Tests in Python Lecture Handout Extra Credit Notebook (+1)
Nov. 22	2 Sample Z Test Lecture Handout
Nov. 25	Fall Break
Nov. 27	Fall Break
Nov. 29	Fall Break
Dec. 2	t-tests Lecture Handout Annotated Lecture Notes Extra Credit Notebook (+1)
Dec. 4	Distance Metrics Lecture Handout
Dec. 6	Clustering Lecture Handout Extra Credit Notebook (+1)
Dec. 9	Normalization and Neural Networks Lecture Handout
Dec. 11	Storytelling and Data Visualization
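
The split, apply, combine idea described for the Sep. 16 lecture on grouping data can be sketched in plain Python. This is only an illustration of the three steps, not how pandas implements them; the records, the column names `subject` and `gpa`, and the helper `group_mean` are made up for the example and are not taken from the course GPA dataset. In pandas, the same result would typically come from something like `df.groupby("subject")["gpa"].mean()`.

```python
from collections import defaultdict

# Hypothetical records; the course GPA dataset is not reproduced here.
rows = [
    {"subject": "STAT", "gpa": 3.0},
    {"subject": "CS", "gpa": 3.5},
    {"subject": "STAT", "gpa": 4.0},
    {"subject": "CS", "gpa": 2.5},
]


def group_mean(records, key, value):
    """Return {group: mean of value} via split, apply, combine."""
    # Split: collect the values that belong to each group.
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    # Apply and combine: compute one mean per group and gather the results.
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


means = group_mean(rows, "subject", "gpa")
print(means)
```

Writing the steps out by hand makes it easier to see what a one-line groupby call is doing before relying on it for larger datasets.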

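The relationship between the Bernoulli and Binomial distributions from the Oct. 18 lecture can be checked numerically. The sketch below assumes the standard binomial probability formula, C(n, k) · p^k · (1 − p)^(n − k); the function name `binomial_pmf` and the example values of n and p are chosen here for illustration and do not come from the lecture materials.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# With a single trial (n=1) the binomial reduces to a Bernoulli:
# P(success) = p and P(failure) = 1 - p.
p = 0.3
print(binomial_pmf(1, 1, p), binomial_pmf(0, 1, p))

# The probabilities over all possible counts k = 0..n sum to 1
# (up to floating-point rounding).
n = 10
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)
```
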