Upcoming Deadlines


Online Transition:
Week 15 Content

  • 1/8: Hypothesis Tests: Critical Values and CIs (Karle) — Handout

  • 2/8 and 3/8: The t test (Karle) — Handout

  • 4/8: t-test in Python (Wade) — Colab Notebook

  • 5/8: df.apply (Wade) — Colab Notebook

  • 6/8: A/B Testing (Wade) — Handout

  • 7/8: Distance Metrics (Wade) — Handout

  • 8/8: Normalizing Data (Wade) — Handout

Lecture videos available on Compass 2g.

Assignments:

  • Homework: No more homeworks! Work on the final project this week. :)
  • Lab: lab_similarity

April 27, 2020
Similarity

lab_similarity: Similarity

April 27, 2020
You and Data Science

Projects: You and Data Science

April 21, 2020
Week 14 Content

This week focuses on hypothesis testing. You’ll use the z-test to check whether an unknown source of data conforms to an expected distribution (e.g., is a new six-sided die you just bought actually fair?). Here are the notes:

  • 1/7 and 2/7: One Sample Z Test (Karle) — Handout

  • 3/7: One sample z-test in Python (Wade) — Colab Notebook

  • 4/7: One Tailed vs. Two Tailed Hypothesis Tests (Karle) — Handout

  • 5/7: Two Sample Z Test (Karle) — Handout

  • 6/7: Two sample z-test in Python (Wade) — Colab Notebook

  • 7/7: Human Impact of Probabilities (Wade) — Handout
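
As a rough illustration of the fair-die question above, here is a minimal sketch of a one-sample z-test for a proportion. The roll counts are hypothetical, and the course notebooks may set up the test differently; this version only uses Python's standard library.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data: 600 rolls of a new die, 120 of which came up six.
n_rolls = 600
n_sixes = 120

p_null = 1 / 6                  # proportion of sixes expected from a fair die
p_observed = n_sixes / n_rolls  # proportion of sixes actually observed

# Standard error of the sample proportion, assuming the null hypothesis is true.
standard_error = sqrt(p_null * (1 - p_null) / n_rolls)

# z = (observed - expected) / SE
z = (p_observed - p_null) / standard_error

# Two-tailed p-value from the standard normal curve.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would suggest the die produces sixes more (or less) often than a fair die should.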

Lecture videos available on Compass 2g.

Assignments:

April 20, 2020
Hypothesis Tests

lab_hypothesis-tests: Hypothesis Tests

Week 13 Content

This week you will dive into machine learning and begin building models to do amazing things! For the Python notebooks in lecture, I used Google Colab, a Python environment that runs online. In a Colab notebook, you can run the cells right in your web browser. Let me know if that makes it easier to follow along! :)

Lecture videos available on Compass 2g.

Assignments:

Midterm 2:

  • Midterm 2 practice exam available online in PL. Details on Midterm 2 on Piazza.

April 13, 2020
K-Means Clustering

lab_kmeans: K-Means Clustering

April 13, 2020
Regression

lab_regression: Regression

Week 12 Content

We have six videos this week and are exploring a brand new dataset! Here’s the overview:

  • 1/6: Confidence Intervals for means and percents (Karle) — Handout

  • 2/6: Data Science with Confidence Intervals (Wade) — Handout

  • 3/6: Scatterplots, Correlation, Regression (Karle) — Handout

  • 4/6: Scatter Plots in Python + Diamond Dataset (Wade) — Handout | Diamond Dataset (Google Drive)

  • 5/6: Correlation in Python (Wade) — Handout

  • 6/6: Linear Regression in Python (Wade) — Handout

Lecture videos available on Compass 2g.

Assignments:

March 30, 2020
Week 11 Content

Lecture Notes on Sampling and Inference:

Lecture Notes on EV and SE for means and percents:

Python Review including Lists and Loops:

Lecture videos available on Compass 2g.

Assignments:

March 30, 2020
Lists

lab_lists: Lists

March 30, 2020
Week 10 Content

Lecture Notes on Normal Distribution:

Lecture Notes on Central Limit Theorem (CLT):

Lecture videos available on Compass 2g.

Assignments:

March 23, 2020
CLT

lab_clt: CLT

March 22, 2020
Discrete Random Variables, Bernoulli, and Binomial

Lecture: Discrete Random Variables, Bernoulli, and Binomial

Any random experiment that has exactly two possible outcomes (success or failure), each with a fixed probability, is described by a Bernoulli distribution. The Binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each with success probability p. For a single trial (n=1), the binomial distribution is a Bernoulli distribution.
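
To make the relationship between the two distributions concrete, here is a small sketch of the binomial probability formula. The function name `binomial_pmf` and the value p = 0.3 are chosen for illustration only.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.3  # hypothetical success probability

# With n = 1 the binomial reduces to a Bernoulli: P(0) = 1 - p, P(1) = p.
bernoulli = [binomial_pmf(k, 1, p) for k in (0, 1)]

# Number of heads in 4 flips of a fair coin (n = 4, p = 0.5).
four_flips = [binomial_pmf(k, 4, 0.5) for k in range(5)]

print(bernoulli)
print(four_flips)
```

The probabilities in each list add up to 1, since every possible number of successes is covered.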

March 13, 2020
Image Mosaic

Projects: Image Mosaic

March 12, 2020
Random Variable

lab_random-variable: Random Variable

Images + Random Variables

Lecture: Images + Random Variables

March 11, 2020
Probability and Simulation III

Homework 11: Probability and Simulation III

March 9, 2020
Simulation Analysis + Images

Lecture: Simulation Analysis + Images

Simulation allows us to understand the outcomes of uncertain events. We will begin with basic simulations and build up to more complex simulations throughout this semester.
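
A basic simulation might look like the sketch below, which estimates the chance that two dice sum to 7 by repeating the experiment many times. The function name, the number of trials, and the seed are illustrative choices, not taken from the lecture.

```python
import random

def estimate_probability_of_seven(trials, seed=0):
    """Roll two dice `trials` times and return the fraction of rolls that sum to 7."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 7:
            hits += 1
    return hits / trials

estimate = estimate_probability_of_seven(100_000)
print(estimate)  # should land close to the exact answer, 1/6
```

More trials generally bring the estimate closer to the exact probability.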

March 9, 2020
Bayes Rule

Lecture: Bayes Rule

Bayes Rule allows us to express a conditional probability as the inverse, often making the problem easier to solve.
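
As a hedged sketch of that inversion, the function below computes P(A | B) from P(A), P(B | A), and P(B | not A). The function name and the screening-test numbers are hypothetical.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Total probability of B, split over "A happened" and "A did not happen".
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes Rule: P(A | B) = P(B | A) * P(A) / P(B)
    return likelihood * prior / evidence

# Hypothetical screening test: 1% of people have the condition, the test
# catches 95% of true cases, and it wrongly flags 5% of healthy people.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))
```

Even with a fairly accurate test, the probability of having the condition given a positive result stays low here because the condition is rare.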

March 6, 2020
Birthday

lab_birthday: Birthday

March 4, 2020
Probability and Simulation II

Homework 10: Probability and Simulation II

March 4, 2020
Functions in Python and Conditional Probability

Lecture: Functions in Python and Conditional Probability

March 4, 2020
Probability and Simulation I

Homework 9: Probability and Simulation I

March 2, 2020
Addition Rule + Conditional Probability

Lecture: Addition Rule + Conditional Probability

The conditional probability of an event B is the probability that the event will occur given that an event A has already occurred.
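
The definition can be checked by counting outcomes. In this small sketch, one fair die is rolled; the events A ("the roll is even") and B ("the roll is greater than 3") are chosen purely for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)  # one roll of a fair six-sided die

event_a = {x for x in outcomes if x % 2 == 0}  # A: the roll is even
event_b = {x for x in outcomes if x > 3}       # B: the roll is greater than 3

def probability(event):
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(len(event), 6)

# P(B | A) = P(A and B) / P(A)
p_b_given_a = probability(event_a & event_b) / probability(event_a)

print(probability(event_b), p_b_given_a)
```

Knowing that A occurred raises the probability of B from 1/2 to 2/3.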

March 2, 2020
Midterm 1 (CBTF) happens this week - No class on Friday!

Lecture: Midterm 1 (CBTF) happens this week - No class on Friday!

February 28, 2020
Simulation

lab_simulation: Simulation

February 26, 2020
Simulation

Lecture: Simulation

February 26, 2020
Introduction to Probability II

Lecture: Introduction to Probability II

Probability is the likelihood or chance of an event occurring. This continues a multi-week journey discovering probability and how to simulate probabilistic events.

February 24, 2020
Quartiles and Boxplots

Homework 8: Quartiles and Boxplots

February 21, 2020
Introduction to Probability

Lecture: Introduction to Probability

Probability is the likelihood or chance of an event occurring. This begins a multi-week journey discovering probability and how to simulate probabilistic events.
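
When every outcome is equally likely, a probability can be found by listing the outcomes and counting. This sketch does so for an illustrative event, "exactly two heads in three coin flips".

```python
from fractions import Fraction
from itertools import product

# Every possible result of flipping a coin three times, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Outcomes with exactly two heads.
favorable = [o for o in outcomes if o.count("H") == 2]

# Probability = favorable outcomes / all outcomes (all equally likely).
p_two_heads = Fraction(len(favorable), len(outcomes))
print(p_two_heads)
```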

February 21, 2020
Plots

lab_plots: Plots

February 19, 2020
Algorithms to Solve Complex Problems

Lecture: Algorithms to Solve Complex Problems

An algorithm is a step-by-step, detailed set of instructions to solve a problem. An algorithm can be expressed as English sentences (usually as a numbered list) and is a great way to begin solving complex problems.
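
As one possible example, the numbered steps of an algorithm for finding the largest number in a list can be written as comments and then translated line by line into Python. The problem and the function name `find_largest` are illustrative, and the sketch assumes the list is not empty.

```python
def find_largest(numbers):
    # Step 1: Treat the first number as the largest seen so far.
    largest = numbers[0]
    # Step 2: Look at each remaining number, one at a time.
    for number in numbers[1:]:
        # Step 3: If this number is bigger, it becomes the largest seen so far.
        if number > largest:
            largest = number
    # Step 4: After checking every number, report the largest.
    return largest

print(find_largest([3, 9, 2]))
```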

February 19, 2020
groupby and Center and Spread II

Homework 7: groupby and Center and Spread II

February 17, 2020
Quartiles and Box Plots

Lecture: Quartiles and Box Plots

Just like histograms, box plots are used as a way to visually represent numerical data. They do this through selected percentiles which are given special names.
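
The percentiles a box plot draws can be computed directly. This sketch uses Python's `statistics.quantiles` on a made-up list of numbers; several conventions exist for computing quartiles, so other tools (or the lecture's hand method) may give slightly different values on other data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data, already sorted

# The three cut points that split the data into four equal parts.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

# The box spans Q1 to Q3; its width is the interquartile range (IQR).
iqr = q3 - q1

# Minimum, Q1, median, Q3, maximum: the values a box plot is built from.
five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary, iqr)
```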

February 17, 2020
Bar Graphs and Histograms

Lecture: Bar Graphs and Histograms

Large tables of numbers can be difficult to interpret, no matter how organized they are. Sometimes it is much easier to interpret graphs than numbers.

February 14, 2020
GPA

lab_gpa: GPA

February 12, 2020
Center and Spread

Homework 6: Center and Spread

February 12, 2020
Data Visualization

Lecture: Data Visualization

February 12, 2020
Privacy and Data Science

Homework 5: Privacy and Data Science

February 10, 2020
Grouping Data (pandas)

Lecture: Grouping Data (pandas)

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
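
A minimal sketch of split, apply, and combine in pandas follows. The DataFrame and its column names (`group`, `value`) are made up for illustration.

```python
import pandas as pd

# Made-up data: four rows belonging to two groups.
df = pd.DataFrame({
    "group": ["A", "B", "A", "B"],
    "value": [1, 2, 3, 4],
})

# Split the rows by "group", apply mean() to each group's "value" column,
# and combine the results into one Series indexed by group.
means = df.groupby("group")["value"].mean()
print(means)
```

Other aggregations such as `sum()` or `count()` can be swapped in for `mean()` in the same way.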

February 10, 2020
Boolean Logic and Conditionals

Lecture: Boolean Logic and Conditionals

February 7, 2020
Simpson's Paradox

lab_simpsons-paradox: Simpson's Paradox

February 5, 2020
Python Conditionals w/ Pandas

Homework 4: Python Conditionals w/ Pandas

February 5, 2020
Measures of Center and Spread

Lecture: Measures of Center and Spread

Parameters are numerical facts about the population. In this lecture, we will look at parameters such as the average (µ) and standard deviation (σ) of a list of numbers. Later, we will start talking about statistics. Statistics are estimates of parameters computed from a sample.
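
As a small sketch, the average and standard deviation of a list can be computed with Python's `statistics` module. The list is made up; `pstdev` treats the list as the whole population (σ), while `stdev` treats it as a sample and divides by n - 1 instead of n.

```python
from statistics import mean, pstdev, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up list of numbers

mu = mean(values)       # average of the list
sigma = pstdev(values)  # standard deviation, treating the list as the population
s = stdev(values)       # standard deviation, treating the list as a sample

print(mu, sigma, round(s, 3))
```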

February 5, 2020
Causal Links and Confounders

Homework 3: Causal Links and Confounders

February 3, 2020
Simpson's Paradox and Stratification

Lecture: Simpson's Paradox and Stratification

Stratification is often called the "blocking of observational studies". It splits the data into subgroups (strata) that share the same value of a possible confounder, so that comparisons can be made within each subgroup.
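
The sketch below shows why looking within strata matters, using illustrative counts chosen so that the overall comparison and the within-stratum comparisons disagree (Simpson's Paradox). The column names and numbers are assumptions for this example, not course data.

```python
import pandas as pd

# Illustrative counts: two treatments, each used on two strata of patients.
df = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "stratum": ["small", "large", "small", "large"],
    "successes": [81, 192, 234, 55],
    "patients": [87, 263, 270, 80],
})

# Overall success rate per treatment, ignoring the strata.
overall = df.groupby("treatment")[["successes", "patients"]].sum()
overall_rate = overall["successes"] / overall["patients"]

# Success rate per treatment within each stratum.
by_stratum = df.set_index(["stratum", "treatment"]).sort_index()
stratum_rate = by_stratum["successes"] / by_stratum["patients"]

print(overall_rate)
print(stratum_rate)
```

Treatment B looks better overall, yet treatment A has the higher success rate in both strata, because A was used far more often on the harder ("large") cases.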

February 3, 2020
Confounders and Observational Studies

Lecture: Confounders and Observational Studies

For years, observational studies have shown that people who carry lighters are more likely to get lung cancer. However, this does not mean that carrying lighters causes you to get cancer. Smoking is an obvious confounder! If we weren’t sure about this, how could we determine whether it’s the lighters, the confounders, or (maybe) some combination of both that is causing the lung cancer?

January 31, 2020
Getting Started with Pandas

lab_pandas: Getting Started with Pandas

The primary Data Science library we will be using this semester is pandas. This lab explores the basic usage of the pandas library and gets you ready for the Data Science challenges we will be beginning next week!

January 29, 2020
Blocking and Python Conditionals

Homework 2: Blocking and Python Conditionals

January 29, 2020
Blocking and Conditionals

Lecture: Blocking and Conditionals

Random assignment to treatment and control works best to make the groups as alike as possible. With enough subjects, random differences average out. But what do you do if you have a small sample? Blocking first, then randomizing, ensures that the treatment and control groups are balanced with regard to the variables blocked on. We can use conditionals in pandas to help us do this!
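
One way this could look in pandas is sketched below: conditionals select each block, and subjects are then shuffled and split within the block. The subjects, the `age_group` blocking variable, the helper `randomize_block`, and the seeds are all hypothetical.

```python
import pandas as pd

# Hypothetical subjects, blocked on age group.
subjects = pd.DataFrame({
    "subject_id": range(1, 9),
    "age_group": ["under_40"] * 4 + ["40_plus"] * 4,
})

def randomize_block(block, seed):
    """Shuffle one block, then assign half to treatment and half to control."""
    shuffled = block.sample(frac=1, random_state=seed)
    half = len(shuffled) // 2
    labels = ["treatment"] * half + ["control"] * (len(shuffled) - half)
    return shuffled.assign(group=labels)

# Conditionals pick out the rows that belong to each block.
younger = subjects[subjects["age_group"] == "under_40"]
older = subjects[subjects["age_group"] == "40_plus"]

assigned = pd.concat([randomize_block(younger, seed=1),
                      randomize_block(older, seed=2)])

# Each age group contributes equally to treatment and control.
counts = assigned.groupby(["age_group", "group"]).size()
print(counts)
```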

January 29, 2020
Experimental Design and Basic Python

Homework 1: Experimental Design and Basic Python

January 27, 2020
Experimental Design and Row Selection (pandas)

Lecture: Experimental Design and Row Selection (pandas)

Does the death penalty have a deterrent effect? Is chocolate good for you? What causes breast cancer? All of these questions attempt to assign a cause to an effect. A careful examination of data can help shed light on questions like these.

January 27, 2020
Data Science Tools

Lecture: Data Science Tools

Data, Science, and Tools all have meaning on their own. Explore how each relates to the others and how they all relate to Data Science DISCOVERY!

January 24, 2020
Welcome to Data Science Discovery

Lecture: Welcome to Data Science Discovery

The next BIG thing at Illinois is Data Science and it starts with Discovery!

January 22, 2020
Introduction to Data Science

lab_intro: Introduction to Data Science

Data scientists use powerful tools to help learn about data. In this first lab, you will set up your account and computer for Data Science Discovery and begin to play around with your very first Python notebook!

January 21, 2020
Welcome to Data Science Discovery!

Our first lecture is Wednesday, Jan. 22 at 12:00 noon in 1306 Everitt Laboratory. See you there! :)

January 13, 2020