Archived Content
Click here for the Fall 2019 webpage.
Final Exam Information
- The STAT 107 Final Exam has two parts: a CBTF-based exam and a Python notebook.
- Unlimited practice exams are available on PrairieLearn.
- You must sign up for your CBTF exam on the CBTF scheduler. Choose any time you want to take it between Thursday, May 2 and Thursday, May 9!
- The Python exam will be available on Compass 2g starting on Friday, May 3.
- Both parts combined are designed to take no more than 3 hours.
Lecture 38-39: k-means Clustering
Lecture 37: Distance Metrics and Clustering
In many areas of Data Science, we need to define how different two rows of data are from each other. The most common way to quantify this difference is to define a distance metric that provides a numeric difference, or "distance", between two rows of data.
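As a quick sketch (the rows and feature values below are made up), two common distance metrics can be computed with NumPy:

```python
import numpy as np

# Two hypothetical rows of numeric data (three made-up features each)
row_a = np.array([5.0, 120.0, 3.2])
row_b = np.array([7.0, 110.0, 4.1])

# Euclidean distance: square root of the sum of squared differences
euclidean = np.sqrt(np.sum((row_a - row_b) ** 2))

# Manhattan distance: sum of absolute differences
manhattan = np.sum(np.abs(row_a - row_b))

print("Euclidean distance:", euclidean)
print("Manhattan distance:", manhattan)
```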
Lecture 36: A/B Testing
With hypothesis testing in hand, we can explore how to design real-world experiments that let us test hypotheses and add value to a project! One of the most common techniques is A/B testing.
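Here is a minimal sketch of what an A/B test comparison might look like in Python, using made-up conversion counts and a simulation-based p-value:

```python
import numpy as np

# Hypothetical A/B test results (made-up numbers)
conversions_a, visitors_a = 120, 2400   # group A: current page
conversions_b, visitors_b = 150, 2300   # group B: new page

rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b
observed_diff = rate_b - rate_a

# Simulate the difference under the null hypothesis that both groups share one rate
pooled_rate = (conversions_a + conversions_b) / (visitors_a + visitors_b)
diffs = []
for _ in range(10_000):
    sim_a = np.random.binomial(visitors_a, pooled_rate) / visitors_a
    sim_b = np.random.binomial(visitors_b, pooled_rate) / visitors_b
    diffs.append(sim_b - sim_a)

# p-value: how often a difference at least this large arises by chance alone
p_value = np.mean(np.array(diffs) >= observed_diff)
print("Observed difference:", observed_diff, "p-value:", p_value)
```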
Lecture 35: t Test
T-tests are very similar to z-tests: they test whether a difference we observe is due to chance. We use t-tests and Student's Curve (the t-distribution) only when ALL THREE of a specific set of conditions are met.
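A small sketch of a one-sample t-test using SciPy; the sample values and the hypothesized mean are made up:

```python
import numpy as np
from scipy import stats

# Hypothetical small sample (made-up measurements, in grams)
sample = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0])

# One-sample t-test: is the population average different from 10?
t_stat, p_value = stats.ttest_1samp(sample, popmean=10)
print("t statistic:", t_stat, "p-value:", p_value)
```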
Lecture 34: The Two Sample Z Test
Previously, we tested hypotheses about population averages (means) or percentages using a test statistic. Now we'll test hypotheses that compare the averages (means) or percentages of two populations.
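A rough sketch of a two-sample z test computed from summary statistics; the means, SDs, and sample sizes below are made up:

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics for two large samples (made-up numbers)
mean_1, sd_1, n_1 = 68.2, 4.1, 500
mean_2, sd_2, n_2 = 67.5, 3.9, 450

# Standard error of the difference between the two sample means
se_diff = np.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)

# Two-sample z statistic and two-sided p-value from the normal curve
z = (mean_1 - mean_2) / se_diff
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print("z:", z, "p-value:", p_value)
```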
Lecture 33: Z Tests in Python
Lecture 32: One Sample Z Test
Hypothesis tests are statistical tests that check whether a difference we observe is due to chance. Many times, we have competing hypotheses about the value of a population parameter. It's impossible or impractical to examine the whole population to find out which hypothesis is true, so we take a random sample and see which hypothesis is better supported by our sample data.
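A minimal sketch of a one-sample z test in Python, using a made-up sample and null hypothesis:

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made-up scores from a random sample of students)
sample = np.array([72, 88, 91, 65, 79, 83, 70, 95, 77, 84])

# Null hypothesis: the population average is 75
null_mean = 75

# z statistic: (sample mean - null mean) / standard error of the mean
se = sample.std(ddof=1) / np.sqrt(len(sample))
z = (sample.mean() - null_mean) / se

# Two-sided p-value from the normal curve
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print("z:", z, "p-value:", p_value)
```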
Lecture 30+31: Sampling, Expected Value, and Standard Error
In our discussion of random variables, we started with games of chance because they easily translate into probability models. We know all of the outcomes and their probabilities. In the next section, we are going to see how random variables relate to gathering information about large populations from small samples.
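A small simulation sketch, using a made-up population, that shows how the spread of many sample means matches the standard error formula:

```python
import numpy as np

# Hypothetical population: 100,000 made-up values (e.g., household incomes)
rng = np.random.default_rng(seed=107)
population = rng.exponential(scale=50_000, size=100_000)

# Draw many random samples of size 100 and record each sample's mean
sample_means = [rng.choice(population, size=100).mean() for _ in range(5_000)]

# The spread (SD) of the sample means is the standard error of the sample mean
print("Expected value (population mean):", population.mean())
print("Simulated standard error:", np.std(sample_means))
print("Formula SE (sigma / sqrt(n)):", population.std() / np.sqrt(100))
```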
Lecture 28-29: The Normal Distribution
Many histograms are close to the normal curve. For these histograms, you can use the normal curve to estimate percentages for the data.
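For example (the average and SD below are made up), SciPy's normal curve can estimate these percentages:

```python
from scipy import stats

# Hypothetical data summary (made-up numbers): heights with average 66 in, SD 3 in
mu, sigma = 66, 3

# If the histogram follows the normal curve, estimate the percentage below 70 inches
pct_below_70 = stats.norm.cdf(70, loc=mu, scale=sigma)

# ...and the percentage between 63 and 69 inches (within one SD of the average)
pct_63_to_69 = stats.norm.cdf(69, loc=mu, scale=sigma) - stats.norm.cdf(63, loc=mu, scale=sigma)

print(f"Below 70 in: {pct_below_70:.1%}")
print(f"Between 63 and 69 in: {pct_63_to_69:.1%}")
```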
Lecture 26: Random Variables
Statisticians use the term random variable for variables whose numeric values are based on the outcome of a random process. The domain of a random variable is the set of possible outcomes. Each outcome has a probability associated with it.
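A tiny sketch of a made-up random variable and its expected value:

```python
# Hypothetical random variable: winnings from a made-up game
# Outcomes and their probabilities (the probabilities must sum to 1)
outcomes = [0, 1, 5]              # dollars won
probabilities = [0.70, 0.25, 0.05]

# Expected value: each outcome weighted by its probability
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))
print("Expected winnings per play:", expected_value)
```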
Lecture 25: Control Flow in Python - Simulation
Lecture 24: Control Flow in Python - Loops and Functions
In nearly every programming language, every program runs from top-to-bottom, one line at a time. In addition to running from top-to-bottom, there are three control flow commands in Python that allow us to control the flow of a Python program.
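For example, a small (made-up) function called from inside a loop:

```python
# A function bundles up code we can reuse; a loop calls it repeatedly
def double(x):
    """Return twice the value passed in."""
    return 2 * x

# Call the function once for each value in the list
for value in [1, 2, 3, 4]:
    print(value, "doubled is", double(value))
```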
Lecture 23: Control Flow in Python - Conditionals and Loops
In nearly every programming language, every program runs from top-to-bottom, one line at a time. In addition to running from top-to-bottom, there are three control flow commands in Python that allow us to control the flow of a Python program.
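For example, a conditional inside a loop (a made-up classification of numbers as even or odd):

```python
# The loop visits each number; the conditional decides what to print
for number in [3, 8, 11, 14]:
    if number % 2 == 0:
        print(number, "is even")
    else:
        print(number, "is odd")
```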
Lecture 21: Binary Event Simulation
As we work towards simulating events using Python, we first need to develop an understanding of the different types of events we can simulate. The first type is an event with exactly two outcomes, or a binary outcome event.
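A small sketch that simulates a binary outcome event, a fair coin flip, many times:

```python
import random

# Simulate one binary outcome event: a fair coin flip
def flip_coin():
    if random.random() < 0.5:
        return "Heads"
    return "Tails"

# Repeat the event many times and count how often each outcome occurs
flips = [flip_coin() for _ in range(10_000)]
print("Heads:", flips.count("Heads"), "Tails:", flips.count("Tails"))
```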
Lecture 20: Simulation
Simulation is an imitation of a real-world event within a computer program. We can run millions of simulations and observe the distribution of outcomes to help us understand the answer to a problem that may be difficult to model mathematically.
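A minimal sketch: simulate rolling two dice many times and observe the distribution of the sums:

```python
import random

# Simulate rolling two fair dice and record the sum, many times over
trials = 100_000
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(trials)]

# Observe the distribution of outcomes: estimated probability of each sum
for total in range(2, 13):
    print(total, round(sums.count(total) / trials, 3))
```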
Lecture 17: Descriptive Statistics and Probability
Lecture 16: Correlation and Regression
Lecture 14: Scatter Plots
Lecture 13: Boxplots
Just like histograms, box plots are a way to visually represent numerical data. They do this through selected percentiles, which are given special names.
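A quick sketch, using made-up scores, of the percentiles behind a box plot and how to draw one with pandas:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical DataFrame of exam scores (made-up values)
df = pd.DataFrame({"score": [55, 62, 70, 71, 74, 78, 80, 83, 85, 91, 98]})

# The named percentiles a box plot draws: the quartiles
print(df["score"].quantile([0.25, 0.50, 0.75]))

# Draw the box plot
df.boxplot(column="score")
plt.show()
```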
Lecture 12: Center and Spread
Parameters are numerical facts about the population. In this lecture, we will look at parameters such as the average (µ) and standard deviation (σ) of a list of numbers. Later, we will start talking about statistics. Statistics are estimates of parameters computed from a sample.
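For example, the average and standard deviation of a made-up list of numbers, computed with NumPy:

```python
import numpy as np

# A hypothetical list of numbers (made-up data)
data = np.array([4, 8, 15, 16, 23, 42])

# Average (mu) and standard deviation (sigma) of the list
mu = data.mean()
sigma = data.std()   # SD of the list itself (treats the list as the whole population)

print("Average:", mu)
print("Standard deviation:", sigma)
```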
Lecture 11: Bar Graphs and Histograms
Large tables of numbers can be difficult to interpret, no matter how organized they are. Sometimes it is much easier to interpret graphs than numbers.
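A small sketch that draws a histogram of made-up ages with pandas and matplotlib:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical ages (made-up values)
ages = pd.Series([19, 21, 20, 22, 19, 23, 25, 21, 20, 24, 22, 21])

# A histogram groups the values into bins and shows how many fall in each
ages.hist(bins=5)
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()
```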
Lecture 10: Data Cleaning and Review
Lecture 9: Functions and Data Cleaning
Lecture 8: Developing Algorithms for Complex Problems
Lecture 7: Creating Columns and Groups
Lecture 6: Introduction to Pandas
Time to focus in on data, learning the primary tool we will be using all semester!
Lecture 5: Data Science Tools
"Data", "Science", and "Tools" all have meaning in their own, explore how one relates to another and how they all related to Data Science DISCOVERY!
Lecture 4: Observational Studies & Simpson’s Paradox
For years, observational studies have shown that people who carry lighters are more likely to get lung cancer. However, this does not mean that carrying a lighter causes you to get cancer. Smoking is an obvious confounder! If we weren't sure about this, how could we determine whether it's the lighters, the confounders, or maybe some combination of both that is causing the lung cancer?
Lecture 3: Observational Studies & Confounders
Observational studies are done out of necessity. Whenever possible, it’s better to do a randomized controlled experiment. Why?
Lecture: Ideal Experimental Design
Does the death penalty have a deterrent effect? Is chocolate good for you? What causes breast cancer? All of these questions attempt to assign a cause to an effect. A careful examination of data can help shed light on questions like these.
Welcome to Data Science Discovery
First lecture is Monday, Jan. 14 at 9am in G32 FLB. See you there!