Course Schedule
Date | Event | Links |
---|---|---|
Aug. 26 |
Introduction to Data Science Exploration
Building on STAT 107, Data Science Discovery, let's explore data science and statistical analysis in real-world settings!
|
|
Aug. 28 |
Data and Python
What do we mean by data? How can we organize data? How can we visualize and summarize data? Python is a powerful data science environment for organizing and understanding our data.
|
|
Aug. 30 |
Data and Python
What if the data have missing values? How can we summarize qualitative and quantitative data, and study relations between different variables in the data? Let's explore how core modules like NumPy, Pandas, and Matplotlib help us manage and visualize data for these purposes.
|
|
Sep. 2 | Labor Day |
|
Sep. 4 |
Python, Pandas and Data Frames - Quantitative variables
Let's explore summary statistics, distributions and visuals for quantitative data
|
|
Sep. 6 |
Structure of Data Frames
Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis.
|
|
Sep. 9 |
Working with Data Frames
Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources.
|
|
Sep. 11 |
Random Sampling and Probability
We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes.
|
|
Sep. 13 |
For Loops and Functions for Simulation
Python for loops and functions enable us to automate and simplify repetitive tasks, which is essential for Monte Carlo simulations.
|
|
Sep. 16 |
Monte Carlo Studies of Sampling Distributions
How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations.
|
|
Sep. 18 |
Statistics, Parameters and Estimation
How shall we collect data to estimate key population parameters? How can we estimate those parameters and determine the margin of error?
|
|
Sep. 20 |
Statistics, Parameters and Random Variables
To understand uncertainty better, we introduce the concept of a random variable, which is extremely useful for describing the variation in sample statistics.
|
|
Sep. 23 |
Case Study in Data Science
Albert Man guest lectures on a data science project he did as part of a job interview!
|
|
Sep. 25 |
Computing and Visualizing Interval Probabilities and Quantiles
We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests.
|
|
Sep. 27 |
Margin of Error for Sample-Based Estimates
Let's study the variation in sums, means and proportions and use their properties to determine the margin of error for these estimates.
|
|
Sep. 30 |
Normal Approximation and Confidence Intervals
We explore how to use sample means, proportions and other statistics to compute confidence intervals for population parameters.
|
|
Oct. 2 |
Review
Let's review what we've done so far - bring questions to class!
|
|
Oct. 4 |
Exam 1
In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.
|
|
Oct. 7 |
Confidence intervals for general means
We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1!
|
|
Oct. 9 |
Confidence intervals and significance tests for differences
Building on the results for single samples, we explore how to compare samples from different subpopulations, such as treatment/control groups, A/B tests and other grouping variables.
|
|
Oct. 11 |
Formulating and testing hypotheses
We study a general approach for testing hypotheses about parameters of interest in several representative examples.
|
|
Oct. 14 |
z-tests, t-tests and degrees of freedom
We compare z-tests, which rely on the central limit theorem for large-sample validity, and t-tests, which provide a small-sample adjustment.
|
|
Oct. 16 |
Introduction to Regression Modeling using StatsModels
Python examples with results and interpretation
|
|
Oct. 18 |
Regression modeling and inference
Coefficient standard errors, confidence intervals and prediction intervals
|
|
Oct. 21 |
Regression model assessment and prediction
We'll see how to access and use information about the model parameters, model fit and residuals
|
|
Oct. 23 |
Structured regression and categorical predictor variables
Analysis of variance and F tests for building models
|
|
Oct. 25 |
Comparing nested regression models
Examples of ANOVA F tests in model building
|
|
Oct. 28 |
ANOVA, F tests and Model Selection
More examples with results and interpretation
|
|
Oct. 30 |
Modeling probabilities using logit models
Odds ratios, 2 x 2 tables, and logistic regression
|
|
Nov. 1 |
Logistic regression modeling
Building and interpreting logit models with multiple explanatory variables
|
|
Nov. 4 |
Classification via Logistic Regression
Sensitivity, Specificity, ROC curves
|
|
Nov. 6 |
Review
Work on practice problems in class
|
|
Nov. 8 |
Review
Work on practice problems in class
|
|
Nov. 11 |
Exam 2
In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.
|
|
Nov. 13 |
Train/Test ROC Analysis
We split the data into training and testing sets to reduce bias in the ROC evaluation of a classifier.
|
|
Nov. 15 |
Train/Test methods for model assessment
We use tools provided in scikit-learn for modeling and machine learning in Python
|
|
Nov. 18 |
Model Selection in Logistic Regression
We explore the tradeoff between model fit and model simplicity using criteria such as AIC and BIC
|
|
Nov. 20 |
Train/Test and Cross Validation with Scikit-Learn
Examples with regularized logistic regression
|
|
Nov. 22 |
Regularized Logistic Regression for High-Dimensional Data
Compare different regularization penalties for logistic regression with many feature variables
|
|
Dec. 2 |
Regularized Logistic Regression
Regularization penalties and cross-validation accuracy of regularized logit classifiers
|
|
Dec. 4 |
Regularized Linear Regression for High-Dimensional Data
We explore a machine learning approach for improving the accuracy of multiple linear regression using penalized least squares, with an application to gene expression analysis.
|
|
Dec. 6 |
Regularized Linear Regression
Visualizing high-dimensional data and selecting the regularization tuning parameter
|
|
Dec. 9 | Review | |
Dec. 11 | Review | |
Dec. 17 |
Exam Prep Help Session
69 English Building, 5:00 - 7:00 pm
|
|
Dec. 18 |
Office Hours
118 Illini Hall, 2:00 - 3:30 pm
|
|
Dec. 19 |
Final Exam
218 Ceramics Building, 8:00 - 11:00 am
|
|