Course Schedule

Date Event Links
Aug. 26 Introduction to Data Science Exploration
Building on STAT 107, Data Science Discovery, let's explore data science and statistical analysis in real-world settings!
Aug. 28 Data and Python
What do we mean by data? How can we organize data? How can we visualize and summarize data? Python is a powerful data science environment for organizing and understanding our data.
Aug. 30 Data and Python
What if the data have missing values? How can we summarize qualitative and quantitative data, and study relations between different variables in the data? Let's explore how core modules like NumPy, Pandas, and Matplotlib help us manage and visualize data for these purposes.
Sep. 2 Labor Day
Sep. 4 Python, Pandas and Data Frames - Quantitative variables
Let's explore summary statistics, distributions, and visuals for quantitative data.
Sep. 6 Structure of Data Frames
Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis.
Sep. 9 Working with Data Frames
Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources.
Sep. 11 Random Sampling and Probability
We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes.
Sep. 13 For Loops and Functions for Simulation
Python's for loops and functions let us automate and simplify repetitive tasks, which is essential for Monte Carlo simulations.
Sep. 16 Monte Carlo Studies of Sampling Distributions
How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations.
Sep. 18 Statistics, Parameters and Estimation
How shall we collect data to estimate key population parameters? How can we estimate those parameters and determine the margin of error?
Sep. 20 Statistics, Parameters and Random Variables
The concept of a random variable helps us understand uncertainty and, in particular, the variation in sample statistics.
Sep. 23 Case Study in Data Science
Albert Man guest lectures on a data science project he did as part of a job interview!
Sep. 25 Computing and Visualizing Interval Probabilities and Quantiles
We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests.
Sep. 27 Margin of Error for Sample-Based Estimates
Let's study the variation in sums, means and proportions and use their properties to determine the margin of error for these estimates.
Sep. 30 Normal Approximation and Confidence Intervals
We explore how to use sample means, proportions and other statistics to compute confidence intervals for population parameters.
Oct. 2 Review
Let's review what we've done so far - bring questions to class!
Oct. 4 Exam 1
In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.
Oct. 7 Confidence intervals for general means
We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1!
Oct. 9 Confidence intervals and significance tests for differences
Building on the results for single samples, we explore how to compare samples from different subpopulations, such as treatment/control groups, A/B tests, and other grouping variables.
Oct. 11 Formulating and testing hypotheses
We study a general approach for testing hypotheses about parameters of interest in several representative examples.
Oct. 14 z-tests, t-tests and degrees of freedom
We compare z-tests, which rely on the central limit theorem for large-sample validity, and t-tests, which provide a small-sample adjustment.
Oct. 16 Introduction to Regression Modeling using StatsModels
Python examples with results and interpretation
Oct. 18 Regression modeling and inference
Coefficient standard errors, confidence intervals and prediction intervals
Oct. 21 Regression model assessment and prediction
We'll see how to access and use information about the model parameters, model fit, and residuals.
Oct. 23 Structured regression and categorical predictor variables
Analysis of variance and F tests for building models
Oct. 25 Comparing nested regression models
Examples of ANOVA F tests in model building
Oct. 28 ANOVA, F tests and Model Selection
More examples with results and interpretation
Oct. 30 Modeling probabilities using logit models
Odds ratios, 2 x 2 tables, and logistic regression
Nov. 1 Logistic regression modeling
Building and interpreting logit models with multiple explanatory variables
Nov. 4 Classification via Logistic Regression
Sensitivity, Specificity, ROC curves
Nov. 6 Review
Work on practice problems in class
Nov. 8 Review
Work on practice problems in class
Nov. 11 Exam 2
In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.
Nov. 13 Train/Test ROC Analysis
We split the data into training and testing sets to reduce bias in the ROC evaluation of a classifier.
Nov. 15 Train/Test methods for model assessment
We use tools provided in scikit-learn for modeling and machine learning in Python.
Nov. 18 Model Selection in Logistic Regression
We explore the tradeoff between model fit and model simplicity using criteria such as AIC and BIC.
Nov. 20 Train/Test and Cross Validation with Scikit-Learn
Examples with regularized logistic regression
Nov. 22 Regularized Logistic Regression for High-Dimensional Data
Compare different regularization penalties for logistic regression with many feature variables
Dec. 2 Regularized Logistic Regression
Regularization penalties and cross-validation accuracy of regularized logit classifiers
Dec. 4 Regularized Linear Regression for High-Dimensional Data
We explore a machine learning approach for improving the accuracy of multiple linear regression using penalized least squares, with an application to gene expression analysis.
Dec. 6 Regularized Linear Regression
Visualizing high-dimensional data and selecting the regularization tuning parameter
Dec. 9 Review
Dec. 11 Review
Dec. 17 Exam Prep Help Session
69 English Building, 5:00 - 7:00 pm
Dec. 18 Office Hours
118 Illini Hall, 2:00 - 3:30 pm
Dec. 19 Final Exam
218 Ceramics Building, 8:00 - 11:00 am