Course Schedule

Date Event Links
Jan. 22 Introduction to Data Science Exploration
We explore data from a Pew Research Center political opinion survey.
Jan. 24 Notebooks and Git Repositories
Examples of Python, Jupyter notebooks and git operations.
Jan. 27 Structure of Data Frames
Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis.
Jan. 29 Working with Data Frames
Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources.
Jan. 31 Quantitative Data Exploration
What if the data have missing values? How can we summarize and visualize qualitative and quantitative information in the data?
Feb. 3 Python, Pandas and Data Frames - Quantitative variables
Let's explore summary statistics, distributions and visuals for quantitative data, and see how to define our own functions for analysis.
Feb. 5 Random Sampling and Probability
We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes.
Feb. 7 Random sampling and probability
Use combinatorial methods to calculate probabilities of compound events.
Feb. 10 Monte Carlo Studies of Sampling Distributions
How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations.
Feb. 12 Statistics, Parameters and Estimation
In building population sampling models for data, the concepts of random variables and distributions are crucial.
Feb. 14 Statistics, Parameters and Random Variables
Compare sample statistics and population parameters to understand how the statistics estimate features of the population.
Feb. 17 Model-based probabilities and quantiles
We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests.
Feb. 19 Margin of Error for Sample-Based Estimates
Let's study the variation in sums, means and proportions and use their properties to determine the margin of error for these estimates.
Feb. 21 Normal Approximation and Confidence Intervals
We use sample means, proportions and their standard errors to compute confidence intervals for population parameters.
Feb. 24 Confidence intervals for general means
We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1!
Feb. 26 Review
Let's review what we've done so far - bring questions to class! Exam study guide solutions will be posted in Compass after class today.
Feb. 28 Exam 1
In-class exam, 1090 Lincoln Hall. Bring a non-programmable calculator.
Mar. 2 Confidence intervals and hypothesis tests for differences
We explore how to make inferences about subpopulation differences in contexts such as treatment/control studies, A/B testing and sample surveys.
Mar. 4 Formulating and testing hypotheses
We study a general strategy for hypothesis testing in several different scenarios.
Mar. 6 z-tests, t-tests and degrees of freedom
We compare z-tests, which rely on the central limit theorem for large-sample validity, and t-tests, which provide a small-sample adjustment.
Mar. 9 Multiple linear regression modeling
General framework and examples
Mar. 11 Multiple regression inference
Coefficient standard errors, confidence intervals and prediction intervals
Mar. 13 (Online starting today) Regression modeling
See Piazza or Compass for the Zoom URL. We'll see how to access and use information about the model parameters, model fit and residuals.
Mar. 23 (Online) Regression recap plus LaTeX and images in Jupyter notebooks
See Piazza or Compass for the Zoom URL. We'll review linear regression and demo LaTeX and image insertion for Jupyter notebooks.
Mar. 25 Analysis of variance and F test for regression
Analysis of variance and F tests for building models
Mar. 27 One-way ANOVA models with categorical predictors
Summarizing by groups and testing for group differences
Mar. 30 F tests for comparing nested models
ANOVA and the constrained/unconstrained model framework
Apr. 1 Modeling probabilities using logit models
Odds ratios, 2 x 2 tables, and logistic regression
Apr. 3 Logistic regression modeling
Building and interpreting logit models with multiple explanatory variables
Apr. 6 Classification via Logistic Regression
Sensitivity, specificity, logit classifier. The Exam 2 study guide notebook is available from the _classnotes repository.
Apr. 8 ROC curves for logit classifiers
We use ROC curves to summarize the sensitivity/specificity tradeoff and overall accuracy of a scoring system.
Apr. 10 Review
Discuss practice problems and other questions. The study guide notebook is available from the _classnotes repository.
Apr. 13 No Lecture Today. Exam 2 on git at 9:00 am Central time.
Exam 2 is distributed as a Jupyter notebook from the release repository. This is an open-notes, open-internet exam, but you must work on your own.
Apr. 15 Train/Test Predictive Analytics
By randomly splitting the data into training and testing data, we separate model building from model evaluation to reduce bias.
Apr. 17 Logit Model Selection
Log-likelihood-ratio tests and train/test AIC/BIC model selection for multiple logistic regression
Apr. 20 Logit Model Selection
Train/test splitting for AIC/BIC driven model selection and evaluation
Apr. 22 Regularized Logit Classifiers
Explore machine learning methods for high dimensional classification using penalized logistic regression in scikit-learn
Apr. 24 Regularized Logit Classifiers
Explore machine learning methods for train/test splitting and cross-validation of regularized logistic regression
Apr. 27 Regularized Linear Regression
Explore machine learning methods to compare different regularization penalties for linear regression with many feature variables
Apr. 29 Regularized Linear Regression
Cross-validation and information criteria for regularized linear regression with many variables
May 1 Train/Test Regularized Regression
How to avoid fooling ourselves: comparing train/test performance with naive in-sample performance
May 4 Review and project work
No lecture. Discuss your review questions and project work.
May 6 Last class of the semester
Discuss review questions and projects. Help session Monday, May 11, 5:00 - 7:00 pm on the open lab Zoom link. Good luck on your finals!
May 11 Project and exam prep help
Open lab Zoom link, 5:00 - 7:00 pm
May 13 Final Exam Notebook Release
Released on git by 9:00 am
May 14 Final Exam Notebook Due
Commit and push before 11:59 pm