Course Schedule

Date	Event	Links
Jan. 22	Introduction to Data Science Exploration We explore data from a Pew Research Center political opinion survey.	01_Intro 01_Intro_attachment Join course Piazza
Jan. 24	Notebooks and Git Repositories Examples of Python, Jupyter notebooks and git operations.	Git_Intro
Jan. 27	Structure of Data Frames Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis.	02_DataFrame
Jan. 29	Working with Data Frames Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources.	02_DataFrame
Jan. 31	Quantitative Data Exploration What if the data have missing values? How can we summarize and visualize qualitative and quantitative information in the data?	03_Quant git_repo_classnotes
Feb. 3	Python, Pandas and Data Frames - Quantitative variables Let's explore summary statistics, distributions and visuals for quantitative data, and see how to define our own functions for analysis.	03_Quant
Feb. 5	Random Sampling and Probability We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes.	04_Sampling
Feb. 7	Random sampling and probability Use combinatorial methods to calculate probabilities of compound events.	04_Sampling handout
Feb. 10	Monte Carlo Studies of Sampling Distributions How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations.	05_Simulation
Feb. 12	Statistics, Parameters and Estimation In building population sampling models for data, the concepts of random variables and distributions are crucial.	06_Statistical_Estimation
Feb. 14	Statistics, Parameters and Random Variables Compare sample statistics and population parameters to understand how the statistics estimate features of the population.	06_Statistical_Estimation Handout
Feb. 17	Model-based probabilities and quantiles We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests.	06_Statistical_Estimation
Feb. 19	Margin of Error for Sample-Based Estimates Let's study the variation in sums, means and proportions and use their properties to determine the margin of error for these estimates.	07_Standard_Errors
Feb. 21	Normal Approximation and Confidence Intervals We use sample means, proportions and their standard errors to compute confidence intervals for population parameters.	07_Standard_Errors
Feb. 24	Confidence intervals for general means We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1!	07_Standard_Errors Handout
Feb. 26	Review Let's review what we've done so far - bring questions to class! Exam study guide solutions will be posted in Compass after class today.	Exam_1_Study_Guide
Feb. 28	Exam 1 In-class exam, 1090 Lincoln Hall. Bring a non-programmable calculator.
Mar. 2	Confidence intervals and hypothesis tests for differences We explore how to make inferences about subpopulation differences in contexts such as treatment/control studies, A/B testing and sample surveys.	08_Hypothesis_Testing
Mar. 4	Formulating and testing hypotheses We study a general strategy for hypothesis testing in several different scenarios.	08_Hypothesis_Testing
Mar. 6	z-tests, t-tests and degrees of freedom We compare z-tests, which rely on the central limit theorem for large-sample validity, and t-tests, which provide a small-sample adjustment.	08_Hypothesis_Testing
Mar. 9	Multiple linear regression modeling General framework and examples	09_Linear_Regression
Mar. 11	Multiple regression inference Coefficient standard errors, confidence intervals and prediction intervals	09_Linear_Regression
Mar. 13	(Online starting today) Regression modeling See Piazza or Compass for the Zoom URL. We'll see how to access and use information about the model parameters, model fit and residuals.	09_Linear_regression
Mar. 23	(Online) Regression recap plus LaTeX and images in Jupyter notebooks See Piazza or Compass for the Zoom URL. We'll review linear regression and demo LaTeX and image insertion for Jupyter notebooks.	09_Linear_regression
Mar. 25	Analysis of variance and F test for regression Analysis of variance and F tests for building models	10_ANOVA
Mar. 27	One-way ANOVA models with categorical predictors Summarizing by groups and testing for group differences	10_ANOVA
Mar. 30	F tests for comparing nested models ANOVA and the constrained/unconstrained model framework	10_ANOVA
Apr. 1	Modeling probabilities using logit models Odds ratios, 2 x 2 tables, and logistic regression	11_Logistic_regression
Apr. 3	Logistic regression modeling Building and interpreting logit models with multiple explanatory variables	11_Logistic_regression
Apr. 6	Classification via Logistic Regression Sensitivity, specificity, logit classifier. The Exam 2 study guide notebook is available from the _classnotes repo.	12_Classification_and_ROC Exam_2_Study_Guide
Apr. 8	ROC curves for logit classifiers We use ROC curves to summarize the sensitivity / specificity tradeoff and overall accuracy of a scoring system.	12_Classification_and_ROC
Apr. 10	Review Discuss practice problems and other questions. The study guide notebook is available from the _classnotes repository.	Exam_2_Study_Guide
Apr. 13	No Lecture Today. Exam 2 on git at 9:00 am Central Time. Exam 2 is distributed as a Jupyter notebook from the release repository. This is an open notes, open internet exam, but you must work on your own.
Apr. 15	Train/Test Predictive Analytics By randomly splitting the data into training and testing data, we separate model building from model evaluation to reduce bias.	13_Train_Test
Apr. 17	Logit Model Selection Log-likelihood-ratio tests and train/test AIC/BIC model selection for multiple logistic regression	14_Model_Selection
Apr. 20	Logit Model Selection Train/test splitting for AIC/BIC driven model selection and evaluation	14_Model_Selection
Apr. 22	Regularized Logit Classifiers Explore machine learning methods for high dimensional classification using penalized logistic regression in scikit-learn	15_regularized_logit
Apr. 24	Regularized Logit Classifiers Explore machine learning methods for train/test splitting and cross-validation of regularized logistic regression	15_regularized_logit
Apr. 27	Regularized Linear Regression Explore machine learning methods to compare different regularization penalties for linear regression with many feature variables	16_regularized_linear
Apr. 29	Regularized Linear Regression Cross-validation and information criteria for regularized linear regression with many variables	16_regularized_linear
May 1	Train/Test Regularized Regression How to avoid fooling ourselves: comparing train/test performance with naive in-sample performance	16_regularized_linear
May 4	Review and project work No lecture. Discuss your review questions and project work.	Final_Exam_Study_Guide
May 6	Last class of the semester Discuss review questions and projects. Help session Monday, May 11, 5-7 pm on the open lab Zoom link. Good luck on your finals!	Final_Exam_Study_Guide
May 11	Project and exam prep help Open lab Zoom link 5:00 - 7:00 pm
May 13	Final Exam Notebook Release Released on git by 9:00 am
May 14	Final Exam Notebook Due Commit and push before 11:59 pm