Course Schedule
Date | Event | Links |
---|---|---|
Jan. 22 | Introduction to Data Science Exploration: We explore data from a Pew Research Center political opinion survey. | |
Jan. 24 | Notebooks and Git Repositories: Examples of Python, Jupyter notebooks and git operations. | |
Jan. 27 | Structure of Data Frames: Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis. | |
Jan. 29 | Working with Data Frames: Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources. | |
Jan. 31 | Quantitative Data Exploration: What if the data have missing values? How can we summarize and visualize qualitative and quantitative information in the data? | |
Feb. 3 | Python, Pandas and Data Frames - Quantitative variables: Let's explore summary statistics, distributions and visuals for quantitative data, and see how to define our own functions for analysis. | |
Feb. 5 | Random Sampling and Probability: We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes. | |
Feb. 7 | Random sampling and probability: Use combinatorial methods to calculate probabilities of compound events. | |
Feb. 10 | Monte Carlo Studies of Sampling Distributions: How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations. | |
Feb. 12 | Statistics, Parameters and Estimation: In building population sampling models for data, the concepts of random variables and distributions are crucial. | |
Feb. 14 | Statistics, Parameters and Random Variables: Compare sample statistics and population parameters to understand how the statistics estimate features of the population. | |
Feb. 17 | Model-based probabilities and quantiles: We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests. | |
Feb. 19 | Margin of Error for Sample-Based Estimates: Let's study the variation in sums, means and proportions and use their properties to determine the margin of error for these estimates. | |
Feb. 21 | Normal Approximation and Confidence Intervals: We use sample means, proportions and their standard errors to compute confidence intervals for population parameters. | |
Feb. 24 | Confidence intervals for general means: We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1! | |
Feb. 26 | Review: Let's review what we've done so far. Bring questions to class! Exam study guide solutions will be posted in Compass after class today. | |
Feb. 28 | Exam 1: In-class exam, 1090 Lincoln Hall. Bring a non-programmable calculator. | |
Mar. 2 | Confidence intervals and hypothesis tests for differences: We explore how to make inferences about subpopulation differences in contexts such as treatment/control studies, A/B testing and sample surveys. | |
Mar. 4 | Formulating and testing hypotheses: We study a general strategy for hypothesis testing in several different scenarios. | |
Mar. 6 | z-tests, t-tests and degrees of freedom: We compare z-tests, which rely on the central limit theorem for large sample validity, and t-tests, which provide a small sample adjustment. | |
Mar. 9 | Multiple linear regression modeling: General framework and examples | |
Mar. 11 | Multiple regression inference: Coefficient standard errors, confidence intervals and prediction intervals | |
Mar. 13 | (Online starting today) Regression modeling: See Piazza or Compass for the Zoom URL. We'll see how to access and use information about the model parameters, model fit and residuals. | |
Mar. 23 | (Online) Regression recap plus LaTeX and images in Jupyter notebooks: See Piazza or Compass for the Zoom URL. We'll review linear regression and demo LaTeX and image insertion for Jupyter notebooks. | |
Mar. 25 | Analysis of variance and F test for regression: Analysis of variance and F tests for building models | |
Mar. 27 | One-way ANOVA models with categorical predictors: Summarizing by groups and testing for group differences | |
Mar. 30 | F tests for comparing nested models: ANOVA and the constrained/unconstrained model framework | |
Apr. 1 | Modeling probabilities using logit models: Odds ratios, 2 x 2 tables, and logistic regression | |
Apr. 3 | Logistic regression modeling: Building and interpreting logit models with multiple explanatory variables | |
Apr. 6 | Classification via Logistic Regression: Sensitivity, specificity, logit classifier. The Exam 2 study guide notebook is available from the _classnotes repo. | |
Apr. 8 | ROC curves for logit classifiers: We use ROC curves to summarize the sensitivity / specificity tradeoff and overall accuracy of a scoring system. | |
Apr. 10 | Review: Discuss practice problems and other questions. The study guide notebook is available from the _classnotes repository. | |
Apr. 13 | No Lecture Today. Exam 2 on git 9:00 am central time: Exam 2 is distributed as a Jupyter notebook from the release repository. This is an open notes, open internet exam, but you must work on your own. | |
Apr. 15 | Train/Test Predictive Analytics: By randomly splitting the data into training and testing data, we separate model building from model evaluation to reduce bias. | |
Apr. 17 | Logit Model Selection: Log-likelihood-ratio tests and train/test AIC/BIC model selection for multiple logistic regression | |
Apr. 20 | Logit Model Selection: Train/test splitting for AIC/BIC driven model selection and evaluation | |
Apr. 22 | Regularized Logit Classifiers: Explore machine learning methods for high dimensional classification using penalized logistic regression in scikit-learn | |
Apr. 24 | Regularized Logit Classifiers: Explore machine learning methods for train/test splitting and cross-validation of regularized logistic regression | |
Apr. 27 | Regularized Linear Regression: Explore machine learning methods to compare different regularization penalties for linear regression with many feature variables | |
Apr. 29 | Regularized Linear Regression: Cross-validation and information criteria for regularized linear regression with many variables | |
May 1 | Train/Test Regularized Regression: How to avoid fooling ourselves by comparing train/test performance with naive in-sample performance | |
May 4 | Review and project work: No lecture. Discuss your review questions and project work. | |
May 6 | Last class of the semester: Discuss review questions and projects. Help session Monday, May 11, 5-7 pm on the open lab Zoom link. Good luck on your finals! | |
May 11 | Project and exam prep help: Open lab Zoom link, 5:00 - 7:00 pm | |
May 13 | Final Exam Notebook Release: Released on git by 9:00 am | |
May 14 | Final Exam Notebook Due: Commit and push before 11:59 pm | |