Exam Final

Labs: Exam Final

Final Exam notebook due Thursday, May 14, 11:59 pm

Final Exam Notebook Release

Lecture: Final Exam Notebook Release

Released on git by 9am

May 13, 2020
Project and exam prep help

Lecture: Project and exam prep help

Open lab zoom link 5:00 - 7:00 pm

May 11, 2020
Last class of the semester

Lecture: Last class of the semester

Discuss review questions and projects. Help session Monday, May 11, 5-7 pm on the open lab zoom link. Good luck on your finals!

May 6, 2020
Review and project work

Lecture: Review and project work

No lecture. Discuss your review questions and project work

May 4, 2020
Regularized Linear Regression

Labs: Regularized Linear Regression

Use train/test methodology to compare ordinary least squares and regularized least squares regression with many feature variables

May 1, 2020
Train/Test Regularized Regression

Lecture: Train/Test Regularized Regression

How to avoid fooling ourselves -- Comparing train/test performance with naive in-sample performance

May 1, 2020
Data Science Project

Labs: Data Science Project

Optional data science project -- use methods learned in the class to go further and deeper

April 29, 2020
Regularized Linear Regression

Lecture: Regularized Linear Regression

Cross-validation and information criteria for regularized linear regression with many variables

April 29, 2020
Regularized Linear Regression

Lecture: Regularized Linear Regression

Explore machine learning methods to compare different regularization penalties for linear regression with many feature variables

April 27, 2020
Regularized Logistic Regression

Labs: Regularized Logistic Regression

Train a model for classifying email messages as spam or non-spam and evaluate using train/test splitting and cross-validation

April 24, 2020
Regularized Logit Classifiers

Lecture: Regularized Logit Classifiers

Explore machine learning methods for train/test splitting and cross-validation of regularized logistic regression

April 24, 2020
Regularized Logit Classifiers

Lecture: Regularized Logit Classifiers

Explore machine learning methods for high-dimensional classification using penalized logistic regression in scikit-learn

April 22, 2020
Logit Model Selection

Lecture: Logit Model Selection

Train/test splitting for AIC/BIC driven model selection and evaluation

April 20, 2020
Logistic Regression Model Selection

Labs: Logistic Regression Model Selection

Train a logit model and evaluate on test data

April 17, 2020
Logit Model Selection

Lecture: Logit Model Selection

Log-likelihood-ratio tests and train/test AIC/BIC model selection for multiple logistic regression

April 17, 2020
Train/Test Predictive Analytics

Lecture: Train/Test Predictive Analytics

By randomly splitting the data into training and testing data, we separate model building from model evaluation to reduce bias
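
A minimal sketch of a random train/test split, using only the Python standard library. The helper name `train_test_split`, the 25% test fraction, and the toy data are assumptions made for illustration; they are not taken from the course materials.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly partition rows into (train, test) lists."""
    rng = random.Random(seed)      # seeded so the split is reproducible
    shuffled = list(rows)          # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows are held out for evaluation only.
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))            # stand-in for 100 observations
train, test = train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

The model is fit on `train` only, and `test` is touched once, for evaluation, so the reported performance is not inflated by reusing the data that shaped the model.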

April 15, 2020
Exam 2

Labs: Exam 2

Exam 2 notebook due Tuesday, April 14, 11:59 pm

April 13, 2020
No Lecture Today. Exam 2 on git 9:00 am central time.

Lecture: No Lecture Today. Exam 2 on git 9:00 am central time.

Exam 2 is distributed as a Jupyter notebook from the release repository. This is an open notes, open internet exam, but you must work on your own.

April 13, 2020
Review

Lecture: Review

Discuss practice problems and other questions. The study guide notebook is available from the _classnotes repository.

April 10, 2020
ROC curves for logit classifiers

Lecture: ROC curves for logit classifiers

We use ROC curves to summarize the sensitivity / specificity tradeoff and overall accuracy of a scoring system

April 8, 2020
Classification via Logistic Regression

Lecture: Classification via Logistic Regression

Sensitivity, specificity, logit classifier. The Exam 2 study guide notebook is available from the _classnotes repo.

April 6, 2020
Logistic regression modeling

Lecture: Logistic regression modeling

Building and interpreting logit models with multiple explanatory variables

April 3, 2020
Odds Ratios and Logistic Regression

Labs: Odds Ratios and Logistic Regression

Study association between categorical variables using odds ratio analysis and logit models

April 2, 2020
Modeling probabilities using logit models

Lecture: Modeling probabilities using logit models

Odds ratios, 2 x 2 tables, and logistic regression

April 1, 2020
F tests for comparing nested models

Lecture: F tests for comparing nested models

ANOVA and the constrained/unconstrained model framework

March 30, 2020
Multiple Regression and ANOVA

Labs: Multiple Regression and ANOVA

Geographic analysis of US melanoma mortality rates and analysis of Iris species differences

March 27, 2020
One-way ANOVA models with categorical predictors

Lecture: One-way ANOVA models with categorical predictors

Summarizing by groups and testing for group differences

March 27, 2020
Analysis of variance and F test for regression

Lecture: Analysis of variance and F test for regression

Analysis of variance and F tests for building models

March 25, 2020
(Online) Regression recap plus LaTeX and images in Jupyter notebooks

Lecture: (Online) Regression recap plus LaTeX and images in Jupyter notebooks

See Piazza or Compass for the Zoom URL. We'll review linear regression and demo LaTeX and image insertion for Jupyter notebooks

March 23, 2020
Regression models and inference

Labs: Regression models and inference

Linear regression modeling and math for real and simulated data.

March 13, 2020
(Online starting today) Regression modeling

Lecture: (Online starting today) Regression modeling

See Piazza or Compass for the Zoom URL. We'll see how to access and use information about the model parameters, model fit and residuals

March 13, 2020
Multiple regression inference

Lecture: Multiple regression inference

Coefficient standard errors, confidence intervals and prediction intervals

March 11, 2020
Multiple linear regression modeling

Lecture: Multiple linear regression modeling

General framework and examples

March 9, 2020
Hypothesis Tests

Labs: Hypothesis Tests

Analyze lead exposure data and birthweight data while exploring connections between confidence intervals and hypothesis tests.

March 6, 2020
z-tests, t-tests and degrees of freedom

Lecture: z-tests, t-tests and degrees of freedom

We compare z-tests, which rely on the central limit theorem for large sample validity, and t-tests, which provide a small sample adjustment

March 6, 2020
Formulating and testing hypotheses

Lecture: Formulating and testing hypotheses

We study a general strategy for hypothesis testing in several different scenarios.

March 4, 2020
Confidence intervals and hypothesis tests for differences

Lecture: Confidence intervals and hypothesis tests for differences

We explore how to make inferences about subpopulation differences in contexts such as treatment/control studies, A/B testing and sample surveys

March 2, 2020
Exam 1

Lecture: Exam 1

In-class exam, 1090 Lincoln Hall. Bring a non-programmable calculator.

February 28, 2020
Review

Lecture: Review

Let's review what we've done so far - bring questions to class! Exam study guide solutions will be posted in Compass after class today.

February 26, 2020
Confidence intervals for general means

Lecture: Confidence intervals for general means

We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1!

February 24, 2020
Normal Approximation and Confidence Intervals

Lecture: Normal Approximation and Confidence Intervals

We use sample means, proportions and their standard errors to compute confidence intervals for population parameters.
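
As a rough illustration, a large-sample confidence interval for a mean is the sample mean plus or minus a normal quantile times the standard error. The function name `mean_confidence_interval` and the toy sample below are hypothetical; this is a sketch of the normal-approximation interval, not the course's own code.

```python
import math
import statistics

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    center = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    se = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * se, center + z * se

sample = [float(i % 10) for i in range(200)]   # toy data, values 0 through 9
low, high = mean_confidence_interval(sample)
print(low, high)
```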

February 21, 2020
Sampling Distribution, Standard Error and Confidence Interval

Labs: Sampling Distribution, Standard Error and Confidence Interval

Work with samples from uniform and normal distributions, using normal approximations for the sample mean.

February 20, 2020
Margin of Error for Sample-Based Estimates

Lecture: Margin of Error for Sample-Based Estimates

Let's study the variation in sums, means and proportions and use their properties to determine margin of error for these estimates.

February 19, 2020
Model-based probabilities and quantiles

Lecture: Model-based probabilities and quantiles

We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests.

February 17, 2020
Random variables, parameters and samples

Labs: Random variables, parameters and samples

This lab covers the normal distribution, Bernoulli distribution, parameters and random samples.

February 14, 2020
Statistics, Parameters and Random Variables

Lecture: Statistics, Parameters and Random Variables

Compare sample statistics and population parameters to understand how the statistics estimate features of the population.

February 14, 2020
Statistics, Parameters and Estimation

Lecture: Statistics, Parameters and Estimation

In building population sampling models for data, the concepts of random variables and distributions are crucial.

February 12, 2020
Monte Carlo Studies of Sampling Distributions

Lecture: Monte Carlo Studies of Sampling Distributions

How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations.
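
One way to sketch such a Monte Carlo study: draw many random samples from a known population, compute the statistic for each, and look at the spread of the results. The population (the integers 1 to 100), the sample size, and the function name `simulate_sample_means` are assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_reps=2000, seed=0):
    """Draw n_reps simple random samples and return the mean of each."""
    rng = random.Random(seed)      # seeded for reproducibility
    return [
        statistics.mean(rng.sample(population, sample_size))  # without replacement
        for _ in range(n_reps)
    ]

population = list(range(1, 101))   # toy population with mean 50.5
means = simulate_sample_means(population, sample_size=25)

# The average of the simulated means should sit near the population mean,
# and their standard deviation estimates the standard error of the mean.
print(statistics.mean(means))
print(statistics.stdev(means))
```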

February 10, 2020
Random sampling and probability

Lecture: Random sampling and probability

Use combinatorial methods to calculate probabilities of compound events.

February 7, 2020
Sampling, probability and looping

Labs: Sampling, probability and looping

This lab covers sampling, probability, for loops, and defining your own function.

February 6, 2020
Random Sampling and Probability

Lecture: Random Sampling and Probability

We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes.

February 5, 2020
Python, Pandas and Data Frames - Quantitative variables

Lecture: Python, Pandas and Data Frames - Quantitative variables

Let's explore summary statistics, distributions and visuals for quantitative data, and see how to define our own functions for analysis.

February 3, 2020
Quantitative Data Exploration

Lecture: Quantitative Data Exploration

What if the data have missing values? How can we summarize and visualize qualitative and quantitative information in the data?

January 31, 2020
Data Frames

Labs: Data Frames

In this lab you will learn more about data structures in Python, read external data from csv files, and perform basic data extraction, analytics, and visualization.

January 30, 2020
Working with Data Frames

Lecture: Working with Data Frames

Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources.

January 29, 2020
Structure of Data Frames

Lecture: Structure of Data Frames

Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis.

January 27, 2020
Notebooks and Git Repositories

Lecture: Notebooks and Git Repositories

Examples of Python, Jupyter notebooks and git operations.

January 24, 2020
Data Science Setup

Labs: Data Science Setup

In this first lab, you will set up your account and computer for Data Science Exploration and begin to work with Python notebooks

January 23, 2020
Introduction to Data Science Exploration

Lecture: Introduction to Data Science Exploration

We explore data from a Pew Research Center political opinion survey.

January 22, 2020
Welcome to Data Science Exploration!

Our first lecture is Wednesday, Jan. 22 at 1:00pm in 1090 Lincoln Hall. See you there!

January 17, 2020