


Lecture: Regularized Linear Regression
Visualizing high dimensional data and selecting the regularization tuning parameter

Lecture: Regularized Linear Regression for High-Dimensional Data
We explore a machine learning approach for improving accuracy of multiple linear regression using penalized least squares, with application to gene expression analysis

Lecture: Regularized Logistic Regression
Regularization penalties and cross-validation accuracy of regularized logit classifiers

Lecture: Regularized Logistic Regression for High-Dimensional Data
Compare different regularization penalties for logistic regression with many feature variables

Lecture: Train/Test and Cross Validation with Scikit-Learn
Examples with regularized logistic regression

Lecture: Model Selection in Logistic Regression
We explore the tradeoff between model fit and model simplicity using criteria such as AIC and BIC

Lecture: Train/Test methods for model assessment
We use tools provided in scikit-learn for modeling and machine learning in Python

Lecture: Train/Test ROC Analysis
We split the data into training and testing data to reduce bias in ROC evaluation of a classifier.

Lecture: Exam 2
In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.



Lecture: Classification via Logistic Regression
Sensitivity, Specificity, ROC curves

Lecture: Logistic regression modeling
Building and interpreting logit models with multiple explanatory variables

Lecture: Modeling probabilities using logit models
Odds ratios, 2 x 2 tables, and logistic regression

Lecture: ANOVA, F tests and Model Selection
More examples with results and interpretation

Lecture: Comparing nested regression models
Examples of ANOVA F tests in model building

Lecture: Structured regression and categorical predictor variables
Analysis of variance and F tests for building models

Lecture: Regression model assessment and prediction
We'll see how to access and use information about the model parameters, model fit and residuals

Lecture: Regression modeling and inference
Coefficient standard errors, confidence intervals and prediction intervals

Lecture: Introduction to Regression Modeling using StatsModels
Python examples with results and interpretation

Lecture: z-tests, t-tests and degrees of freedom
We compare z-tests, which rely on the central limit theorem for large sample validity, and t-tests, which provide a small sample adjustment

Lecture: Formulating and testing hypotheses
We study a general approach for testing hypotheses about parameters of interest in several representative examples.

Lecture: Confidence intervals and significance tests for differences
Building on the results for single samples, we explore how to compare samples from different subpopulations such as treatment/control, A/B testing and other grouping variables

Lecture: Confidence intervals for general means
We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1!

Lecture: Exam 1
In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.

Lecture: Review
Let's review what we've done so far - bring questions to class!

Lecture: Normal Approximation and Confidence Intervals
We explore how to use sample means, proportions and other statistics to compute confidence intervals for population parameters.

Lecture: Margin of Error for Sample-Based Estimates
Let's study the variation in sums, means and proportions and use their properties to determine the margin of error for these estimates.

Lecture: Computing and Visualizing Interval Probabilities and Quantiles
We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests.

Lecture: Case Study in Data Science
Albert Man guest lectures on a data science project he did as part of a job interview!

Lecture: Statistics, Parameters and Random Variables
To understand uncertainty better, we introduce the concept of a random variable, which is extremely useful for describing the variation in sample statistics.

Lecture: Statistics, Parameters and Estimation
How shall we collect data to estimate key population parameters? How can we estimate those parameters and determine the margin of error?

Lecture: Monte Carlo Studies of Sampling Distributions
How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations.

Lecture: For Loops and Functions for Simulation
Python for loops and functions enable us to automate and simplify repetitive tasks, which is essential for Monte Carlo simulations.

Lecture: Random Sampling and Probability
We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes.

Lecture: Working with Data Frames
Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources.

Lecture: Structure of Data Frames
Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis.

Lecture: Python, Pandas and Data Frames - Quantitative variables
Let's explore summary statistics, distributions and visuals for quantitative data

Lecture: Labor Day

Lecture: Data and Python
What if the data have missing values? How can we summarize qualitative and quantitative data, and study relations between different variables in the data? Let's explore how core modules like NumPy, Pandas, and Matplotlib help us manage and visualize data for these purposes.

Lecture: Data and Python
What do we mean by data? How can we organize data? How can we visualize and summarize data? Python is a powerful data science environment for organizing and understanding our data.

Lecture: Introduction to Data Science Exploration
Building on STAT 107, Data Science Discovery, let's explore data science and statistical analysis in real world settings!

Welcome to Data Science Exploration!
Our first lecture is Monday, Aug. 26 at 1:00pm in 218 Ceramics Building. See you there!