STAT 207: Data Science Exploration at The University of Illinois

Lecture: Review

Final_study_guide

December 11, 2019

Lecture: Review

Final_study_guide

December 9, 2019

Lecture: Regularized Linear Regression

Visualizing high dimensional data and selecting the regularization tuning parameter

15_regularized_regression

December 6, 2019

Labs: Regularized Linear Regression

Experiment with Lasso regression with high dimensional features

December 5, 2019

Lecture: Regularized Linear Regression for High-Dimensional Data

We explore a machine learning approach for improving accuracy of multiple linear regression using penalized least squares, with application to gene expression analysis
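As a minimal sketch of penalized least squares, consider a single predictor with no intercept and a squared (ridge) penalty. The ridge penalty is an assumption chosen here only because it has a simple closed form; the lecture may use a different penalty, such as the lasso. The data and the function name `ridge_slope` are hypothetical.

```python
def ridge_slope(x, y, penalty):
    """Slope minimizing sum((y - b*x)^2) + penalty * b^2 (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)

# Hypothetical data, roughly y = 2x plus noise
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.9, -2.1, 0.2, 1.8, 4.1]

# penalty = 0 gives ordinary least squares; larger penalties shrink the slope
slopes = {penalty: ridge_slope(x, y, penalty) for penalty in [0.0, 1.0, 10.0]}
for penalty, slope in slopes.items():
    print(f"penalty={penalty:5.1f}  slope={slope:.3f}")
```

The slope moves toward zero as the penalty grows, which is the shrinkage that trades a little bias for lower variance when there are many features.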

15_regularized_regression

December 4, 2019

Lecture: Regularized Logistic Regression

Regularization penalties and cross-validation accuracy of regularized logit classifiers

14_cross_validation

December 2, 2019

Labs: Regularized Logistic Regression

Use gene expression profiles and clinical data to build a regularized logit classifier for breast cancer recurrence

November 24, 2019

Lecture: Regularized Logistic Regression for High-Dimensional Data

Compare different regularization penalties for logistic regression with many feature variables

14_cross_validation

November 22, 2019

Lecture: Train/Test and Cross Validation with Scikit-Learn

Examples with regularized logistic regression
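The splitting step behind k-fold cross validation can be sketched in plain Python. This is an illustration only, not the lecture's code: scikit-learn provides this functionality (for example `sklearn.model_selection.KFold`), and the helper name `k_fold_indices` is hypothetical.

```python
import random

def k_fold_indices(n_obs, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_obs=10, k=5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Each observation lands in exactly one test fold, so every row is used for testing once and for training k - 1 times.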

14_cross_validation

November 20, 2019

Lecture: Model Selection in Logistic Regression

We explore the tradeoff between model fit and model simplicity using criteria such as AIC and BIC
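The tradeoff can be made concrete with the standard definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of parameters, n the number of observations, and log L the maximized log-likelihood. The log-likelihood values below are hypothetical numbers chosen for illustration, not results from the course.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: smaller is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: the larger model fits slightly better but uses more parameters
n_obs = 200
small = {"log_likelihood": -120.0, "n_params": 3}
large = {"log_likelihood": -118.5, "n_params": 6}

for name, fit in [("small", small), ("large", large)]:
    print(name,
          "AIC =", round(aic(**fit), 2),
          "BIC =", round(bic(n_obs=n_obs, **fit), 2))
```

Here the small gain in fit does not pay for three extra parameters under either criterion, and BIC penalizes the larger model more heavily because log(200) exceeds 2.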

13_model_selection

November 18, 2019

Lecture: Train/Test methods for model assessment

We use tools provided in scikit-learn for modeling and machine learning in Python

12_roc_analysis

November 15, 2019

Labs: Logit Classifier Training and Testing

Build a logit classifier using training data and evaluate on test data

November 14, 2019

Lecture: Train/Test ROC Analysis

We split the data into training and testing data to reduce bias in ROC evaluation of a classifier.

12_roc_analysis

November 13, 2019

Lecture: Exam 2

In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.

November 11, 2019

Lecture: Review

Work on practice problems in class

Exam_2_study_guide

November 8, 2019

Lecture: Review

Work on practice problems in class

Exam_2_study_guide

November 6, 2019

Lecture: Classification via Logistic Regression

Sensitivity, Specificity, ROC curves
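A small sketch of how these quantities are computed: sensitivity is the fraction of true positives that are flagged, specificity is the fraction of true negatives that are not flagged, and an ROC curve plots sensitivity against 1 - specificity as the threshold varies. The labels and predicted probabilities below are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Classify as positive when score >= threshold, then score the result."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical true labels and predicted probabilities
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

roc_points = []  # (false positive rate, true positive rate) pairs
for threshold in [0.25, 0.5, 0.75]:
    sens, spec = sensitivity_specificity(labels, scores, threshold)
    roc_points.append((1 - spec, sens))
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Lowering the threshold raises sensitivity at the cost of specificity; the ROC curve summarizes that tradeoff across all thresholds.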

12_roc_analysis

November 4, 2019

Lecture: Logistic regression modeling

Building and interpreting logit models with multiple explanatory variables

11_logistic_regression

November 1, 2019

Labs: Odds Ratios and Logistic Regression

Study association between categorical variables and model categorical responses using logits

October 31, 2019

Lecture: Modeling probabilities using logit models

Odds ratios, 2 x 2 tables, and logistic regression
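To illustrate the connection, the sketch below computes an odds ratio from a hypothetical 2 x 2 table and shows that, for a logit model with a single binary predictor, the intercept is the log odds in the baseline group and the slope is the log odds ratio. The counts are made up for illustration.

```python
import math

# Hypothetical 2 x 2 table of counts: (event, no event)
exposed = (30, 70)
unexposed = (10, 90)

odds_exposed = exposed[0] / exposed[1]
odds_unexposed = unexposed[0] / unexposed[1]
odds_ratio = odds_exposed / odds_unexposed

# Logit model with one binary predictor: log(odds) = intercept + slope * exposed
intercept = math.log(odds_unexposed)
slope = math.log(odds_ratio)

def inverse_logit(value):
    """Convert log odds back to a probability."""
    return 1 / (1 + math.exp(-value))

print("odds ratio:", round(odds_ratio, 3))
print("P(event | unexposed):", round(inverse_logit(intercept), 3))
print("P(event | exposed):  ", round(inverse_logit(intercept + slope), 3))
```

The two recovered probabilities match the row proportions of the table (10/100 and 30/100), which is why logistic regression generalizes the 2 x 2 analysis.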

11_logistic_regression

October 30, 2019

Lecture: ANOVA, F tests and Model Selection

More examples with results and interpretation

10_anova

October 28, 2019

Lecture: Comparing nested regression models

Examples of ANOVA F tests in model building

10_anova

October 25, 2019

Labs: Compare Regression Models

Compare nested regression models for U.S. melanoma mortality rates

October 24, 2019

Lecture: Structured regression and categorical predictor variables

Analysis of variance and F tests for building models

10_anova

October 23, 2019

Lecture: Regression model assessment and prediction

We'll see how to access and use information about the model parameters, model fit and residuals

09_linear_regression

October 21, 2019

Lecture: Regression modeling and inference

Coefficient standard errors, confidence intervals and prediction intervals

09_linear_regression

October 18, 2019

Labs: Regression models and inference

Apply two-sample analysis and linear regression modeling to real and simulated data.

October 17, 2019

Lecture: Introduction to Regression Modeling using StatsModels

Python examples with results and interpretation

09_linear_regression

October 16, 2019

Lecture: z-tests, t-tests and degrees of freedom

We compare z-tests, which rely on the central limit theorem for large sample validity, and t-tests, which provide a small sample adjustment
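A simulation sketch of why the small-sample adjustment matters: with n = 5 normal observations and a true null hypothesis, a nominal 5% test that uses the normal critical value rejects too often, while the t critical value keeps the rate near 5%. The setup (sample size, seed, number of repetitions) is an assumption for illustration; 2.776 is the standard table value for the 97.5th percentile of the t distribution with 4 degrees of freedom.

```python
import random
import statistics

def false_rejection_rate(critical_value, n=5, n_reps=20000, seed=2):
    """Fraction of samples in which a true null (mean = 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        std_error = statistics.stdev(sample) / n ** 0.5
        test_statistic = statistics.mean(sample) / std_error
        if abs(test_statistic) > critical_value:
            rejections += 1
    return rejections / n_reps

z_crit = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
t_crit = 2.776  # t table value, 4 degrees of freedom

z_rate = false_rejection_rate(z_crit)
t_rate = false_rejection_rate(t_crit)
print("rejection rate with z critical value:", round(z_rate, 3))
print("rejection rate with t critical value:", round(t_rate, 3))
```

As n grows, the t critical value approaches 1.96 and the two tests agree, which is the large-sample validity the central limit theorem provides.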

08_testing

October 14, 2019

Lecture: Formulating and testing hypotheses

We study a general approach for testing hypotheses about parameters of interest in several representative examples.

08_testing

October 11, 2019

Labs: Confidence Intervals and Hypothesis Tests

Analyze lead exposure data while exploring connections between confidence intervals and hypothesis tests.

October 10, 2019

Lecture: Confidence intervals and significance tests for differences

Building on the results for single samples, we explore how to compare samples from different subpopulations defined by treatment/control assignment, A/B tests, and other grouping variables.

08_testing

October 9, 2019

Lecture: Confidence intervals for general means

We explore large sample confidence intervals for the mean of a population and solve the mystery of n-1!
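One way to see why the sample variance divides by n - 1 is to simulate it. The sketch below, under assumed settings (samples of size 5 from a normal distribution with variance 4), averages both versions of the estimator over many samples. It uses the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1).

```python
import random
import statistics

rng = random.Random(1)
n, n_reps = 5, 20000
true_variance = 4.0  # samples are drawn from Normal(mean=0, sd=2)

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(n_reps):
    sample = [rng.gauss(0, 2) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))         # divides by n
    divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

avg_n = statistics.mean(divide_by_n)
avg_n_minus_1 = statistics.mean(divide_by_n_minus_1)
print("true variance:            ", true_variance)
print("average, dividing by n:   ", round(avg_n, 2))
print("average, dividing by n-1: ", round(avg_n_minus_1, 2))
```

Dividing by n underestimates the variance on average by the factor (n - 1)/n, because deviations are measured from the sample mean rather than the unknown population mean.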

07_standard_errors

October 7, 2019

Lecture: Exam 1

In-class exam, 218 Ceramics Building. Bring a non-programmable calculator.

October 4, 2019

Lecture: Review

Let's review what we've done so far - bring questions to class!

October 2, 2019

Lecture: Normal Approximation and Confidence Intervals

We explore how to use sample means, proportions and other statistics to compute confidence intervals for population parameters.
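A minimal sketch of a normal-approximation confidence interval for a population proportion, using the usual form: estimate plus or minus z times the standard error. The counts (120 successes in 400 trials) and the function name are hypothetical.

```python
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    std_error = (p_hat * (1 - p_hat) / n) ** 0.5
    margin = z * std_error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(successes=120, n=400)
print(f"95% CI for the proportion: ({low:.4f}, {high:.4f})")
```

The approximation relies on a reasonably large sample; with few successes or failures the interval can be inaccurate.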

07_standard_errors

September 30, 2019

Lecture: Margin of Error for Sample-Based Estimates

Let's study the variation in sums, means and proportions and use their properties to determine the margin of error for these estimates.

07_standard_errors

September 27, 2019

Labs: Standard Errors for Means and Proportions

Work with the uniform and binomial distributions, and normal approximations for the sample mean and sample proportion.

September 26, 2019

Lecture: Computing and Visualizing Interval Probabilities and Quantiles

We develop the basics for computing the interval probabilities and percentiles needed for many confidence intervals and tests.

06_statistical_estimation

September 25, 2019

Lecture: Case Study in Data Science

Albert Man guest lectures on a data science project he did as part of a job interview!

September 23, 2019

Lecture: Statistics, Parameters and Random Variables

To understand uncertainty better, we introduce the concept of a random variable, which is extremely useful for describing the variation in sample statistics.

06_statistical_estimation

September 20, 2019

Labs: Normal and Bernoulli Distributions

This lab covers the normal distribution, Bernoulli distribution, parameters and random samples.

September 19, 2019

Lecture: Statistics, Parameters and Estimation

How shall we collect data to estimate key population parameters? How can we estimate those parameters and determine the margin of error?

06_statistical_estimation

September 18, 2019

Lecture: Monte Carlo Studies of Sampling Distributions

How much variation is there in a sample statistic when we draw a random sample from a population? We investigate using Monte Carlo simulations.
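A small Monte Carlo sketch of this question, using only the standard library: draw many random samples from a hypothetical population (the integers 1 to 100), record each sample mean, and compare the spread of those means with the theoretical value sigma / sqrt(n). The population, sample size, and function name are assumptions for illustration.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_reps, seed=0):
    """Draw n_reps random samples and return the mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_reps):
        sample = rng.choices(population, k=sample_size)  # sampling with replacement
        means.append(statistics.mean(sample))
    return means

# Hypothetical population: the integers 1..100
population = list(range(1, 101))
means = simulate_sample_means(population, sample_size=25, n_reps=2000)

theory_se = statistics.pstdev(population) / 25 ** 0.5
print("mean of sample means: ", round(statistics.mean(means), 2))
print("SD of sample means:   ", round(statistics.stdev(means), 2))
print("theory, sigma/sqrt(n):", round(theory_se, 2))
```

The simulated standard deviation of the sample means should land close to the theoretical standard error, and it shrinks as the sample size grows.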

05_simulation

September 16, 2019

Lecture: For Loops and Functions for Simulation

Python for loops and functions enable us to automate and simplify repetitive tasks, which is essential for Monte Carlo simulations.

05_simulation

September 13, 2019

Labs: Sampling, probability and looping

This lab covers sampling, probability, for loops, and making your own function for simulations.

September 12, 2019

Lecture: Random Sampling and Probability

We use Python to demonstrate random sampling and explore the corresponding probabilities of different outcomes.

04_sampling

September 11, 2019

Lecture: Working with Data Frames

Let's see how to extract information from data frames, add more data, sort the data, and merge information from two or more sources.

03_dataframe

September 9, 2019

Lecture: Structure of Data Frames

Let's delve more deeply into the structure of data frames and how we can process the data, extract subsets, and set up for further analysis.

03_dataframe

September 6, 2019

Labs: Data Frames

In this lab you will learn more about data types in Python, read external data from csv files, and perform basic data extraction, analytics, and interpretation.

September 5, 2019

Lecture: Python, Pandas and Data Frames - Quantitative variables

Let's explore summary statistics, distributions and visuals for quantitative data

02_quant

September 4, 2019

Lecture: Labor Day

September 2, 2019

Lecture: Data and Python

What if the data have missing values? How can we summarize qualitative and quantitative data, and study relations between different variables in the data? Let's explore how core modules like NumPy, Pandas, and Matplotlib help us manage and visualize data for these purposes.

02_quant

August 30, 2019

Labs: Data Science Setup

Data scientists use powerful tools to help learn about data. In this first lab, you will set up your account and computer for Data Science Exploration and begin to work with Python notebooks.

August 29, 2019

Lecture: Data and Python

What do we mean by data? How can we organize data? How can we visualize and summarize data? Python is a powerful data science environment for organizing and understanding our data.

August 28, 2019

Lecture: Introduction to Data Science Exploration

Building on STAT 107, Data Science Discovery, let's explore data science and statistical analysis in real world settings!

August 26, 2019

Welcome to Data Science Exploration!

Our first lecture is Monday, Aug. 26 at 1:00pm in 218 Ceramics Building. See you there!

August 21, 2019