Course Topics
- Introduction to the “data science pipeline”.
- Formulating research questions.
- What kind of research questions can you ask, given how the data was collected?
- Types of analyses: question first or dataset first?
- Types of dataset collection.
- Numerical vs. categorical
- Random samples
- Representative samples
- Data from randomized experiments
- Data management
- How to read CSVs into Python
- How to write dataframes to CSVs
- Data cleaning, manipulation, and representation
- Creating dataframes
- Quickly describing dataframes
- Summarizing/aggregating a dataframe
- Filtering dataframes
- Subsetting/slicing dataframes
- Combining dataframes
- Sorting dataframes
- Sampling dataframes
- Basic detection and handling of missing values
- Descriptive Analytics
- Summary Statistics and Visualizations
- Numerical variables
- Categorical variables
- Categorical variable and a numerical variable
- Two categorical variables
- Two numerical variables
- Three or more variables
- Dimensionality Reduction
- Principal Component Analysis (PCA)
- Basic probability
- Sampling
- With and without replacement
- Two definitions of probability
- Law of large numbers
- Calculating probabilities:
- Using the uniform probability rule
- Of two independent events
- Of two dependent events
- Using combinatorics rules
- Using probability mass functions
- Using probability density functions
- Random variables
- Discrete random variables
- Continuous random variables
- Types of probability distributions and their properties
- Normal Distribution
- Standard Normal Distribution
- Inference Basics
- Population distribution vs. sample distribution vs. sampling distribution
- Population distribution of numerical values vs. sample distribution of numerical values vs. sampling distribution of sample means
- Population distribution of categorical values vs. sample distribution of categorical values vs. sampling distribution of sample proportions
- Two population distributions of numerical values vs. two sample distributions of numerical values vs. sampling distribution of sample mean differences
- Two population distributions of categorical values vs. two sample distributions of categorical values vs. sampling distribution of sample proportion differences
- What are the mean, standard deviation, and shape of (a) the population distribution, (b) the sample distribution, and (c) the sampling distribution, and how do they relate to each other?
- What is the Central Limit Theorem and why does it help us conduct inference?
- Making an Inference
- Make an inference about a population parameter using one of the following techniques:
- A confidence interval
- Test statistic
- P-value
- Make an inference about one of the following population parameters:
- Population mean
- Population proportion
- Difference between two population means
- Difference between two population proportions
- One or more population slopes in a regression equation
- Predictive Analytics
- Fit a linear regression equation (for a numerical response variable).
- Checking conditions for a linear regression equation.
- Assess the fit for a linear regression equation.
- Select which explanatory variables to use in a linear regression equation.
- Fit a logistic regression equation (for a categorical response variable).
- Checking conditions for a logistic regression equation.
- Assess the fit for a logistic regression equation.
- Select which explanatory variables to use in a logistic regression equation.
- Build classifier models using:
- Logistic regression model
- Make predictions using a regression equation.
- Prescriptive Analytics
- How to use your data science analysis to make “good decisions” given the problem you are trying to solve with data.
- Coding
- GitHub version control
- Pulling a remote repository to your local computer
- Pushing the updates made on your local computer to the remote repository
- Python
- Types of objects
- Creating functions
- if/else statements
- Creating for loops