This course comprises the following four components.
Main Lecture
- Lecture Time: TuTh 11:00AM - 12:20PM CST
- Location: Online (Canvas Zoom Links)
- Instructor: Dr. Victoria (Tori) Ellison
Lecture Attendance Policy
Lecture attendance is not part of the participation grade, but it is highly encouraged! See the bonus point opportunities section for how you can earn 0.33-0.67 bonus points for each lecture you attend!
Before Lecture Materials
Before the lecture, zip files containing the Jupyter notebooks, csv files, and pdfs that we will mark up can be found in the Course Schedule for the corresponding day. These should be posted at most 24 hours before the lecture starts.
- Slides pdf: This pdf will focus more on the theoretical content of the class. This pdf contains "skeleton notes" that we will fill out in class. I would suggest downloading (perhaps printing) these pdfs yourself and taking notes in class.
- csv file (usually)
- Jupyter notebook file: The Jupyter notebook file will introduce new Python functions, parameters, packages etc. We will use it to go through real-world statistical applications of the theoretical content that we talk about in the slides pdf. I would suggest opening this Jupyter notebook and going through the code yourself in class. Try changing some of the parameters/values in the code to see what it does!
- Notebook pdf: This is simply a pdf copy of the Jupyter notebook file. I would suggest downloading (perhaps printing) these pdfs yourself and taking notes in class.
Post Lecture Materials
After the lecture, copies of the pdfs that we marked up as well as the lecture video will be posted on Canvas within 24 hours.
Assigned Lab Sections
These are the lab sections/times that you specifically enrolled in. You should only attend the lab that you enrolled in.
Lab Section Time | Location | TA | Email |
W 9:00am-10:20 | Armory 428 | Junseok Yang | jyang247@illinois.edu |
W 11:00am-12:20 | Armory 428 | Junseok Yang | jyang247@illinois.edu |
W 1:00pm-2:20 | Armory 428 | Dongxiao Wu | dw12@illinois.edu |
W 3:00pm-4:20 | Armory 428 | Dongxiao Wu | dw12@illinois.edu |
Lab Attendance Policy: Your attendance at the labs is part of your participation grade. You should only attend the lab section that you enrolled in. You get 5 points for each lab that you attend, not including the final presentation lab. You max out at 50 lab attendance points. This means that you can miss 4 labs penalty free.
Lab Purpose: The purpose of the assigned lab sections is to give you a place to work on the weekly lab assignments where you can ask for help and get quick feedback from the teaching assistant and course assistants assigned to your lab section. Later in the semester, you will use these lab sections to work on your final projects.
Lab Assignment Components: Your weekly lab assignments will be comprised of two parts: a.) an individual lab assignment and b.) a group lab assignment. Thus, these assigned labs will give you the space to meet and collaborate with your group in person.
Additional (Optional) Lab
The lab assignments will generally be due Tuesdays at 11:59pm CST. To help with any last-minute questions, we will hold additional (optional) lab times.
Lab Section Time | Location | CAs |
T 5pm-7pm | In-Person at 1028 Lincoln Hall OR Online via Zoom | Afnan Dzaharudin, Harshitha Vetrivel, Parkin (Peter) Pongrojpaw |
These optional labs will be held in person; however, you can also attend online if you'd like (see Canvas for Zoom links).
STAT207 Open Lab Queue
If you have a question to ask during Open Labs, make sure that you submit your question to the queue first so the TAs can answer everyone's questions in an orderly fashion.
https://queue.illinois.edu/q/stat207
Instructor Office Hours
Office Hours Time | Location | Instructor |
Fridays 3:30-4:30pm CST; Mondays 11:30am-12:30pm CST | Online (Zoom) | Tori Ellison |
Feel free to drop by and ask me questions about course content, assignment/project questions, or any other class related topics. I also worked in the data science industry for 6 years, so feel free to ask me any general career/industry related questions that you might have.
Alternatively, if you wanted to let a struggling millennial know how to pull off wearing white sneakers, I'm quite curious.
- Instructor: Tori Ellison, Department of Statistics
- Teaching Assistants:
Lab Section Time | TA | Email |
W 9:00am-10:20 | Junseok Yang | jyang247@illinois.edu |
W 11:00am-12:20 | Junseok Yang | jyang247@illinois.edu |
W 1:00pm-2:20 | Dongxiao Wu | dw12@illinois.edu |
W 3:00pm-4:20 | Dongxiao Wu | dw12@illinois.edu |
- Course Assistants: We have two TAs and 7 CAs affiliated with this class who will be assisting you with your questions and the grading. Almost everyone on our team is a former STAT207 student who did great in the class. Get to know your TAs/CAs and what tips they have for STAT207 success here!
- Official Course Website: http://courses.las.illinois.edu/fall2023/stat207/
- Course Canvas Page: https://canvas.illinois.edu/courses/30296
- UIUC Courses Github Enterprise Organization Page: https://github.com/illinois-cs-coursework
- Piazza Discussion Board: https://piazza.com/illinois/fall2023/stat207 (Access Code: 1e1jvnuwvyp)
- Open Labs STAT207 Queue: https://queue.illinois.edu/q/stat207
Overview: Building on the foundation of STAT 107, Data Science Discovery, we use Python, Jupyter notebooks, and GitHub to explore statistical concepts and the data science pipeline, combined with the statistical analysis of STAT 200. As we explore data science we will do the following.
- Develop an understanding of probability models for noisy data and how these translate into uncertainty analysis and statistical inference
- Understand how modeling assumptions and sampling frames affect our conclusions
- Become adept with multiple regression modeling, basic machine learning, and inference
- Become proficient in Python coding for data management, analytics, visualization
- Understand and use GitHub repositories, the industry standard for submitting code and reports
STAT107
- Required Calculator (You can use your computer's calculator.)
- Laptop Computer: You need a laptop running Windows, OS X, or Linux. Tablets, Chromebooks, and iPads are not supported. You will need to be able to install both Python and git to complete the labs (instructions provided).
- Lecture notes: These will be posted on the course schedule.
- STAT207 E-Book: Dr. Deeke and I wrote a STAT207 e-book this summer! Link TBD.
- Other Helpful Online Books: To read more about the topics in this course.
- J. VanderPlas (2016) Python Data Science Handbook, https://jakevdp.github.io/PythonDataScienceHandbook/
- Diez, Barr, and Cetinkaya-Rundel, (2015), OpenIntro Statistics https://www.openintro.org/download.php?file=os3&redirect=/stat/textbook/os3.php
Graded Components
Course grades are computed based on your percentage out of 910 points for the course. The graded components are as follows:
Graded Component | Total Points | Percentage of Final Grade |
Lab Attendance | 50 | 5.5% |
Individual Lab Assignment Part | 250 | 27.5% |
Group Lab Assignment Part | 50 | 5.5% |
Midterm 1 | 100 | 11.0% |
Midterm 2 | 100 | 11.0% |
Final Exam | 150 | 16.5% |
Mini-Project 1 (Individual) | 30 | 3.3% |
Mini-Project 2 (Individual) | 30 | 3.3% |
Final Project (Group) | 150 | 16.5% |
Total | 910 | 100.0% |
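As a quick check of the table above, the percentage column can be reproduced from the point values. This is only an illustrative sketch, not an official grade calculator; the variable names are made up for this example.

```python
# Point values copied from the Graded Components table.
components = {
    "Lab Attendance": 50,
    "Individual Lab Assignment Part": 250,
    "Group Lab Assignment Part": 50,
    "Midterm 1": 100,
    "Midterm 2": 100,
    "Final Exam": 150,
    "Mini-Project 1 (Individual)": 30,
    "Mini-Project 2 (Individual)": 30,
    "Final Project (Group)": 150,
}

total_points = sum(components.values())

# Each component's share of the final grade, rounded to one decimal place.
percentages = {
    name: round(100 * points / total_points, 1)
    for name, points in components.items()
}

for name, pct in percentages.items():
    print(f"{name}: {components[name]} points ({pct}%)")
print(f"Total: {total_points} points")
```

Because each percentage is rounded separately, the rounded values need not add up to exactly 100.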
Drop the 2 Lowest Assignments Policy:
Each lab assignment will be worth a total of 30 points:
- 25 points for the individual lab assignment part
- 5 points for the group lab assignment part.
There will be 12 lab assignments; however, we will drop the lowest two assignment grades. Thus a perfect total lab assignment score will amount to 300 points (10 assignments at 30 points each). This "drop the two lowest assignments" policy is designed to "catch" the first two assignments that you are unable to complete in time for a variety of reasons. If you find that you are unable to complete more than two assignments due to illness/university obligations/religious observances/other extenuating circumstances, contact Dr. Ellison and we can discuss options at that point.
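The drop policy can be sketched in a few lines of Python. This is an illustration only: it assumes the drop is applied to each assignment's combined 30-point score (individual plus group part), which the policy does not spell out, and the function name is hypothetical.

```python
def lab_assignment_total(scores, n_drop=2):
    """Sum the lab assignment scores after dropping the n_drop lowest.

    Assumption: each entry is one assignment's combined score out of 30.
    """
    kept = sorted(scores)[n_drop:]  # discard the lowest n_drop scores
    return sum(kept)


# Twelve perfect assignments still total 300, since two are dropped.
print(lab_assignment_total([30] * 12))

# Two missed assignments are fully absorbed by the drop policy.
print(lab_assignment_total([0, 0] + [30] * 10))

# A third missed assignment starts to cost points.
print(lab_assignment_total([0, 0, 0] + [30] * 9))
```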
Late Policies:
- Homework submitted between 5 minutes and 24 hours late will lose 30% of the assignment's points.
- Homework that is late by more than 24 hours will receive 0 points.
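One possible reading of the late policy, written as a small Python function. Two assumptions are made here that the policy text does not state outright: submissions less than 5 minutes late are not penalized, and the 30% deduction is taken from the assignment's maximum points rather than from the earned score. The function name and parameters are hypothetical.

```python
def apply_late_policy(earned, max_points, minutes_late):
    """Return the score after the late penalty.

    Assumptions (not confirmed by the syllabus):
    - under 5 minutes late: no penalty
    - the 30% deduction is 30% of max_points, and scores never go below 0
    """
    if minutes_late < 5:
        return float(earned)
    if minutes_late <= 24 * 60:
        return max(0.0, earned - 0.30 * max_points)
    return 0.0  # more than 24 hours late


# A 25/30 lab submitted 2 hours late, then 2 days late.
print(apply_late_policy(25, 30, minutes_late=120))
print(apply_late_policy(25, 30, minutes_late=2 * 24 * 60))
```

If in doubt about how a specific late submission will be scored, ask your lab TA.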
Regrade Policies:
You have ONE week to request a grade correction after a homework score is posted. You should clearly present the following information to YOUR lab TA (i.e., either Junseok Yang jyang247@illinois.edu or Dongxiao Wu dw12@illinois.edu):
- Which lab assignment is involved (e.g. lab assignment #6)
- A detailed explanation of the suspected error
- The number of points you feel you should have received for the question.
Bonus Point Opportunities
There are two ways that you can earn bonus points in this class: 1.) interacting with the lectures and 2.) participating in the Piazza discussions that I post.
Bonus Point Opportunity | Points Each Lecture | Total Possible Bonus Points for 28 Lecture Days | Total Possible Course Grade % Boost |
Lecture Interaction | | 18.67 | |
Option 1: Synchronous Attendance + Active Participation | 2/3 pt | | |
Option 2: Synchronous Attendance | 1/2 pt | | |
Option 3: Just watched the video within 48 hours of it posting | 1/3 pt | | |
Piazza Discussion | 1/2 pt | 14 | |
Total | | 32.67 | 3.59% |
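The totals in the table above follow from simple arithmetic, sketched here in Python. The sketch assumes the maximum case (Option 1 for all 28 lectures plus a Piazza contribution for every lecture) and that bonus points are measured against the 910 course points; the variable names are made up for this example.

```python
from fractions import Fraction

LECTURE_DAYS = 28
TOTAL_COURSE_POINTS = 910

# Best case: 2/3 pt per lecture (Option 1) and 1/2 pt per Piazza discussion.
max_lecture_bonus = LECTURE_DAYS * Fraction(2, 3)
max_piazza_bonus = LECTURE_DAYS * Fraction(1, 2)
max_total_bonus = max_lecture_bonus + max_piazza_bonus

# Bonus points as a percentage of the 910 course points.
boost_pct = 100 * max_total_bonus / TOTAL_COURSE_POINTS

print(round(float(max_lecture_bonus), 2))  # 18.67
print(float(max_piazza_bonus))             # 14.0
print(round(float(max_total_bonus), 2))    # 32.67
print(round(float(boost_pct), 2))          # 3.59
```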
Lecture Interaction: Somewhere in every lecture (lecture video) you will see a QR code that you can scan to get bonus point credit for the lecture. You should scan this QR code with your phone to get the bonus points.
- If you attended the lecture live and actively participated, then you can get 0.67 bonus points for this lecture.
- If you attended the lecture live and DIDN'T actively participate, then you can get 0.5 bonus points for this lecture.
- If you WATCHED the VIDEO lecture within 48 hours of me posting it (rather than attending live), then you get 0.33 bonus points for this lecture.
Piazza Discussion Questions: Data Science is not a "plug and chug" activity! That is, as a data scientist sometimes you will be faced with methodological or perhaps ethical questions in which there is no easily agreed upon answer. Or perhaps there may be some insight about your research question or analysis that your colleague might see that you may not have thought of.
At least 24 hours before every subsequent lecture, I will post a few open-ended discussion questions on Piazza that will pertain to the next lecture. You can get an additional 0.5 bonus points for a given lecture if you provide a thoughtful, unique perspective to at least one of these discussion questions. "Cool story bro" doesn't count!
Course points will be translated into a course grade at the end of the semester. The grade thresholds will be based on your percentage score out of 910:
Grade | Min Pct | Grade | Min Pct | Grade | Min Pct |
A+ | 97 | A | 93 | A- | 90 |
B+ | 87 | B | 83 | B- | 80 |
C+ | 77 | C | 73 | C- | 70 |
D+ | 67 | D | 63 | D- | 60 |
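To see how a point total maps onto the thresholds above, here is a minimal Python sketch. It assumes no rounding of the percentage and that anything below 60% is an F, neither of which is stated in the table; the function name is hypothetical.

```python
# (letter, minimum percentage) pairs from the table, highest first.
GRADE_THRESHOLDS = [
    ("A+", 97), ("A", 93), ("A-", 90),
    ("B+", 87), ("B", 83), ("B-", 80),
    ("C+", 77), ("C", 73), ("C-", 70),
    ("D+", 67), ("D", 63), ("D-", 60),
]


def letter_grade(points, total_points=910):
    """Map course points to a letter grade.

    Assumptions: the percentage is not rounded, and below 60% is an F.
    """
    pct = 100 * points / total_points
    for letter, min_pct in GRADE_THRESHOLDS:
        if pct >= min_pct:
            return letter
    return "F"


print(letter_grade(819))  # exactly 90% of 910
print(letter_grade(818))  # just under 90%
```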
Lab Attendance
For each lab section (aside from the final presentation section) that you attend and participate in, you will get 5 points towards your participation grade. A perfect participation grade in the class is worth 50 points. This means that you can miss up to 4 labs penalty free.
Midterm and Final Exams
In-Person
The midterm and final exams will be in-person. Check the course schedule for the locations and times of these exams. The TAs will be physically proctoring these exams, but Dr. Ellison will be available via Zoom to answer any questions that you might have. If you have a question for Dr. Ellison during your exam, ask the TA to bring over their laptop with Zoom open.
Mostly Conceptual with Some Code Interpretation
Most, but not all, of each exam will test your mastery of data science concepts that do not involve coding in Python.
However, given the ease of use but imperfect nature of AI tools that can sometimes assist the data scientist with the coding element of their task, having an intuition for the coding essentials and potential flaws in your code is going to become more important than ever. Thus, in these exams you will NOT be asked to WRITE code. However, you may be asked to interpret code and describe the type of output you would expect to get for JUST Pandas and base Python functions that we learn in class.
The assignments and practice tests will mimic these types of code interpretation questions that you might see. This should give you a strong sense as to what to expect.
Mini-Projects
There will be two individual mini-projects due this semester. You can choose to work on your own dataset for the mini-projects or use the "backup" dataset that is supplied. You will get at least two weeks to work on each mini-project.
Each mini-project will be in the form of a Jupyter notebook report. After you submit your report, it will be randomly assigned to another classmate for peer evaluation (and vice versa).
Final Group Project
There will be a final group project due at the end of this semester. Your group should be composed of 4 people. (You need to check with Dr. Ellison first to work in a group with fewer than 4 people.) You can choose to work on your own dataset or use the "backup" datasets that are supplied. You will get at least four weeks to work on this final group project.
This final group project will be in the form of a Jupyter notebook report. You will also present your findings in slides on your final lab section day. On presentation day, your group will be randomly assigned to evaluate the presentation of another group in your lab section and provide feedback.
Learning Collaboratively
We encourage you to discuss all of your course activities (with the exception of exams) with your friends and classmates! You will learn more through talking through the problems, teaching others, and sharing ideas.
Continue to read on "Academic Integrity" to understand the difference between collaboration and giving an answer away.
Academic Integrity
Collaboration is about working together. Collaboration is not giving the direct answer to a friend or sharing the source code of an assignment. Collaboration requires you to make a serious attempt at every assignment and discuss your ideas and doubts with others so everyone gets more out of the discussion. Your answers must be in your own words and your code must be typed (not copied/pasted) by you.
Academic dishonesty is taken very seriously in STAT 207, and all cases will be reported to the University, your college, and your department. You should understand how academic integrity applies specifically to STAT 207: the sanctions for cheating on an assignment include a loss of all points for the assignment and a lowering of the final course grade by one whole letter grade (70 points). A second incident or cheating on an exam results in an automatic F in the course.
Academic integrity also includes protecting your work. If your work ends up submitted by someone else, we consider this a violation of academic integrity, just as though you submitted someone else’s work.
Checking Wednesday: Understanding the Code and Claims that you are Submitting
AI Tools are not Perfect and have Consequences when They're Wrong
Given the ease of use, but imperfect nature, of AI tools that can sometimes assist the data scientist with the coding element of their task, it has become easier than ever to write code and make claims that are incorrect and that you don't understand. Unfortunately, using AI to write code or make claims about concepts that you don't understand can have negative impacts on society, the organization that you work for, your career, etc. (see Piazza discussion).
Do you understand the code/statements that you are making?
Therefore, the ultimate goal is to ensure that you understand the code that you wrote or the conceptual claims that you are making.
Checking Wednesday
Therefore, every Wednesday about 3 people in each lab will be randomly selected to discuss their thought process for solving a few questions from (or very similar to those in) the individual assignment that they submitted the night before.
If you did not simply copy and paste your assignment answers/code without thinking about it, then you should have nothing to worry about in checking Wednesday! Even if you got the answer wrong to the corresponding question in your individual assignment, if you thoughtfully engaged with the question, then you should have some sort of steps/logic process etc. that you can discuss. This discussion of your thought process is ultimately what I'm looking for.
On the other hand, if you got the answer right (or mostly right) on your individual assignment, but you have no idea why/how you got this answer, then this is not demonstrating that you understand the code/concepts that you wrote. In an instance such as this, you can automatically lose up to 7 points (23% of the assignment) on the corresponding assignment questions that you failed to explain your thought process for.
Important Data Scientist Job and Interview Skill
Being able to explain your decision making process to your boss, client, or job interviewer is a very important skill to demonstrate as a data scientist. Many data scientist interviews will require you to do this! This can be an especially useful skill to develop in instances when you don't automatically know the answer to an interview question. Data scientist job interviewers are first and foremost interested in how your decision making process works, when it comes to problems. Being able to articulate your thought process is a very important skill to have in situations like these.
How many times will I be checked?
Each student in the class will be randomly checked either 1 or 2 times at some point during the semester.
How does the checking process work?
In your assigned lab, your lab TA will let you know if you've been randomly selected for a check. You then have until the end of the lab to log onto the "checking Wednesday" Zoom link. (The checks should generally happen in the last 30 minutes of the lab so you have time to get all of the group work done.) Try to find a less occupied spot in the room. Note that there will be two others in your lab who have also been selected, so time your log-in effectively. This checking Zoom meeting should take no more than 7 minutes.
After you log into the Zoom meeting, Dr. Ellison will ask you to share your screen and give you a Google colab link. This link will take you to a Jupyter notebook that has a few questions that you should try to answer or discuss your thought process for.
You're allowed to reference the class materials to help you answer these questions. However, if you choose to reference your previously submitted assignment or other materials (like Google), Dr. Ellison may ask you a follow-up question to check your understanding. She won't ask you anything outside of what was taught in the course.
What happens if I was randomly selected for a given lab, but I didn't attend that day?
You'll be checked in the next lab that you do attend, with questions similar to the ones you would have been asked that week. For instance, if you missed the lab in which we asked questions very similar to assignment 5, then we would still ask you questions very similar to assignment 5.
Key thing Dr. Ellison wants you to know about "checking Wednesday"
Relax! If you have put forth a good faith effort to try to understand what you wrote on your assignment, then you have nothing to worry about, even if you got it wrong or don't quite understand the concept fully.
You can additionally take this as an opportunity to ask me clarification questions about class concepts.
If I DO take off points on your assignment based on my checking Wednesday assessment, I am NOT making an academic integrity violation claim. Being able to articulate your understanding of data science code/concepts that you wrote is a graded learning outcome of this course. Therefore, if you are not able to do this, then you do not get full points for this graded learning outcome.