This course consists of the following four components.
Main Lecture
- Lecture Time: TuTh 2:00PM - 3:20PM CST
- Location: Online (Canvas Zoom Links)
- Instructor: Dr. Victoria (Tori) Ellison
Lecture Attendance Policy
Lecture attendance is not part of the participation grade, but it is highly encouraged! See the bonus point opportunities section for how you can earn 0.33-0.67 bonus points for each lecture you attend!
Before Lecture Materials
Before the lecture, zip files containing the Jupyter notebooks, csv files, and pdfs that we will mark up can be found in the Course Schedule for the corresponding day. These should be posted at most 24 hours before the lecture starts.
- Slides pdf: This pdf will focus more on the theoretical content of the class. This pdf contains "skeleton notes" that we will fill out in class. I would suggest downloading (perhaps printing) these pdfs yourself and taking notes in class.
- csv file (usually)
- Jupyter notebook file: The Jupyter notebook file will introduce new Python functions, parameters, packages etc. We will use it to go through real-world statistical applications of the theoretical content that we talk about in the slides pdf. I would suggest opening this Jupyter notebook and going through the code yourself in class. Try changing some of the parameters/values in the code to see what it does!
- Notebook pdf: This is simply a pdf copy of the Jupyter notebook file. I would suggest downloading (perhaps printing) these pdfs yourself and taking notes in class.
Post Lecture Materials
After the lecture, copies of the pdfs that we marked up as well as the lecture video will be posted on Canvas within 24 hours.
Assigned Lab Sections
These are the lab sections/times that you specifically enrolled in. You should only attend the lab that you enrolled in.
Lab Section Time | Location | TA | Email |
Wednesday 9:30-10:50am | 430 Armory | David Kim | davidk9@illinois.edu |
Wednesday 11-12:20PM | 430 Armory | Dongxiao Wu | dw12@illinois.edu |
Wednesday 11-12:20PM | 4101 Materials Science & Eng Bld | Junseok Yang | jyang247@illinois.edu |
Wednesday 12:30pm-1:50pm | 430 Armory | Dongxiao Wu | dw12@illinois.edu |
Wednesday 2-3:20 PM | 143 Armory | David Kim | davidk9@illinois.edu |
Wednesday 3-4:20PM | 1024 Lincoln Hall | Junseok Yang | jyang247@illinois.edu |
Lab Attendance Policy: Your attendance at the labs is part of your participation grade. You should only attend the lab section that you enrolled in. You get 5 points for each lab that you attend, not including the final presentation lab. You max out at 55 lab attendance points. This means that you can miss 3 labs penalty free.
Lab Purpose: The purpose of the assigned lab sections is to give you a place to work on the weekly lab assignments where you can ask for help and get quick feedback from the teaching assistant and course assistants assigned to your lab section. Later in the semester, you will use these lab sections to work on your final projects.
Lab Assignment Components: Your weekly lab assignments will be comprised of two parts: a.) an individual lab assignment and b.) a group lab assignment. Thus, these assigned labs will give you the space to meet and collaborate with your group in person.
Additional (Optional) Lab
The lab assignments will generally be due Tuesdays at 11:59pm CST. To help you with any last-minute questions, we will hold additional (optional) lab times.
Lab Section Time | Location | CAs |
Mondays and Tuesdays 5pm-7pm | In-Person at 1065 Lincoln Hall OR Online via Zoom | Afnan Dzharudin, Dorothy Wongkarnta, Mengyue Huang, Peter Farnham, Kris Png, Utkarsh Prasad |
These optional labs will be held in person; however, you can also attend online if you'd like (see Canvas for Zoom links).
STAT207 Open Lab Queue
If you have a question to ask during Open Labs, make sure that you submit your question to the queue first so the TAs can answer everyone's questions in an orderly fashion.
https://queue.illinois.edu/q/stat207
Instructor Office Hours
Office Hours Time | Location | Instructor |
Thursdays 5:30-6:30pm CST; Fridays 3-4pm CST | Online (Zoom) | Tori Ellison |
Feel free to drop by and ask me questions about course content, assignment/project questions, or any other class related topics. I also worked in the data science industry for 6 years, so feel free to ask me any general career/industry related questions that you might have.
Alternatively, if you want to let a struggling millennial know how to pull off wearing white sneakers, I'm quite curious.
- Official Course Website: http://courses.las.illinois.edu/spring2024/stat207/
- Course Canvas Page: https://canvas.illinois.edu/courses/44397
- UIUC Courses Github Enterprise Organization Page: https://github.com/illinois-cs-coursework
- Piazza Discussion Board: https://piazza.com/illinois/spring2024/stat207 (Access Code: kq9upot8uaq)
- Open Labs STAT207 Queue: https://queue.illinois.edu/q/stat207
- STAT207 E-Book: https://exploration.stat.illinois.edu
Overview: Building on the foundation of STAT 107, Data Science Discovery, we use Python, Jupyter notebooks, and GitHub to explore statistical concepts and the data science pipeline, combined with the statistical analysis of STAT 200. As we explore data science we will do the following.
- Develop an understanding of probability models for noisy data and how these translate into uncertainty analysis and statistical inference
- Understand how modeling assumptions and sampling frames affect our conclusions
- Become adept with multiple regression modeling, basic machine learning, and inference
- Become proficient in Python coding for data management, analytics, visualization
- Understand and use GitHub repositories, the industry standard for submitting code and reports
STAT107
- Required Calculator (You can use your computer's calculator.)
- Laptop Computer: You need a laptop running Windows, OS X, or Linux. Tablets, Chromebooks, and iPads are not supported. You will need to be able to install both Python and git to complete the labs (instructions provided).
- Lecture notes: These will be posted on the course schedule.
- STAT207 E-Book: Dr. Deeke and I wrote a STAT207 e-book this summer! https://exploration.stat.illinois.edu
- Other Helpful Online Books: To read more about the topics in this course.
- J. VanderPlas (2016) Python Data Science Handbook, https://jakevdp.github.io/PythonDataScienceHandbook/
- Diez, Barr, and Cetinkaya-Rundel, (2015), OpenIntro Statistics https://www.openintro.org/download.php?file=os3&redirect=/stat/textbook/os3.php
Graded Components
Course grades are computed based on your percentage out of 930 points for the course. The graded components are as follows:
Graded Component | Total Points | Percentage of Final Grade |
Lab Attendance | 55 | 5.91% |
Individual Lab Assignment Part | 250 | 26.88% |
Group Lab Assignment Part | 50 | 5.38% |
Checking Wednesday (7.5 for each check) | 15 | 1.61% |
Midterm 1 | 100 | 10.75% |
Midterm 2 | 100 | 10.75% |
Final Exam | 150 | 16.13% |
Mini-Project 1 (Individual) | 30 | 3.23% |
Mini-Project 2 (Individual) | 30 | 3.23% |
Final Project (Group) | 150 | 16.13% |
Total | 930 | 100.00% |
Drop the 2 Lowest Assignments Policy:
Each lab assignment will be worth a total of 30 points:
- 25 points for the individual lab assignment part
- 5 points for the group lab assignment part.
There will be 12 lab assignments; however, we will drop the lowest two assignment grades. Thus a perfect total lab assignment score will amount to 300 (10 assignments at 30 points each). This "drop the two lowest assignments" policy is designed to "catch" the first two assignments that you are unable to complete in time, for whatever reason. If you find that you are unable to complete more than two assignments due to illness/university obligations/religious observances/other extenuating circumstances, contact Dr. Ellison and we can discuss options at that point.
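As a rough illustration of the drop policy, here is a minimal Python sketch. The function name `lab_total` and the example scores are hypothetical, and the sketch assumes the drop applies to each lab's combined 30-point score (individual plus group part); check with Dr. Ellison if you need the exact rule.

```python
def lab_total(scores, n_drop=2):
    """Sum lab scores after dropping the n_drop lowest ones.

    Assumption: each entry is a lab's combined score out of 30
    (25 individual + 5 group), and the drop applies to that combined score.
    """
    kept = sorted(scores)[n_drop:]  # discard the n_drop smallest scores
    return sum(kept)


# Hypothetical scores for the 12 lab assignments.
scores = [30, 28, 0, 30, 25, 30, 30, 12, 30, 29, 30, 30]
print(lab_total(scores))     # the 0 and the 12 are dropped -> 292
print(lab_total([30] * 12))  # perfect score -> 300
```

In this example a missed lab (0) and a weak lab (12) are the two that get dropped, so they do not lower the total.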
Late Policies:
- Homework that is submitted between 5 minutes and 24 hours late will receive a 30% deduction.
- Homework that is late by more than 24 hours will receive 0 points.
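The late policy can be sketched as a small Python function. This is only an illustration under two assumptions that the syllabus does not spell out: submissions less than 5 minutes late are not penalized, and the 30% deduction is taken from the assignment's maximum points rather than from your earned score. The function name and arguments are hypothetical.

```python
def late_adjusted_score(score, max_points, minutes_late):
    """Apply the late policy to a single homework score.

    Assumptions: under 5 minutes late is treated as on time, and the
    30% deduction is computed from the assignment's maximum points.
    """
    if minutes_late < 5:
        return score
    if minutes_late <= 24 * 60:
        deduction = max_points * 30 / 100
        return max(score - deduction, 0)  # never go below zero
    return 0  # more than 24 hours late


print(late_adjusted_score(25, 25, 2))     # 25
print(late_adjusted_score(25, 25, 60))    # 17.5
print(late_adjusted_score(25, 25, 1500))  # 0
```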
Regrade Policies:
You have ONE week to request a grade correction after a homework score is posted. You should clearly present the following information to YOUR lab TA (i.e., either Junseok Yang jyang247@illinois.edu or Dongxiao Wu dw12@illinois.edu or David Kim davidk9@illinois.edu):
- Which lab assignment is involved (e.g. lab assignment #6)
- A detailed explanation of the suspected error
- The number of points you feel you should have received for the question.
Bonus Point Opportunities
There are three ways that you can earn bonus points in this class: 1.) interacting with the lectures, 2.) completing the unit summary worksheets, and 3.) participating in the Piazza discussions that I post.
Bonus Point Opportunities | Points | Total Possible Points | Total Possible Course Grade % Boost |
Daily Lecture Interaction Opportunity | | Full participation for all 28 lecture days = 18.67 pts | 2.01% |
Option 1: Synchronous Attendance (1+ hour) + Active Participation (2+ comments) | 2/3 pt per lecture | | |
Option 2: Synchronous Attendance (1+ hour) | 1/2 pt per lecture | | |
Option 3: Just watched the video (1+ hour) within 48 hours of it posting | 1/3 pt per lecture | | |
Completing the Unit Summary Worksheets | 2/3 pt per unit | Complete all worksheets for 17 units = 11.33 pts | 1.22% |
Piazza Discussion Opportunity | 1/4 pt | | |
Total | | 30+ | 3.23%+ |
Lecture Interaction:
- If you attended the lecture live and actively participated, then you can get 0.67 bonus points for this lecture.
- If you attended the lecture live and DIDN'T actively participate, then you can get 0.5 bonus points for this lecture.
- If you WATCHED the VIDEO lecture within 48 hours of me posting it (rather than attending live), then you get 0.33 bonus points for this lecture.
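To see how the lecture bonus adds up over the semester, here is a minimal Python sketch. The labels `"live_active"`, `"live"`, `"video"`, and `"none"` and the function name are hypothetical; the point values are the 2/3, 1/2, and 1/3 fractions from the bonus table, kept as exact fractions to avoid rounding drift.

```python
from fractions import Fraction

# Bonus per lecture, keyed by hypothetical labels for each option.
LECTURE_BONUS = {
    "live_active": Fraction(2, 3),  # attended live and participated
    "live": Fraction(1, 2),         # attended live only
    "video": Fraction(1, 3),        # watched the video within 48 hours
    "none": Fraction(0),            # no credit for this lecture
}


def lecture_bonus(log):
    """Total lecture bonus for a list of per-lecture labels."""
    return sum(LECTURE_BONUS[kind] for kind in log)


# Full active participation in all 28 lectures.
full = lecture_bonus(["live_active"] * 28)
print(round(float(full), 2))  # 18.67

# A hypothetical mixed semester: 10 active, 8 live, 6 video, 4 missed.
mixed = lecture_bonus(["live_active"] * 10 + ["live"] * 8 + ["video"] * 6 + ["none"] * 4)
print(round(float(mixed), 2))  # 12.67
```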
Unit Summary Worksheets: You can find each unit summary worksheet in Canvas quizzes. These worksheets are designed to help you learn and organize the main goals, goal complications, solutions to these complications, concepts, and code learned in each unit's lecture notes. You may find these worksheets useful when it comes to completing the corresponding labs and studying for the exams.
Piazza Discussion Questions: Data Science is not a "plug and chug" activity! That is, as a data scientist sometimes you will be faced with methodological or perhaps ethical questions in which there is no easily agreed upon answer. Or perhaps there may be some insight about your research question or analysis that your colleague might see that you may not have thought of.
Occasionally I will post a few open-ended discussion questions on Piazza that pertain to an upcoming lecture. You can get an additional 0.25 bonus points for a given lecture if you provide a thoughtful, unique perspective on at least one of these discussion questions. "Cool story bro" doesn't count!
Course points will be translated into a course grade at the end of the semester. The grade thresholds will be based on your percentage score out of 930:
Grade | Minimum Percent Needed | Minimum Total Course Points Needed |
A+ | 97% | 902.1 |
A | 93% | 864.9 |
A- | 90% | 837 |
B+ | 87% | 809.1 |
B | 83% | 771.9 |
B- | 80% | 744 |
C+ | 77% | 716.1 |
C | 73% | 678.9 |
C- | 70% | 651 |
D+ | 67% | 623.1 |
D | 63% | 585.9 |
D- | 60% | 558 |
F | Below 60% | Below 558 |
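The table above can be read as a simple lookup. The following Python sketch is an illustration only: it assumes the 930-point total stated under Graded Components (which reproduces the point cutoffs in the table, e.g. 97% of 930 = 902.1), and the function name `letter_grade` is hypothetical. How bonus points enter your total is decided by the course staff, not by this sketch.

```python
TOTAL_POINTS = 930  # course total from the Graded Components table

# (letter grade, minimum percent), highest first.
THRESHOLDS = [
    ("A+", 97), ("A", 93), ("A-", 90),
    ("B+", 87), ("B", 83), ("B-", 80),
    ("C+", 77), ("C", 73), ("C-", 70),
    ("D+", 67), ("D", 63), ("D-", 60),
]


def letter_grade(points):
    """Return the letter grade for a total number of course points."""
    for grade, min_percent in THRESHOLDS:
        cutoff = min_percent * TOTAL_POINTS / 100  # e.g. 97 -> 902.1
        if points >= cutoff:
            return grade
    return "F"


print(letter_grade(870))  # A
print(letter_grade(837))  # A-
print(letter_grade(550))  # F
```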
Lab Attendance
For each lab section (aside from the final presentation section) that you attend and participate in, you will get 5 points towards your participation grade. A perfect participation grade in the class is worth 55 points. This means that you can miss up to 3 labs penalty free.
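The attendance cap works like this minimal Python sketch. The function name is hypothetical, and the figure of 14 attendance-graded labs is an inference from "55 points at 5 points each, with 3 misses allowed", not something the syllabus states directly.

```python
def lab_attendance_points(labs_attended, points_per_lab=5, cap=55):
    """Attendance points: 5 per lab attended, capped at 55."""
    return min(labs_attended * points_per_lab, cap)


# Assuming 14 attendance-graded labs (an inference, see above):
print(lab_attendance_points(14))  # 55 (capped)
print(lab_attendance_points(11))  # 55 (missed 3, still full credit)
print(lab_attendance_points(10))  # 50 (missed 4, loses 5 points)
```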
Midterm and Final Exams
In-Person
The midterm and final exams will be in-person. Check the course schedule for the locations and times of these exams. The TAs will be physically proctoring these exams, but Dr. Ellison will be available via Zoom to answer any questions that you might have. If you have a question for Dr. Ellison during your exam, ask the TA to bring over their laptop with Zoom open.
Mostly Conceptual with Some Code Interpretation
Most, but not all, of each exam will test your mastery of data science concepts that do not involve coding in Python.
However, given the ease of use but imperfect nature of AI tools that can sometimes assist the data scientist with the coding element of their task, having an intuition for the coding essentials and potential flaws in your code is going to become more important than ever. Thus, in these exams you will NOT be asked to WRITE code. However, you may be asked to interpret code and describe the type of output you would expect to get for JUST Pandas and base Python functions that we learn in class.
The assignments and practice tests will mimic these types of code interpretation questions that you might see. This should give you a strong sense as to what to expect.
Remember, writing code is simply a means to an end for the main task of a data scientist which is to solve problems and answer questions with data. You will most likely have to use many different programming languages and even functions within the same programming language to perform the same task throughout your career. Therefore, perfectly memorizing EVERY new function you learn, practically speaking, is not as important once you learn the basics (ex: for loops) of a programming language.
Once you learn a programming language's basics, the more important skill to develop as a data scientist is how to QUICKLY FIND and UNDERSTAND the appropriate code to use to solve a problem and how to QUICKLY debug.
Mini-Projects
There will be two individual mini-projects due this semester. You can choose to work on your own dataset for the mini-projects or use the "backup" dataset that is supplied. You will get at least two weeks to work on this small project.
Each mini-project will be in the form of a Jupyter notebook report. After you submit your report, it will be randomly assigned to another classmate for peer evaluation (and vice versa).
Final Group Project
There will be a final group project due at the end of this semester. Your group should consist of 4 people. (You need to check with Dr. Ellison first to work in a group with fewer than 4 people.) You can choose to work on your own dataset or use the "backup" datasets that are supplied. You will get at least four weeks to work on this final group project.
This final group project will be in the form of a Jupyter notebook report. You will also present your findings in slides on your final lab section day. On presentation day, your group will be randomly assigned to evaluate the presentation of another group in your lab section and provide feedback.
Learning Collaboratively
We encourage you to discuss all of your course activities (with the exception of exams) with your friends and classmates! You will learn more through talking through the problems, teaching others, and sharing ideas.
Continue to read on "Academic Integrity" to understand the difference between collaboration and giving an answer away.
Academic Integrity
Collaboration is about working together. Collaboration is not giving the direct answer to a classmate or sharing the source code of an assignment. Collaboration requires you to make a serious attempt at every assignment and discuss your ideas and doubts with others so everyone gets more out of the discussion. Your answers must be in your own words and your code must be typed (not copied/pasted) by you.
Academic dishonesty is taken very seriously in STAT 207, and all cases will be reported to the University, your college, and your department. You should understand how academic integrity applies specifically to STAT 207: the sanctions for cheating on an assignment include a loss of all points for the assignment. A second incident or cheating on an exam results in an automatic F in the course.
Academic integrity also includes protecting your work. If your work ends up submitted by someone else, we consider this a violation of academic integrity, just as though you submitted someone else’s work.
Checking Wednesday: Understanding the Code and Claims that you are Submitting
AI Tools are not Perfect and have Consequences when They're Wrong
Given the ease of use, but imperfect nature, of AI tools that can sometimes assist the data scientist with the coding element of their task, it has become easier than ever to write code and make claims that are incorrect and that you don't understand. Unfortunately, using AI to write code or make claims about concepts that you don't understand can have negative impacts on society, the organization that you work for, your career, etc. (see Piazza discussion).
Do you understand the code/statements that you are making?
Therefore, the ultimate goal is to ensure that you understand the code that you wrote or the conceptual claims that you are making.
Checking Wednesday Idea
Therefore, every Wednesday about 4 people in each lab will be randomly selected to discuss their thought process for solving a few questions that are the same as (or very similar to) questions from the individual assignment that they submitted the night before.
If you did not simply copy and paste your assignment answers/code without thinking about them, then you should have nothing to worry about in Checking Wednesday! Even if you got the answer wrong to the corresponding question in your individual assignment, if you thoughtfully engaged with the question, then you should have some sort of steps/logic process etc. that you can discuss. This discussion of your thought process is ultimately what I'm looking for.
On the other hand, if you got the answer right (or mostly right) on your individual assignment, but you have no idea why/how you got this answer, then this is not demonstrating that you understand and can EXPLAIN the code/concepts that you wrote.
Important Data Scientist Job and Interview Skill
Being able to explain your decision making process to your boss, client, or job interviewer is a very important skill to demonstrate as a data scientist. Many data scientist interviews will require you to do this! This can be an especially useful skill to develop in instances when you don't automatically know the answer to an interview question. Data scientist job interviewers are first and foremost interested in how your decision making process works, when it comes to problems. Being able to articulate your thought process is a very important skill to have in situations like these.
How many times will I be checked?
At some point during the semester, each student in the class will be randomly checked 2 times. Each check is worth 7.5 points.
How to complete the check?
- Random selection: In your Wednesday lab, your lab TA will let you know if you've been randomly selected for a check.
- Google Colab Link: If selected, they will email you a link to a Google Colab Jupyter notebook. You will not gain access to view this Google Colab link until 20 minutes before the lab ends.
- Checking Wednesday Room: If selected, there will be a nearby room that you can go to to complete the check. Your TA will tell you the room for that day.
- Make a < 10 minute video in Zoom: Once you are given access to view the Google Colab link (20 minutes before the end of the lab), you should do the following.
  - Complete the questions asked in the notebook. You will have to write a few lines of code and explain your answer based on the code.
  - You should record yourself in Zoom (Record to the Cloud) writing the code in the Google Colab Jupyter notebook.
  - You should share your FULL screen when writing the code.
  - Your camera should be on (let me know if you foresee an issue with this).
  - You should explain what you've done.
  - References you CAN use: Your assignments, class notes, the e-book.
  - References you CANNOT use: Anything else, including assistance from the TA, another student, etc.
- Email your video link: In order to receive credit, you need to email your TA your Zoom video link BY THE END OF THE LAB. Once you stop the recording or log out of Zoom, you will be able to find the video link here: https://illinois.zoom.us/recording.
How is the check graded?
- Checks will NOT be penalized if:
  - You got the wrong answer! Remember, Checking Wednesday is not about evaluating whether you got the RIGHT answer. We are evaluating whether you can EXPLAIN your thought process for YOUR answer.
- Checks will be heavily penalized if:
  - The video is submitted after the end of the lab.
  - The video is more than 10 minutes long.
  - You are not able to explain your thought process for how you arrived at your answer.
  - Your camera was not turned on.
  - You did not share your full screen while recording.
  - You used a reference that was not allowed (like talking to one of your classmates).
What happens if I was randomly selected for a given lab, but I didn't attend that day?
You'll be checked in the next lab that you do attend, with questions that look similar to the questions you would have been asked that week. For instance, if you missed the lab in which we were asking questions that looked very similar to assignment 5, then we would still ask you questions that looked very similar to assignment 5.
Key thing Dr. Ellison wants you to know about "Checking Wednesday"
Relax! If you have put forth a good faith effort to try to understand what you wrote on your assignment, then you have nothing to worry about, even if you got it wrong or don't quite understand the concept fully yet.
IMPORTANT: This is NOT an academic integrity violation check and I don't necessarily think you are cheating
An important point to remember is that, if you do not get the full 7.5 points on a check, I am NOT making an academic integrity violation claim and I do NOT necessarily think that you are cheating.
Being able to articulate your understanding of the data science code/concepts that you wrote is a graded learning outcome of this course and an important skill for a data scientist to have. Therefore, if you are not able to do this, then you do not get full points for this graded learning outcome because you were unable to demonstrate this skill.