Final Project: You and Data Science

Due: Reading Day (Thursday, May 7th at 11:59pm)

Throughout this semester, you have grown into an amazing Data Scientist! You are analyzing datasets in Python, performing advanced statistical tests, and finding the answers to complex questions using data. You have seen dozens of datasets we have provided. For the final project, we want you to teach us something – we want to learn about something you are passionate about!

For this final project in Data Science Discovery, you will use Data Science to explore something you are passionate about or interested in learning more about. At the end, you will write a small paper telling us about what you found and teaching us something! We only have a few minimal requirements:

  • You must use a non-trivial dataset. The dataset must have at least 200 data points (this could be 20 rows with 10 columns, 50 rows with 4 columns, etc).
  • You must do some analysis using Python. You will turn in your code. You must do something, but it could be anything.
  • You must submit a paper/report that provides a summary of what you found and teach us about your passion/interest. The paper must be at least 1 page (and single-spaced), but up to half of that page can be figures/graphs. Full details below.
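As a rough way to check the 200-data-point minimum, you can count rows times columns in your dataset. The sketch below is only an illustration: the helper name `count_data_points` and the in-memory sample data are made up for this example, and it assumes your data is a CSV with a single header row.

```python
import csv
import io

MIN_DATA_POINTS = 200  # minimum stated in the project requirements


def count_data_points(csv_text):
    """Return (rows, columns, data_points) for CSV text with one header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                    # first line holds the column names
    rows = [row for row in reader if row]    # ignore blank lines
    return len(rows), len(header), len(rows) * len(header)


# Hypothetical sample: 50 rows and 4 columns, built in memory for illustration.
sample = "name,city,year,score\n" + "\n".join(
    f"item{i},Urbana,2020,{i}" for i in range(50)
)

rows, columns, points = count_data_points(sample)
print(rows, columns, points, points >= MIN_DATA_POINTS)
```

If you load your data with pandas instead, `df.shape` reports the same row and column counts, so multiplying the two gives the number of data points.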

With students from so many different majors in Data Science Discovery, we are excited for everything we are going to learn from you! :)

Setting Up Your Project Workspace

To complete this project, there is no starter code – you are building it from scratch! However, we do want to check out your work so we need you to place it in a specific spot in your stat107 directory so you can turn it in and so that we can find it:

  • In your stat107 directory, navigate into the folder that contains all of your labs, extra credit notebook, etc.
  • Create a new directory called project2.
  • Complete all of your project work within that new project2 directory. You’ll turn in (commit + push) your project2 directory and we can check out your code that way!


Our hope is that you will use a dataset you are passionate about. It can be anything – it can be a dataset from another class (eg: any data you were given in Excel), it can be a dataset you found online, or it can be a dataset you gather yourself. Some ideas include:

  • A dataset about a hobby you’re interested in (eg: vacation destinations, best beaches, fashion trends, Instagram, etc)
  • A dataset about something you enjoy doing or watching (eg: swimming, volleyball, Rocket League, Illini Football, etc)
  • A dataset about a topic related to your major (economics, communications, political science, etc)
  • Any dataset that means something to you.

If you are completely out of ideas, is a well-known, free resource that contains millions of datasets.

Project Report

The major deliverable for this project is a small paper or report on what you found. We want to learn something from you about your interest/passion, so tell us a story about what you discovered!

The only requirements are:

  1. Your report must be at least one page. (It can be more; use enough space to tell us what amazing things you found.)
  2. Your report must be single-spaced. (The default settings in Word or Google Docs are great, using line spacing of up to 1.15; the real world is not double-spaced.)
  3. Your font size should not be greater than 12. (The default setting in most applications is 11, which seems great.)
  4. Feel free to include images, diagrams, figures, etc! The only requirement is that we want at least half a page of text in your report (you can have 3 pages of diagrams so long as there’s at least half a page of text somewhere in it all.)

Your audience is going to be Prof. Wade, Prof. Karle, and/or your lab TA. You do not need to explain Python or Data Science to us, but you should not assume we know anything about your specific interest/passion.


When you are ready to submit, there are two things you will submit.

Submission: Part 1 - Dataset and Code

For your code, you will turn in your project2 folder just like you have done for all of your other projects.

git add -A
git commit -m "submission (or any message here)"
git push origin master

Submission: Part 2 - Project Report

Your project report will be submitted online by May 7th at 11:59pm!

  • Upload your file to your Google Drive account (or use Google Docs to create your report)
  • Name your file stat107project2 NETID (anything else you want), replacing NETID with your NetID
  • Make sure your file has the correct name! We won’t find it without stat107project2!
  • Share that file with BOTH and

We can’t wait to read your project! :)