Data Science Setup

In this first lab, you will set up your account and computer for Data Science Exploration and begin to work with Python notebooks.

Source Branch: lab_01
Due Date: Committed and pushed to git before January 29, 2020 at 11:59pm

Part 1: Software and Tools for Data Science

The first half of this lab will be spent getting you all set up for the semester – you will only need to do this once.

Part 1a: Installing Software Tools

To begin to do Data Science, you need a few basic tools installed on your computer. All of these tools are free, open-source and industry standard. We have prepared guides based on what type of computer you have:

Part 1b: Creating your STAT 207 git repository

When working in Data Science, you will want to store all of your code and data together, in the cloud, in a “repository”. For this class, we will be using an Illinois-hosted instance of GitHub Enterprise.

Part 1c: Set up your Python notebook

In Data Science, all of our programming will be done in “notebooks”. Your Python installation will need a few libraries in order to run the notebooks. Using your command line, run the following:

conda install jupyter
conda install pandas
conda install matplotlib
conda install seaborn

This might take a couple of minutes. You will need to press [Enter] to confirm that you want to install all of the packages (the prompt [y]/n shows that y is the default when you press [Enter] without typing anything).

You can check what has been installed already using the command:

conda list
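As an optional cross-check from inside Python itself, the sketch below asks the interpreter whether it can locate each library. It is a minimal sketch, not part of the official lab steps: the function name check_packages and the REQUIRED list are made up for illustration, and the import name jupyter_core is an assumption about what "conda install jupyter" provides.

```python
from importlib.util import find_spec

# Import names for the libraries installed above.
# "jupyter_core" is an assumed import name for the jupyter install.
REQUIRED = ["jupyter_core", "pandas", "matplotlib", "seaborn"]


def check_packages(names):
    """Return a dict mapping each name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per package so a missing one is easy to spot.
    for name, found in check_packages(REQUIRED).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as MISSING, re-running the matching conda install command is the likely fix.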

Part 2: Complete the “lab_01” Notebook

Using your command line, navigate to your stat207 repository (cd Desktop -> cd stat207 -> cd NETID), replacing NETID with your own, and fetch the notebook from our release repository by running the following two git commands:

git fetch release
git merge release/lab_01 -m "Merging initial files"

ONLY IF you get an error related to unrelated histories, use:

git merge release/lab_01 --allow-unrelated-histories -m "Merging initial files"

Open the notebook with the command:

jupyter notebook

Inside of the notebook webpage:

  • Navigate into the folder containing lab_01.ipynb and open up the notebook
  • Follow the instructions inside of the notebook
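If you want to confirm that the notebook is running on the same Python you installed the libraries into, a cell like the one below can help. This is only a sketch under that assumption; describe_python is a hypothetical helper name and is not part of lab_01.ipynb.

```python
import sys


def describe_python():
    """Collect the path and version of the interpreter running this cell."""
    return {
        "executable": sys.executable,          # path to the Python binary in use
        "version": sys.version.split()[0],     # e.g. "3.x.y"
    }


info = describe_python()
print("Python executable:", info["executable"])
print("Python version:", info["version"])
```

If the executable path does not point into your conda installation, the notebook may be using a different Python than the one the conda commands updated.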

When you are done, save your work with File -> Save Checkpoint in the notebook. Once your work is saved, you can stop the notebook server running in your command line with Ctrl + C.

Turning in Your Work

When you’re ready to save your work online and/or submit your work, return to the command line and run:

git add -A
git commit -m "submission (or any message here)"
git push origin master

Submitting Your Work

When you have finished working, you should always submit your work (even if you're not quite finished). We will always grade the latest push you made before the due date (and ignore everything else), so submitting multiple times is okay and encouraged!

Inside of Jupyter:

  • Click File -> Save Checkpoint to ensure your notebook is saved.
  • Click File -> Close and Halt to exit your notebook.
  • Click Quit (in the top-right) to close the directory view.

After exiting Jupyter, your command prompt will return to accept new commands. Using your command prompt, run:

git add -A
git commit -m "submission (or any message here)"
git push origin master

You can verify your submission was made by visiting the web interface to GitHub: