Data Science requires tools to help us learn about data. In this lab, you will accomplish two major things:

  1. Setting your account and computer up for Data Science Discovery!
  2. Playing around within your first Python notebook.

Part 1a: Set up the tools!

Follow our guide to set up your machine with the tools we will use to do Data Science:

Part 1b: Set up your git repository

In Data Science Discovery, we will use git to give you your initial files and you will use git to submit your work.

Part 1c: Set up your Python notebook

In Data Science Discovery, all of our programming will be done in “notebooks”. Your Python installation needs a few libraries to run the notebooks. From your command line, run the following:

conda install jupyter
conda install pandas

This will take a few minutes. When prompted, press [Enter] to confirm that you want to install all of the packages (the prompt [y]/n means that y is the default when you press [Enter] without typing anything).
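Once the installs finish, one way to confirm that Python can see the new libraries is a quick check like the sketch below. It is only an illustration: the helper name missing_packages is made up for this example, and it assumes that installing jupyter through conda provides an importable module named notebook.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: `conda install jupyter` provides a module named `notebook`.
    missing = missing_packages(["pandas", "notebook"])
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("pandas and the notebook server look installed.")
```

If anything is reported as not found, rerunning the matching conda install command is a reasonable first step.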

Part 2: Complete the “Lab: Introduction” Notebook

Using your command line, navigate to your stat107 repository (cd Desktop -> cd stat107 -> cd [NETID]) and fetch the notebook from our release repository by running the following two git commands:

git fetch release
git merge release/lab_intro -m "Merging initial files"

ONLY IF you get an error related to unrelated histories, use:

git merge release/lab_intro --allow-unrelated-histories -m "Merging initial files" 

Open the notebook with the command:

jupyter notebook

Inside of the notebook webpage:

When you are done, checkpoint your notebook (File -> Save Checkpoint in the notebook) to save your work. Once your work is saved, you can stop the notebook server by pressing Ctrl + C in the command line where it is running.

Turning in Your Work

When you’re ready to save your work online or submit it, return to the command line and run:

git add -A
git commit -m "submission (or any message here)"
git push origin master
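If you want to double-check that nothing was left uncommitted, a small script along these lines can help. It is a sketch, not part of the required workflow: it runs git status --porcelain from the current folder and lists any files that still have changes. The function name pending_changes is hypothetical, and the simple parsing assumes ordinary file names without renames.

```python
import subprocess


def pending_changes(porcelain_output):
    """Return the paths listed in `git status --porcelain` output."""
    paths = []
    for line in porcelain_output.splitlines():
        if line.strip():
            # Each line is two status characters, a space, then the path.
            paths.append(line[3:])
    return paths


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        print("git was not found on this machine.")
    else:
        if result.returncode != 0:
            print("This folder does not appear to be a git repository.")
        else:
            changed = pending_changes(result.stdout)
            if changed:
                print("Not yet committed:", ", ".join(changed))
            else:
                print("Everything is committed.")
```

An empty list means every change has been committed. It does not by itself confirm that the push reached the server.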