STAT207 - Data Science Exploration
Fall 2023 - Ellison

Lab Assignment 1 Instructions

Deadlines and Submission

In this first lab, you will set up your account and computer for Data Science Exploration and begin to work with Python notebooks.

Source Branch: lab_01
Due Date: Committed and pushed to git before August 29 at 11:59pm CST

Overview

Data Science requires tools to help us learn about data. In this lab, you have two major tasks:

  1. Set up your account and computer for Data Science Exploration.
  2. Work with your first Python notebook for STAT 207.

Part 0: Create General Class Folder

First, you should create a folder named 'stat207' (we recommend on your Desktop) to hold all of your Python notebooks.
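
If you would rather create the folder from Python than through your file browser, the step above could be sketched roughly as follows. The helper name make_class_folder is hypothetical (it is not part of the lab), and the Desktop location is only the recommended default.

```python
from pathlib import Path


def make_class_folder(parent: Path, name: str = "stat207") -> Path:
    """Create the class folder inside `parent` and return its path."""
    folder = parent / name
    # parents=True also creates `parent` if it is missing;
    # exist_ok=True makes it harmless to run this more than once.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

For example, make_class_folder(Path.home() / "Desktop") would create Desktop/stat207 under your home folder, assuming your Desktop lives there.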


Part 1: Set up Software and Tools for Data Science

The first half of this lab will be spent getting you all set up for the semester – you will only need to do this once.

Installing Software Tools

To begin Data Science, you need a few basic tools installed on your computer. All of these tools are free, open-source and industry standard. We have prepared guides based on what type of computer you have. Complete these steps in the link below and come back to this page.

Creating your STAT 207 git repository

When working in Data Science, you will want to store all of your code and data together, in the cloud, in a “repository”. For this class, we will be using an Illinois-hosted repository service called GitHub Enterprise. Complete these steps in the link below and come back to this page.

Set up your Python notebook

In Data Science, all of our programming will be done in “notebooks”. Your Python installation will need a few libraries in order to run the notebooks. Using your command line, run the following:

                
                    conda install jupyter
                    conda install pandas
                    conda install matplotlib
                    conda install seaborn
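
Once the installs finish, one possible way to confirm that Python can see the four libraries is the short check below. This is a sketch rather than an official course tool: the helper name missing_packages is hypothetical, and it assumes each conda package provides an importable module of the same name.

```python
from importlib.util import find_spec

# Assumption: each conda package installs a top-level module with the same name.
REQUIRED = ["jupyter", "pandas", "matplotlib", "seaborn"]


def missing_packages(names=REQUIRED):
    """Return the names that Python cannot locate as importable modules."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If a name is reported as not found, rerun the matching conda install command from the same command line window you used to start Python.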
            

Potential Error Workaround: If you get a "conda not found" error when running these commands, you can install the packages as follows instead.

  • Search for the "miniconda" program you just downloaded and run the entry labeled "Anaconda Prompt."
  • This will open another command line window that is specifically for running Python commands (for instance, commands that install packages).
  • Run the commands in this Anaconda Prompt instead:
                        
                conda install jupyter
                conda install pandas
                conda install matplotlib
                conda install seaborn
                

Part 2: Complete the “lab_01” Notebooks

Part 2a: Fetching the Lab Assignment from the Class Repository

Using your command line, navigate to your stat207 repository (cd Desktop -> cd stat207 -> cd NETID), replacing NETID with your own, and fetch the notebook from our release repository by running the following two git commands:

                
            git fetch release
            git merge release/lab_01 -m "Merging initial files"
            

ONLY IF you get an error related to unrelated histories, use:

                
            git merge release/lab_01 --allow-unrelated-histories -m "Merging initial files" 
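
The decision logic above (fetch, merge, and retry with the extra flag only after an "unrelated histories" error) can be sketched in Python as follows. This is purely illustrative and not required for the lab: the function names run_git and fetch_and_merge are hypothetical, and the sketch assumes git prints its error message in English.

```python
import subprocess


def run_git(args):
    """Run `git <args>` in the current folder; return (exit code, combined output)."""
    proc = subprocess.run(["git", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def fetch_and_merge(branch="lab_01", run=run_git):
    """Fetch from `release`, merge the lab branch, and retry once if git
    reports unrelated histories. Returns the git argument lists that were run."""
    message = ["-m", "Merging initial files"]
    commands = [["fetch", "release"]]
    code, output = run(commands[-1])
    if code != 0:
        raise RuntimeError(output)

    commands.append(["merge", f"release/{branch}", *message])
    code, output = run(commands[-1])
    if code != 0 and "unrelated histories" in output:
        # Only this specific error calls for the extra flag.
        commands.append(
            ["merge", f"release/{branch}", "--allow-unrelated-histories", *message]
        )
        code, output = run(commands[-1])
    if code != 0:
        raise RuntimeError(output)
    return commands
```

Calling fetch_and_merge() from inside your NETID folder would run the same commands listed above. The run parameter exists only so the logic can be tried out with a stand-in function instead of real git.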
            

Part 2b: Opening the Jupyter Notebook

One way to open the notebook (may not work)

Open the notebook with the command:

                
            jupyter notebook
            

Another way to open the notebook: If you get a "jupyter is not recognized" error when running the command above, you can open the notebook as follows instead.

  • Search for the "miniconda" program you just downloaded and run the entry labeled "Anaconda Prompt."
  • This will open the Anaconda Prompt window, which is specifically for running Python commands (for instance, commands that install Python packages or launch Jupyter notebooks).
  • If the Anaconda Prompt window is not already in your stat207 repository folder, navigate there using the cd commands from Part 2a.
  • Run the command in this Anaconda Prompt window instead:
                        
            jupyter notebook
                

A third way to open the notebook:

  • Search for the program "jupyter" on your computer and run it.
  • This will open a window that displays the file system of your computer. Navigate to the folder your notebook is saved in by clicking on the folder links.
  • Once you've found your notebook, click on it to open it.

Part 2c: Editing the Jupyter Notebook (aka Working on the Assignment)

Inside of the notebook webpage:

  • Navigate into the folder containing lab_01.ipynb and open up the notebook
  • Follow the instructions inside of the notebook

Whenever you are done, you should checkpoint your notebook (using File -> Save Checkpoint in the notebook) to save your work. Once your work is saved, you can stop the notebook server running in your command line by pressing Ctrl + C in that window.


Part 2d: Saving/Submitting your Notebook back to the Class Repository

When you’re ready to save your work online and/or submit your work, return to the command line and run:

                
            git add -A
            git commit -m "submission (or any message here)"
            git push
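
To double-check that a submission actually reached the repository, one option is to read the text printed by git status --porcelain --branch. The sketch below interprets that text. The helper name submission_problems is hypothetical, and it assumes the usual short status layout: a first line starting with "## " that includes "[ahead N]" when commits are unpushed, followed by one line per changed file.

```python
def submission_problems(status_output: str) -> list:
    """Interpret the text printed by `git status --porcelain --branch`."""
    lines = status_output.splitlines()
    problems = []
    # The branch header line starts with "## "; every other line is a changed file.
    header = lines[0] if lines and lines[0].startswith("## ") else ""
    changed = [line for line in lines if not line.startswith("## ")]
    if changed:
        problems.append(f"{len(changed)} file(s) not yet committed")
    if "[ahead" in header:
        problems.append("commits not yet pushed")
    return problems
```

An empty list suggests that everything has been committed and pushed. For instance, passing "## main...origin/main [ahead 1]" returns a single entry saying that commits are not yet pushed.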