Fall 2022 - STAT207
Data Science Exploration


Lab 1: Data Science Setup


Overview

This first lab differs from the rest in that it has two parts.

  • Parts 0-1: General Class Set Up
      Complete Parts 0 and 1, described below, to set yourself up for this course. You should only need to do these steps once, for this first assignment.

  • Part 2: Downloading, Editing, and Submitting the Individual and Group Notebooks for this Week

  • Due Date: For full credit, your Lab 1 Jupyter notebooks need to be committed and pushed to git before Tuesday, August 30 at 11:59pm Central Time.



Part 0: General Class Folder

First, you should create a folder named 'stat207' (we recommend on your Desktop) to hold all of your Python notebooks.

Part 1: Downloading and Setting up Class Software and Tools

Part 1a: Installing Software Tools

To begin to do Data Science, you need a few basic tools installed on your computer. All of these tools are free, open-source and industry standard. We have prepared guides based on what type of computer you have.


Part 1b: Creating your STAT 207 git repository

When working in Data Science, you will want to store all of your code and data together, in the cloud, in a “repository”. For this class, we will be using an Illinois-hosted repository service called GitHub Enterprise.


Part 1c: Installing Important Python Packages

In Data Science, all of our programming will be done in “notebooks”. Your Python installation will need a few libraries in order to run the notebooks.

    Using your command line, run the following:

    conda install jupyter
    conda install pandas
    conda install matplotlib
    conda install seaborn
    

    Potential Error Workaround: If you get an error about "conda not found" when running these commands, you can install the packages as follows instead.

    • Search for the "miniconda" program you just installed, and run the entry labeled "Anaconda Prompt."
    • This will open another command line window that is specifically for running Python commands (for instance, commands that install packages).
    • Run the commands in this Anaconda Prompt instead:
      conda install jupyter
      conda install pandas
      conda install matplotlib
      conda install seaborn
      

    This might take a couple of minutes. When prompted, type y to confirm that you want to install all of the packages (in the prompt [y]/n, the brackets show that y is the default when you choose no option).

    You can check what has been installed already using the command:

    conda list
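
    As a complement to conda list, a short Python check can confirm that the four packages above are visible to the Python you are actually running. This is a minimal sketch, not part of the official lab steps: the helper name find_missing_packages is made up for illustration, and it assumes each package's import name matches its conda name, which should hold for these four.

```python
import importlib.util

# The four packages installed with conda in Part 1c.
REQUIRED_PACKAGES = ["jupyter", "pandas", "matplotlib", "seaborn"]


def find_missing_packages(names):
    """Return the names that the current Python cannot locate (hypothetical helper)."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required packages were found.")
```

    If any names are reported as not found, rerunning the matching conda install command from the same prompt is the likely fix.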
    


Part 2: Complete the “lab_01” Notebooks (Individual and Group)

Part 2a: Fetching the Lab Assignment from the Class Repository

    Using your command line, navigate to your stat207 repository, replacing NETID with your own:

      cd Desktop
      cd stat207
      cd NETID

    Then fetch the notebook from our release repository by running the following two git commands:

      git fetch release
      git merge release/lab_01 -m "Merging initial files"
      

    ONLY IF you get an error related to unrelated histories, use:

      git merge release/lab_01 --allow-unrelated-histories -m "Merging initial files" 
      

Part 2b: Opening the Jupyter Notebook Application

Opening the Jupyter notebook application should open up a web browser which displays the file system on your personal computer.

One way to open the notebook application (may not work):

    Open the notebook in the command line with the command:

      jupyter notebook
      

Another way to open the notebook:

    If you get an error about "jupyter is not recognized" when running the command above, you can open the notebook as follows instead.
    • Search for the "miniconda" program you just installed, and run the entry labeled "Anaconda Prompt."
    • This will open the Anaconda Prompt window, which is specifically for running Python commands (for instance, commands that install Python packages or launch Jupyter notebooks).
    • If your Anaconda Prompt window is not already in that folder, navigate to your stat207 repository (cd Desktop -> cd stat207).
    • Run the command in this Anaconda Prompt window instead:
      jupyter notebook
      

A third way:

  • Search for the program "jupyter" on your computer and run it.
  • This will open a window that displays the file system of your computer. Navigate to the folder your notebook is saved in by clicking on the folder links.
  • Once you've found your notebook, click on it to open it

Part 2c: Editing the Jupyter Notebooks and Working on the Assignments

Once you have opened the Jupyter notebook application and can see your computer's file system displayed in a web browser, do the following to open the notebook that you would like to edit.

  • In the displayed file system, navigate into your NetID folder (inside the stat207 folder on your Desktop). It should contain the files lab_01_individual.ipynb and lab_01_group.ipynb.
  • Click on each of these displayed files to open them.
  • Follow the instructions inside of the notebooks.
  • lab_01_individual.ipynb is to be completed individually.
  • lab_01_group.ipynb is to be completed in groups of 2-3. Only one person in a group needs to submit this completed group file. Make sure all teammate names are listed in this file.
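
If the notebooks do not show up in the file browser, a quick check from Python can tell you whether the merge in Part 2a actually placed them in your folder. The sketch below is only illustrative: the helper name missing_notebooks is hypothetical, and the folder path assumes the Desktop/stat207/NETID layout used earlier, with NETID replaced by your own.

```python
from pathlib import Path

# File names taken from the lab instructions.
EXPECTED_NOTEBOOKS = ("lab_01_individual.ipynb", "lab_01_group.ipynb")


def missing_notebooks(folder, expected=EXPECTED_NOTEBOOKS):
    """Return the expected notebook names that are not files in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


# Assumed location: Desktop/stat207/NETID. Replace NETID with your own.
repo_dir = Path.home() / "Desktop" / "stat207" / "NETID"
missing = missing_notebooks(repo_dir)
print("Missing notebooks:", missing if missing else "none")
```

If either file is listed as missing, revisiting the git fetch and git merge commands in Part 2a is a reasonable next step.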

Whenever you are done, you should checkpoint your notebook (using File -> Save Checkpoint in the notebook) to save your work. Once your work is saved, you can stop the Jupyter notebook application running in the command line with Ctrl + C.


Part 2d: Saving/Submitting your Notebooks back to the Class Repository

When you’re ready to save your work online and/or submit your work, return to the command line and run:

git add -A
git commit -m "submission (or any message here)"
git push

Submitting Your Work

When you have finished working, you should always submit your work (even if you're not quite finished with the assignment). We will always grade the latest push you made before the due date (and ignore everything else), so submitting multiple times is okay and encouraged!

Inside of Jupyter:

  • Click File -> Save Checkpoint to ensure your notebook is saved.
  • Click File -> Close and Halt to exit your notebook.
  • Click Quit (in the top-right) to close the directory view.

After exiting Jupyter, your command prompt will return to accept new commands. Using your command prompt, run:

git add -A
git commit -m "submission (or any message here)"
git push
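
To double-check that nothing was left out of the commit, you can inspect the output of git status --porcelain, which prints one line per changed or untracked file and prints nothing when the working tree is clean. The following is a rough sketch rather than a required step; both function names are hypothetical, and the sample string is made up to illustrate the output format.

```python
import subprocess


def git_status_porcelain(repo_dir):
    """Run `git status --porcelain` in repo_dir and return its text output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git reports a failure
    )
    return result.stdout


def uncommitted_paths(porcelain_output):
    """Extract file paths from porcelain output; an empty list means clean."""
    # Each non-empty line is a two-character status code, a space, then the path.
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


# Illustrative sample of what git might print for one modified notebook.
sample = " M lab_01_individual.ipynb\n"
print(uncommitted_paths(sample))
```

In practice you would call uncommitted_paths(git_status_porcelain(".")) from inside your repository folder. An empty list only shows that everything is committed; it does not confirm that the push succeeded.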

Part 2e: Verifying your Submission on GitHub

You can verify that your submission was made by visiting the GitHub web interface:

  • Visit https://github.com/illinois-cs-coursework
  • Click on your NetID repository.
  • Check that your last commit was made within the last minute or two. (You can also click on a file to view your submission.)