STAT437 - Unsupervised Learning
Spring 2024 - Ellison

Data Science Tools Setup: Python and Git


Data Science Setup for Windows

Data Science requires a few tools to help us discover interesting features in our data. We will primarily use one tool and several libraries within this tool:

  • python, a simple programming language (this allows for the computer to do the work for us)

All of these tools are free (and open-source), so it just takes a few minutes for you to install them to get started!

Installing Python

You will need Python 3.6 (or later). We will first check whether you already have Python (you may, if you have done Data Science before) and install it if you don't.

Checking for existing Python

  1. Open up your command prompt
  2. Type python --version and press Enter.
  • If you see Python 3.7.1 (or similar), you are all set, with no need to install Python. (Skip to "Set up your Python notebook".)
  • If you see 'python' is not recognized as an internal or external command, operable program or batch file., install it now:

Installing Python

  1. Visit https://conda.io/miniconda.html to get Miniconda, a lightweight Python distribution that includes the conda package manager.
  2. Download and run the latest 64-bit Windows installer for the latest version of Python (e.g., 3.7).
  3. After the install finishes, exit your command prompt, re-launch it, and verify that Python installed by following the steps above (in "Checking for existing Python").
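
Once python --version works, the same requirement can also be checked from inside Python. The following is a minimal sketch, not part of the official course materials; the names MIN_VERSION and meets_minimum are made up for this illustration, and the file name check_version.py used below is likewise hypothetical.

```python
import sys

# Minimum version stated in this guide.
MIN_VERSION = (3, 6)


def meets_minimum(version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.version_info describes the interpreter that is running this script.
    print("Running Python", sys.version.split()[0])
    if meets_minimum(sys.version_info):
        print("This interpreter meets the 3.6 requirement.")
    else:
        print("This interpreter is too old; install Python 3.6 or later.")
```

If you save this as check_version.py, you can run it with python check_version.py from the same command prompt you used above.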

 


Data Science Setup for Mac OS X

Data Science requires a few tools to help us discover interesting features in our data. We will primarily use two tools and several libraries within each of these tools. The two tools are:

  • python, a simple programming language (this allows for the computer to do the work for us)
  • git, a distributed version control system/repository tool (this is the technology behind "GitHub")

All of these tools are free (and open-source), so it just takes a few minutes for you to install them to get started!

Installing Python

You will need Python 3.6 (or later). We will first check whether you already have Python (you may, if you have done Data Science before) and install it if you don't.

Checking for existing Python

  1. Open up your command line (the Terminal app)
  2. Type python --version and press Enter.
  • If you see Python 3.7.1 (or similar), you are all set – no need to install Python!
  • If you see an error or Python 2.7, install Python 3 now:

Installing Python


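One approach that might work
  • Download the macOS installer Miniconda3-latest-MacOSX-x86_64.sh from https://conda.io/miniconda.html into your Downloads folder.
  • Run the following in your command line.
    cd Downloads
    bash Miniconda3-latest-MacOSX-x86_64.sh
  • You will need to press q to exit the license screen; all default options are fine.
  • Restart your terminal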

If that approach doesn't work, try this
  • Run the following in your command line.
    mkdir -p ~/miniconda3

    curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh

    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

    rm -rf ~/miniconda3/miniconda.sh

  • After installing, initialize your newly-installed Miniconda. The following commands initialize for bash and zsh shells.
    ~/miniconda3/bin/conda init bash
    ~/miniconda3/bin/conda init zsh
  • Restart your terminal
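
After restarting the terminal, it can be useful to confirm that the python command now points at the Miniconda install rather than an older system Python. The sketch below is one possible way to check this. It assumes Miniconda lives in ~/miniconda3 (the location used in the commands above; your install may differ), and the names ASSUMED_PREFIX and is_inside are hypothetical helpers written for this example.

```python
import os
import sys

# Assumption: Miniconda was installed to ~/miniconda3, as in the commands above.
ASSUMED_PREFIX = "~/miniconda3"


def is_inside(path, directory):
    """Return True if `path` is `directory` itself or lies somewhere beneath it."""
    path = os.path.realpath(os.path.expanduser(path))
    directory = os.path.realpath(os.path.expanduser(directory))
    return os.path.commonpath([path, directory]) == directory


if __name__ == "__main__":
    # sys.prefix is the install location of the interpreter running this script.
    print("Python executable:", sys.executable)
    if is_inside(sys.prefix, ASSUMED_PREFIX):
        print("This Python comes from the Miniconda install.")
    else:
        print("This Python is not under", ASSUMED_PREFIX)
```

If the second message appears, the terminal is probably still picking up a different Python, and restarting the terminal or re-running the conda init step may help.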

Set up your Python notebook

In Data Science, all of our programming will be done in “notebooks”. Your python install will need a few libraries in order to run the notebooks. Using your command line, run the following:

                            
    conda install jupyter
    conda install pandas
    conda install matplotlib
    conda install seaborn
                        

Potential Error Workaround: If you get an error about "conda not found" when trying to do this, you can also install these packages by doing the following.

  • Search for the "miniconda" program you just installed, and run the entry that should say "Anaconda Prompt."
  • This will open up another command line window that is specifically for running python commands (for instance, commands that install packages).
  • Run the code in this Anaconda Prompt instead:
                                    
    conda install jupyter
    conda install pandas
    conda install matplotlib
    conda install seaborn
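
Whichever window you used for the conda install commands, you may want to confirm that Python can actually find the libraries. The following is a small sketch of one way to do that without importing them. The names REQUIRED_MODULES and missing_modules are hypothetical, and the check assumes that conda install jupyter provides an importable module named notebook.

```python
import importlib.util

# Import names to look for. Assumption: `conda install jupyter` provides the
# `notebook` module; the other three names match their conda package names.
REQUIRED_MODULES = ["notebook", "pandas", "matplotlib", "seaborn"]


def missing_modules(names):
    """Return the names from `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All notebook libraries were found.")
```

Run it with the same python that conda installed into; a library listed as missing can be installed again with the matching conda install command.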
                            

Editing Jupyter Notebook Files (ipynb) in the Jupyter Notebook Application


Opening the Jupyter Notebook

One way to open the notebook (may not work)

Open the notebook with the command:

                                
    jupyter notebook
                            

Another way to open the notebook: If you get an error about "jupyter is not recognized" when trying to do this, you can also open the notebook by doing the following.

  • Search for the "miniconda" program you just installed, and run the entry that should say "Anaconda Prompt."
  • This will open up the Anaconda Prompt window, which is specifically for running python commands (for instance, commands that install python packages or launch jupyter notebooks).
  • If your Anaconda Prompt window is not already in that folder, navigate to your stat207 repository (cd Desktop, then cd stat207).
  • Run the code in this Anaconda Prompt window instead:

    jupyter notebook
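
Errors such as "jupyter is not recognized" or "conda not found" generally mean that the command line could not find the program on its search path (PATH). As a rough diagnostic, the sketch below asks Python where each command would be found. The function name locate is a hypothetical helper for this example.

```python
import shutil


def locate(command):
    """Return the full path of `command` if it is on PATH, otherwise None."""
    return shutil.which(command)


if __name__ == "__main__":
    for command in ("conda", "jupyter"):
        path = locate(command)
        print(command, "->", path if path else "not found on PATH")
```

If a command is reported as not found in your regular command line but is found when the script runs inside the Anaconda Prompt, that is consistent with the workaround described above.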
                                

Yet another way to open the notebook:

  • Search for the program "jupyter" on your computer and run it.
  • This will open a window that displays the file system of your computer. Navigate to the folder your notebook is saved in by clicking on the folder links.
  • Once you've found your notebook, click on it to open it.

Editing the Jupyter Notebook (aka Working on the Assignment)

Inside of the notebook webpage:

  • Navigate into the folder containing Assignment1.ipynb and open up the notebook
  • Follow the instructions inside of the notebook

Whenever you are done, save your work by checkpointing your notebook (File -> Save Checkpoint in the notebook). Once your work is saved, you can stop the notebook server running in your command line with Ctrl + C.
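
If you want to double-check that your saved work reached the file on disk, one option is to inspect the .ipynb file itself, which is stored as JSON. The sketch below counts the cells in a saved notebook by type. It assumes the notebook uses the current notebook format with a top-level "cells" list, and count_cells is a hypothetical helper name, not part of Jupyter.

```python
import json
import os
from collections import Counter


def count_cells(notebook_path):
    """Return a Counter of cell types (e.g. 'code', 'markdown') in a saved notebook."""
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    # Assumption: the notebook keeps its cells in a top-level "cells" list.
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


if __name__ == "__main__":
    path = "Assignment1.ipynb"  # run this from the folder containing the notebook
    if os.path.exists(path):
        print(dict(count_cells(path)))
    else:
        print("Could not find", path, "in the current folder.")
```

If the counts do not change after you add cells and save, the notebook may not have been saved yet.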