
Exercises - Programming Like a Data Scientist

Basics of Python for Data Science

This set of exercises requires practicing and combining various skills: creating a project folder with a virtual environment, working with arrays and dataframes, and using conditional statements and loops. Some of these exercises are a bit advanced for the purposes of the present course!

This first part (in blue) concerns managing a virtual environment for reproducibility. It is not strictly required for completing the rest of the exercise, so you could skip it, but it is good practice, so you are encouraged to try!

  • Create a new project folder somewhere on your computer. In it, create a new Python virtual environment (with this command in your bash/terminal: python -m venv nameOfYourEnv), and activate it as shown in the slides, depending on your operating system and whether you work in an IDE. The whole project folder should be organized as follows:
yourProjectFolder/
├── venv/             ← virtual environment
├── data/             ← .csv files (see below)
├── scripts/          ← .py scripts
├── results/          ← output files, figures
├── requirements.txt  ← list of installed packages for reproducibility (see below)
└── README.md         ← brief description of the project (with something related to this course)
  • In the newly created virtual environment, install numpy, pandas, matplotlib, and seaborn (even if you have already installed them in your main Python installation, you should install them again in your local virtual environment).

  • After successfully installing the above packages, generate the requirements.txt file for reproducibility, using this command in your bash/terminal (with the virtual environment activated): pip freeze > requirements.txt
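
For those who prefer to script the setup, the folder layout above could also be created from Python itself. The following is only a minimal sketch, not part of the official exercise: the function name make_project_skeleton and its with_venv parameter are hypothetical, and it assumes the environment folder is called venv, as in the tree above. Installing the packages and running pip freeze are still done in the terminal, as described in the steps above.

```python
from pathlib import Path
import venv


def make_project_skeleton(root, with_venv=False):
    """Create the project layout from the exercise under `root` (hypothetical helper)."""
    root = Path(root)

    # Subfolders for data files, scripts and results
    for sub in ("data", "scripts", "results"):
        (root / sub).mkdir(parents=True, exist_ok=True)

    # Empty placeholder files; existing files are left untouched
    for name in ("requirements.txt", "README.md"):
        (root / name).touch(exist_ok=True)

    if with_venv:
        # Roughly what `python -m venv venv` does when run inside the project folder
        venv.create(root / "venv", with_pip=True)

    return root
```

Calling make_project_skeleton("yourProjectFolder", with_venv=True) would create the folders, the two placeholder files and the virtual environment in one go; with the default with_venv=False, only the folders and files are created.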

Finally, tidy up: place all documents, code, results, figures, and related files into the appropriate subfolders within your project folder.
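
Tidying up can be done by hand, but as a small practice in loops and conditional statements it could also be scripted. The sketch below is one possible approach under stated assumptions: the function name tidy_up and the extension-to-folder mapping DESTINATIONS are hypothetical choices, so adapt them to the file types you actually have.

```python
import shutil
from pathlib import Path

# Assumed mapping from file extension to subfolder; adjust to your own files
DESTINATIONS = {".csv": "data", ".py": "scripts", ".png": "results", ".pdf": "results"}


def tidy_up(root):
    """Move loose files in `root` into subfolders according to their extension."""
    root = Path(root)
    moved = []
    # sorted() builds the full list first, so moving files does not disturb the loop
    for item in sorted(root.iterdir()):
        target = DESTINATIONS.get(item.suffix.lower())
        if not item.is_file() or target is None:
            continue  # skip folders and files with an unknown extension
        dest_dir = root / target
        dest_dir.mkdir(exist_ok=True)
        if (dest_dir / item.name).exists():
            continue  # never overwrite a file that is already in place
        shutil.move(str(item), str(dest_dir / item.name))
        moved.append(item.name)
    return moved
```

Files such as README.md and requirements.txt are not in the mapping, so they stay at the top level of the project folder, as in the layout above.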