Basics of Python for Data Science

PhD Course in Psychological Sciences - University of Padova

Author

About This Course

This is a short (10-hour) introductory course on Python offered within the PhD program in Psychological Sciences (University of Padova). R is more extensively used for statistical analysis and data science in this PhD program, but familiarity with Python is useful for several purposes, including advanced machine learning applications, deep learning models, natural language processing, computational efficiency in some scenarios, and programming experiments. Also, Python is widely required in industry and business, so basic proficiency with it is a valuable asset! There are no prerequisites, but having attended Basics of R for Data Science may greatly help (well, it is a mandatory course for our PhD students).

You are encouraged to look at this online book available on GitHub: Python Data Science Handbook

Dates and Rooms 2025

Day	Time	Room
Tuesday, June 3rd	09:00-11:30	4R
Wednesday, June 4th	13:00-15:30	4R
Thursday, June 5th	09:00-11:30	4S
Friday, June 6th	09:00-11:30	4S

Getting Started

Bookmark this course homepage for quick access to this material (https://enricotoffalini.github.io/Basics-Python/). Then do this:

Install Python: go to the official Python download page and follow the installation instructions for your operating system.
Install an IDE (Integrated Development Environment): Suggested IDEs are Spyder or Posit-RStudio (which you should have already installed for the other courses; it also supports Python!).
Test your local setup: Make sure that your Python installation works; open your IDE of choice and run the following code in the console:

!pip install numpy # install package "numpy" from inside IDE

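import numpy as np # load numpy under the conventional alias "np"
np.random.normal(0, 1, size=10) # draw 10 random values from a standard normal distribution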

If you get any errors when running the first line, try installing the package via the terminal with:
pip install numpy

Take a look at Colab: Some basic practice and exercises will be conducted on Google Colab (you need to log in with a Google account), a free online environment for writing and running Python in the browser without any local installation.
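As a complement to the test above, here is a minimal sketch that reports which Python version is running and whether some packages can be found, without actually importing them. The helper name `check_setup` and the list of packages are illustrative assumptions, not part of the course material.

```python
import importlib.util
import sys


def check_setup(packages):
    """Return a dictionary mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Hypothetical list of packages: adjust it to what you actually need
status = check_setup(["numpy", "pandas"])

print("Python version:", sys.version.split()[0])
for name, found in status.items():
    if found:
        print(name, "-> installed")
    else:
        print(name, "-> missing, try: pip install " + name)
```

If a package is reported as missing, installing it from the terminal as described above should fix it.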

Course Topics

Getting Started with Python: Environment, Syntax, Tools

An introduction to Python and its ecosystem: setting up Python locally or in the cloud (Google Colab), using IDEs, understanding basic syntax and operations, creating and naming variables, using packages and functions, and working with core data structures (lists, tuples, dictionaries) and indexing.
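To give a flavor of the core data structures and indexing mentioned above, here is a small sketch. All variable names and values are made up for illustration.

```python
# A list is ordered and mutable (it can be changed after creation)
scores = [12, 15, 9, 20]
scores.append(17)  # add one element at the end

# A tuple is ordered but immutable (it cannot be changed after creation)
session = ("Tuesday", "09:00")

# A dictionary maps keys to values
participant = {"id": "P01", "age": 27}

# Indexing starts at 0; negative indices count from the end
first_score = scores[0]    # first element
last_score = scores[-1]    # last element
middle = scores[1:3]       # slice: elements at positions 1 and 2 (end index excluded)
day = session[0]           # tuples are indexed like lists
age = participant["age"]   # dictionaries are indexed by key

print(first_score, last_score, middle, day, age)
```

Note that, unlike in R, the first element has index 0 and the end of a slice is excluded.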

Basics of Programming in Python

A hands-on introduction to basic programming concepts such as conditional logic, loops, and writing and using custom functions. These are core skills for writing flexible and efficient Python code.
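The three concepts can be combined in a few lines, as in the sketch below. The function `classify_rt`, its 500 ms threshold, and the reaction times are hypothetical and serve only as an illustration.

```python
def classify_rt(rt_ms, threshold=500):
    """Label a reaction time (in milliseconds) as 'fast' or 'slow'."""
    if rt_ms < threshold:   # conditional logic
        return "fast"
    else:
        return "slow"


# Made-up reaction times in milliseconds
reaction_times = [320, 610, 480, 505]

# Loop over the values and apply the custom function to each one
labels = []
for rt in reaction_times:
    labels.append(classify_rt(rt))

print(labels)  # ['fast', 'slow', 'fast', 'slow']
```

Indentation is part of the syntax in Python: it defines which lines belong to the function, the `if` branch, and the loop.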

Entering the World of Data Science in Python: `pandas`, `numpy`, and more

Explore the Python core libraries for data science. We will learn how to manipulate and analyze tabular data with pandas, handle arrays and numerical operations with numpy, and get a taste of statistical modeling and basic machine learning using statsmodels and scikit-learn.
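As a first taste, the sketch below builds a small table with pandas and uses numpy for a vectorized computation. The data are invented for illustration, and both packages are assumed to be installed. Modeling with statsmodels and scikit-learn is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Invented data: six hypothetical participants in two groups
df = pd.DataFrame({
    "group": ["control", "control", "control",
              "treatment", "treatment", "treatment"],
    "score": [10.0, 12.0, 14.0, 15.0, 17.0, 19.0],
})

# numpy: operate on the whole array at once (no explicit loop needed)
scores = df["score"].to_numpy()
z_scores = (scores - np.mean(scores)) / np.std(scores, ddof=1)  # ddof=1: sample SD

# pandas: add a new column and summarize by group
df["z"] = z_scores
group_means = df.groupby("group")["score"].mean()

print(df)
print(group_means)
```

Here `groupby` plays a role similar to `aggregate()` or `dplyr::group_by()` in R.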

A Bit on Fancy Topics?

Depending on interest, we may explore more advanced topics such as data visualization, basic machine learning, and use of deep learning and language models available via HuggingFace 🤗, or even simple experiment programming. After all, that’s truly why we want to use Python.

Materials

Slides

Exercises

— The following exercises are fundamental: they integrate concepts from the slides and introduce new functions and methods that you will want to know!

— These other exercises are beyond the scope of this introductory course, but they could be stimulating and useful as simple tutorials for some users:

Basics of Clustering with K-Means and GMM
- additional exercise on Simulating clusters and assessing inferential risks
Basics of Sentiment Analysis with HuggingFace Transformers
Basics of Text Embeddings (plus PCA and Clustering)
Basics of Text Classification using Embeddings and Cosine Similarity
- additional exercise on Text embeddings to automatically evaluate construct validity
Other Examples of Language, Speech, and Image Processing
- additional mini-tutorial on AI/LLM as research assistants in systematic reviews

GitHub repository associated with the present course website: https://github.com/EnricoToffalini/Basics-Python

Access padlet

Many thanks to Filippo Gambarota for sharing his expertise in using GitHub and Quarto, to Margherita Calderan for her valuable assistance with programming experiments, and to Tommaso Feraco for a fruitful collaboration on the use and interpretation of semantic embeddings.

Read a modest proposal for open source software in the PhD program in Psychological Sciences