
Exercises - Dataframes: The nightmare… in Python

Basics of Python for Data Science

Cleaning data like a data scientist

This is the 🐍 (Python) version of the exercise that you, as students of the PhD program in Psychological Sciences, previously encountered in the course Basics of R for Data Science.

Scenario

You have received messy datasets from students or colleagues who collected data with different instruments: INVALSI, Wechsler, an experimental attention task, and personality questionnaires. Before you can run any meaningful analysis, you need to clean, merge, and combine the data.
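A minimal sketch of the clean, aggregate, and merge pattern in pandas follows. It assumes a participant identifier column called `ID`, item columns whose names contain "item", and a long-format trial table with an `acc` column. All of these column names and values are hypothetical, invented only for illustration, and the real datasets may be laid out differently.

```python
import pandas as pd

# Hypothetical item-level INVALSI data (invented values):
# one row per participant, one column per item (1 = correct, 0 = wrong).
invalsi = pd.DataFrame({
    "ID": ["S01", "S02", "S03"],
    "item1": [1, 0, 1],
    "item2": [1, 1, 0],
    "item3": [1, 1, 0],
})

# Hypothetical long-format lab data (invented values): one row per trial.
trials = pd.DataFrame({
    "ID": ["S01", "S01", "S02", "S02", "S04", "S04"],
    "acc": [1, 0, 1, 1, 0, 1],
})

# Aggregate each source to one row per participant.
invalsi_tot = invalsi.assign(
    InvalsiTot=invalsi.filter(like="item").sum(axis=1)
)[["ID", "InvalsiTot"]]
mean_acc = (
    trials.groupby("ID", as_index=False)["acc"].mean()
    .rename(columns={"acc": "meanAcc"})
)

# Merge on the participant identifier. An outer join keeps participants
# missing from one source (their score becomes NaN), and validate= raises
# an error if an ID is unexpectedly duplicated in either table.
clean = invalsi_tot.merge(mean_acc, on="ID", how="outer", validate="one_to_one")
print(clean)
```

The outer join is a deliberate choice: an inner join (`how="inner"`) would silently drop S03 and S04, which is exactly the kind of data loss that is easy to miss when cleaning.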

Your Final Goals

  • Produce a clean dataframe with one row per participant, including only the total/aggregate score for each type of data (e.g., “InvalsiTot” for the INVALSI items data, “WechslerTot” for the Wechsler subtests data, “meanAcc” for the lab-based trials data, and “OpennessTot” and “AgreabTot” for the personality-questionnaires data);
  • Produce a readable correlation matrix between all aggregate scores;
  • Produce some descriptive statistics for the aggregate scores (e.g., means, standard deviations, skewness coefficients, counts of missing values);
  • Run a t-test comparing INVALSI total scores between males and females;
  • Draw histograms of the score distributions and scatter plots for pairs of variables.
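The analysis goals above can be sketched in pandas roughly as follows. This is a minimal illustration on a small hypothetical dataframe: the `sex` column, its "F"/"M" coding, and all values are invented assumptions, not the course data. For the group comparison it computes Welch's t statistic (which does not assume equal variances) by hand; this is one possible choice of test, not necessarily the one the exercise expects.

```python
import numpy as np
import pandas as pd

# Hypothetical clean dataframe (invented values), one row per participant.
clean = pd.DataFrame({
    "ID": ["S01", "S02", "S03", "S04", "S05", "S06"],
    "sex": ["F", "M", "F", "M", "F", "M"],
    "InvalsiTot": [30, 25, 28, np.nan, 33, 22],
    "WechslerTot": [105, 98, 110, 101, 115, 95],
    "meanAcc": [0.91, 0.85, 0.88, 0.80, 0.95, 0.79],
})
scores = clean[["InvalsiTot", "WechslerTot", "meanAcc"]]

# Descriptive statistics: mean, SD, skewness, and number of missing values.
desc = pd.DataFrame({
    "mean": scores.mean(),
    "sd": scores.std(),
    "skew": scores.skew(),
    "n_missing": scores.isna().sum(),
})
print(desc.round(2))

# Pearson correlation matrix (pairwise-complete observations), rounded.
corr = scores.corr().round(2)
print(corr)

# Welch's t statistic for InvalsiTot, females vs males.
f = clean.loc[clean["sex"] == "F", "InvalsiTot"].dropna()
m = clean.loc[clean["sex"] == "M", "InvalsiTot"].dropna()
se = np.sqrt(f.var() / len(f) + m.var() / len(m))
t_welch = (f.mean() - m.mean()) / se
print(f"Welch t = {t_welch:.2f}")
```

If SciPy is available, `scipy.stats.ttest_ind(f, m, equal_var=False)` returns the same statistic together with a p-value. For the plots, `scores.hist()` and `pd.plotting.scatter_matrix(scores)` draw histograms and pairwise scatter plots, both of which require matplotlib.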

Datasets: