Exercises - Basics of Programming
Basics of Python for Data Science
First of all, import the required libraries:
import numpy as np
import matplotlib.pyplot as plt
Then work through the following exercises:
Write a for loop that runs 20 iterations, each time generating and printing a single random number drawn from a distribution of your choice (e.g., np.random.binomial(), np.random.normal(), np.random.gamma(), np.random.poisson(), np.random.logistic()) and rounded to three decimal places.

Write a for loop that runs 20 iterations, each time:
- drawing a single number from a standard normal distribution (with np.random.normal());
- if the number is negative, printing it; otherwise, printing the string "positive".
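For the first loop exercise, a minimal sketch might look like the following. It assumes np.random.normal() as the distribution of choice (any of the other listed functions could be swapped in with its own parameters), and the list name draws is illustrative, added only so the printed values can be checked afterwards.

```python
import numpy as np

# Assumption: the normal distribution is the "distribution of your choice";
# the other np.random functions listed above would work the same way.
draws = []  # illustrative: keeps the printed values for later inspection

for i in range(20):
    x = np.random.normal(loc=0.0, scale=1.0)  # draw one random number
    x = round(float(x), 3)                     # round to three decimal places
    print(x)
    draws.append(x)
```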
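One possible sketch for the second loop exercise. A value of exactly zero falls into the else branch and is reported as "positive". The list outputs is an illustrative addition that only records what was printed.

```python
import numpy as np

outputs = []  # illustrative: records what was printed at each iteration

for i in range(20):
    x = np.random.normal()  # standard normal: mean 0, standard deviation 1
    if x < 0:
        out = x             # negative: print the number itself
    else:
        out = "positive"    # zero or above: print the string instead
    print(out)
    outputs.append(out)
```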
Create an array of 10 random numbers from a standard normal distribution (using np.random.normal()), then use np.where() to create a new character array where each element is "neg" if the number is negative and "pos" otherwise.

We want to examine the sampling variability of the correlation between two normally distributed but unrelated variables (i.e., their underlying true correlation is \(r = 0\)) when the sample size is \(N = 30\). Write a for loop that:
- runs thousands of iterations, each time generating two independent random variables (using np.random.normal()) with \(N = 30\) observations each;
- at each iteration, computes the estimated correlation coefficient with np.corrcoef() and stores it as a number in a list or array (warning: np.corrcoef() returns a small 2x2 matrix, so you have to extract the coefficient of interest from it before storing it);
- after all iterations are complete, visualizes the distribution of the estimated correlation coefficients with plt.hist() (hint: then also call plt.show()) or any other plotting method, and computes the median with np.median() and the 95% confidence interval using the quantile method (i.e., with np.quantile(), setting q=[.025, .975]).
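For the np.where() exercise, a sketch along these lines should work: np.where() evaluates the condition elementwise and picks "neg" or "pos" at each position, producing a NumPy array of strings. The names x and labels are illustrative.

```python
import numpy as np

x = np.random.normal(size=10)           # 10 standard normal draws
labels = np.where(x < 0, "neg", "pos")  # "neg" where x < 0, "pos" elsewhere

print(x)
print(labels)  # a NumPy array of strings (illustrative names x, labels)
```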
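A minimal sketch of the correlation simulation follows. It assumes 5000 iterations as one reasonable reading of "thousands", takes element [0, 1] of the 2x2 matrix returned by np.corrcoef() (the off-diagonal entry, i.e. the correlation between x and y), and uses 50 histogram bins; these choices and the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 30        # sample size per variable
n_sim = 5000  # assumption: one reasonable reading of "thousands of iterations"
r_values = []

for i in range(n_sim):
    x = np.random.normal(size=N)
    y = np.random.normal(size=N)    # independent of x, so the true r is 0
    r = np.corrcoef(x, y)[0, 1]     # off-diagonal entry of the 2x2 matrix
    r_values.append(r)

r_values = np.array(r_values)

# Distribution of the estimated correlation coefficients
plt.hist(r_values, bins=50)
plt.xlabel("estimated correlation (N = 30)")
plt.ylabel("count")
plt.show()

median_r = np.median(r_values)
ci_95 = np.quantile(r_values, q=[.025, .975])  # quantile-based 95% interval
print("median:", median_r)
print("95% interval:", ci_95)
```

With \(N = 30\) and no true relationship, the median should come out close to 0 and the interval at roughly ±0.36, which shows how large a correlation can appear by chance alone at this sample size.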
Write a custom function called describe_sign() that takes a number as input and returns as output: "negative" if the input value is below zero, "zero" if it is exactly zero, or "positive" if it is greater than zero.

Write a custom function called simulate_correlations() that does the following (this is a bit advanced but fun 🙂):
- takes two numeric arguments as input: N and n_sim;
- initializes an array with n_sim elements (with np.empty()) and fills it with np.nan, since np.empty() alone leaves the elements uninitialized;
- runs a for loop n_sim times;
- inside the loop, at each iteration, generates two independent normally distributed variables, each with N observations, computes the correlation coefficient between them, and stores it in the appropriate position of the previously initialized array (hint: this was already done in the exercise above);
- returns as output the array filled with the simulated correlation coefficients.
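A straightforward sketch of describe_sign() using an if/elif/else chain. Note that this sketch does not special-case np.nan: every comparison with NaN is False, so a NaN input would fall through to "positive".

```python
def describe_sign(x):
    """Return "negative", "zero" or "positive" according to the sign of x."""
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    else:
        return "positive"  # note: NaN also ends up here in this sketch


print(describe_sign(-2.5))  # negative
print(describe_sign(0))     # zero
print(describe_sign(3))     # positive
```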
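One possible sketch of simulate_correlations(), reusing the loop body from the correlation exercise. np.empty() returns an array with arbitrary contents, so the sketch fills it with np.nan explicitly (np.full(n_sim, np.nan) would be an equivalent one-liner). The example call with N=30 and n_sim=2000 is illustrative.

```python
import numpy as np


def simulate_correlations(N, n_sim):
    """Simulate n_sim correlation coefficients between two independent
    standard normal variables, each with N observations."""
    r_values = np.empty(n_sim)
    r_values.fill(np.nan)  # np.empty leaves arbitrary values, so use NaN
    for i in range(n_sim):
        x = np.random.normal(size=N)
        y = np.random.normal(size=N)
        r_values[i] = np.corrcoef(x, y)[0, 1]  # store at position i
    return r_values


# Illustrative call: 2000 simulated coefficients with N = 30
sims = simulate_correlations(N=30, n_sim=2000)
print(np.median(sims), np.quantile(sims, q=[.025, .975]))
```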
Extra for advanced users: repeat the above exercise on the simulate_correlations() custom function, but now add a third numeric argument as input, called r, which defines the true correlation between the two normally distributed variables. To generate correlated variables, you can use np.random.multivariate_normal() (first of all, read and understand its documentation!).

You are working with standardized test scores and you want to label each score as "low" (below −1), "average" (between −1 and +1), or "high" (above +1):
- generate an array of 50 standard normal scores;
- use np.select() to transform the scores into the above labels.
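For the last two exercises, one combined sketch. The first part extends simulate_correlations() with a third argument r, defaulting to 0.0 (an assumption, so that the two-argument call keeps working), and draws each pair with np.random.multivariate_normal() using unit variances, so that the off-diagonal covariance equals the correlation. The second part labels the scores with np.select(): the conditions are checked in order and "average" is supplied as the default for everything else, which means scores of exactly −1 or +1 count as "average" (an assumption, since the exercise leaves the boundaries open). Variable names are illustrative.

```python
import numpy as np


# --- Extra: simulate_correlations() with a true correlation r ---
def simulate_correlations(N, n_sim, r=0.0):
    """Simulate n_sim sample correlations between two standard normal
    variables whose true (population) correlation is r."""
    mean = [0.0, 0.0]
    cov = [[1.0, r],
           [r, 1.0]]  # unit variances, so covariance equals correlation
    r_values = np.empty(n_sim)
    r_values.fill(np.nan)
    for i in range(n_sim):
        xy = np.random.multivariate_normal(mean, cov, size=N)  # shape (N, 2)
        r_values[i] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    return r_values


sims = simulate_correlations(N=30, n_sim=2000, r=0.5)  # illustrative values
print("median r:", np.median(sims))

# --- np.select(): label standardized scores ---
scores = np.random.normal(size=50)      # 50 standard normal scores
conditions = [scores < -1, scores > 1]  # checked in order
choices = ["low", "high"]
# Assumption: exactly -1 or +1 is treated as "average" via the default
labels = np.select(conditions, choices, default="average")
print(labels)
```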