Exercises - Basics of Programming
Basics of Python for Data Science
First of all, import the required libraries:
import numpy as np
import matplotlib.pyplot as plt
Then work through the following exercises:
Write a for loop that runs 20 iterations, each time generating and printing a single random number, rounded to three decimal places and drawn from a distribution of your choice (e.g., np.random.binomial(), np.random.normal(), np.random.gamma(), np.random.poisson(), np.random.logistic()).
Write a for loop that runs 20 iterations, each time:
- drawing a single number from a standard normal distribution (with np.random.normal());
- if the number is negative, printing it; else, printing the string "positive".
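One possible way to approach the two loop exercises above is sketched here. It is only an illustration: the gamma distribution, its parameters, the seed, and the helper lists draws and outputs are arbitrary choices that do not come from the exercise text.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the sketch is reproducible

# Loop 1: print 20 draws from a distribution of choice, rounded to 3 decimals
draws = []  # hypothetical helper list, kept only to inspect the results later
for i in range(20):
    x = np.random.gamma(shape=2.0, scale=1.0)  # gamma picked arbitrarily
    x_rounded = round(x, 3)
    draws.append(x_rounded)
    print(x_rounded)

# Loop 2: print negative draws as numbers, all other draws as "positive"
outputs = []  # hypothetical helper list, as above
for i in range(20):
    z = np.random.normal()  # defaults are loc=0.0 and scale=1.0
    if z < 0:
        result = z
    else:
        result = "positive"
    outputs.append(result)
    print(result)
```

Storing the values in lists is not required by the exercises; it just makes it easy to check afterwards that each loop produced 20 results.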
Create an array of 10 random numbers from a standard normal distribution (using np.random.normal()), then use np.where() to create a new array of strings where each element is "neg" if the number is negative, "pos" otherwise.
We want to examine the sampling variability of the correlation between two normally distributed but unrelated variables (i.e., their underlying true correlation is \(r = 0\)) when the sample size is \(N = 30\). Write a for loop that:
- runs thousands of iterations, each time generating two independent random variables (using np.random.normal()) with \(N = 30\) observations each;
- at each iteration, computes the estimated correlation coefficient with np.corrcoef() and stores it as a number in a list or array (warning: np.corrcoef() returns a small 2x2 matrix, so you have to extract the coefficient of interest from there before storing it);
- after completing all the iterations, visualize the distribution of estimated correlation coefficients with plt.hist() (hint: then also call plt.show()) or any other plotting method; also, compute the np.median() and the 95% confidence interval using the quantile method (i.e., with np.quantile(), setting q=[.025, .975]).
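The np.where() labeling and the correlation simulation above could be sketched as follows. This is a minimal sketch under stated assumptions: the number of iterations (5000), the seed, the variable names, and the helper plot_r_hist() are illustrative and not prescribed by the exercise.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only for reproducibility

# Label 10 standard normal draws as "neg" or "pos" with np.where()
x = np.random.normal(size=10)
labels = np.where(x < 0, "neg", "pos")
print(labels)

# Sampling distribution of the correlation when the true value is 0 and N = 30
N = 30
n_iter = 5000  # "thousands of iterations"; the exact number is an assumption
r_values = np.full(n_iter, np.nan)
for i in range(n_iter):
    a = np.random.normal(size=N)
    b = np.random.normal(size=N)
    # np.corrcoef() returns a 2x2 matrix; the off-diagonal entry is the coefficient
    r_values[i] = np.corrcoef(a, b)[0, 1]

median_r = np.median(r_values)
ci_low, ci_high = np.quantile(r_values, q=[0.025, 0.975])
print(round(median_r, 3), round(ci_low, 3), round(ci_high, 3))


def plot_r_hist(values):
    """Hypothetical helper: histogram of the simulated coefficients."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without it
    plt.hist(values, bins=40)
    plt.xlabel("estimated correlation")
    plt.ylabel("count")
    plt.show()
```

Calling plot_r_hist(r_values) draws the histogram. With \(N = 30\) and a true correlation of zero, the simulated coefficients should be centered near zero, with the 95% interval falling at roughly plus or minus 0.36.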
Write a custom function called describe_sign() that takes a number as input and returns as output: "negative" if the input value is below zero, "zero" if it is exactly zero, or "positive" if it is greater than zero.
Write a custom function called simulate_correlations() that does the following (this is a bit advanced but fun 🙂):
- takes two numeric arguments as input: N and n_sim;
- initializes an array with n_sim elements, all filled with np.nan (np.empty() only allocates the array without setting its values, so fill it afterwards, or use np.full(n_sim, np.nan) instead);
- runs a for loop n_sim times;
- inside the loop, at each iteration, generates two independent normally distributed variables each with N observations, computes the correlation coefficient between them, and stores it in the appropriate position of the previously initialized array (hint: this was already done in the exercise above);
- returns as output the array filled with the simulated correlation coefficients.
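The two functions described above might look something like this. Treat it as one possible sketch rather than the reference solution: the docstrings, the internal variable names, and the example call with n_sim=1000 are assumptions made for illustration.

```python
import numpy as np


def describe_sign(x):
    """Return "negative", "zero", or "positive" depending on the sign of x."""
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    else:
        return "positive"


def simulate_correlations(N, n_sim):
    """Simulate n_sim correlations between two independent normal samples of size N."""
    # np.empty() only allocates memory, so the NaN fill is done explicitly;
    # np.full(n_sim, np.nan) would do both steps in one call
    r_values = np.empty(n_sim)
    r_values.fill(np.nan)
    for i in range(n_sim):
        a = np.random.normal(size=N)
        b = np.random.normal(size=N)
        # keep only the off-diagonal entry of the 2x2 correlation matrix
        r_values[i] = np.corrcoef(a, b)[0, 1]
    return r_values


print(describe_sign(-2.5), describe_sign(0), describe_sign(3))
sims = simulate_correlations(N=30, n_sim=1000)
print(sims.shape)
```

Starting from an array of NaN values makes it easy to spot positions that the loop failed to fill, for example with np.isnan(sims).any().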
Extra for advanced users: Repeat the above exercise on the simulate_correlations() custom function, but now adding a third numeric argument as input, called r, which defines the true correlation between the two normally distributed variables. To generate correlated variables, you can use np.random.multivariate_normal() (first of all, see and understand the documentation!).
You are working with standardized test scores and you want to label each score as "low" (below −1), "average" (between −1 and +1), or "high" (above +1):
- Generate an array of 50 standard normal scores;
- Use np.select() to transform the scores into the above labels.
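The last two exercises could be approached along the following lines. This is a hedged sketch: it assumes both variables have mean 0 and variance 1 (so that the covariance equals r), and the seed, the example value r=0.6, and the variable names are illustrative choices that do not appear in the exercise text.

```python
import numpy as np

np.random.seed(2)  # arbitrary seed, only for reproducibility


def simulate_correlations(N, n_sim, r):
    """Simulate n_sim sample correlations of size N when the true correlation is r."""
    # With unit variances, the off-diagonal covariance is the correlation itself
    cov = np.array([[1.0, r], [r, 1.0]])
    r_values = np.full(n_sim, np.nan)
    for i in range(n_sim):
        # each call returns an (N, 2) array: one column per variable
        sample = np.random.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=N)
        r_values[i] = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]
    return r_values


sims = simulate_correlations(N=30, n_sim=500, r=0.6)
print(round(np.mean(sims), 3))

# Label 50 standard normal scores with np.select()
scores = np.random.normal(size=50)
conditions = [scores < -1, scores > 1]
choices = ["low", "high"]
# a string default keeps every label a string; it covers scores from -1 to +1
labels = np.select(conditions, choices, default="average")
print(labels[:10])
```

Using default="average" avoids writing a third condition for the middle range, since np.select() falls back to the default wherever no condition is true.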