Exercises - Basics of Programming… With Some Data Science
Basics of R for Data Science
Write a
for
loop that runs 20 iterations, each time generating and printing a single random number from a distribution of your choice (e.g.,rbinom()
,rnorm()
,rgamma()
,rpois()
,rlogis()
).Write a
for
loop that runs 20 iterations, each time: 1) drawing a single number from a standard normal distribution (withrnorm()
); 2)if
the number is negative print it,else
print the string"positive"
.Create a vector of 10 random numbers from a standard normal distribution (using
rnorm()
), then theifelse
function to convert the vector into a character one where each element is:"neg"
if the number is negative,"pos"
otherwise.Use a
for
loop to address the following task: We aim to examine the sampling variability of the correlation between two normally distributed but unrelated variables (i.e., their underlying true correlation is \(r = 0\)), when the sample size is \(N = 30\)- run thousands of iterations, each time generating two independent random variables (using
rnorm()
) with the above characteristics; - for each iteration, compute the estimated correlation coefficient with
cor()
and store it as an number in a vector; - after completing the iterations, visualize the distribution of estimated correlation coefficients with
hist()
or any other plotting method; also, compute themedian()
and the 95% confidence interval using the quantile methods (i.e., withquantile()
setting the argumentprobs=c(.025,.975)
).
- run thousands of iterations, each time generating two independent random variables (using
Write a custom function called
describe_sign()
that takes a number as input and returns"negative"
if the input is below zero,"zero"
if it is exactly zero, or"positive"
if the input is greater than zero.Write a custom function called
simulate_correlations()
that does the following (this is a bit advanced but funny 🙂):- take two numeric arguments as input:
N
andn_sim
; - initializes a vector that has
n_sim
elements, all filled withNA
; - runs a
for
loopn_sim
times; - inside the loop, at each iteration it generates two independent normally distributed variables each with
N
observations, computes the correlation coefficient between them, and stores it in the appropriate position of the previously initialized vector; - returns the vector filled with simulated correlation coefficients
- take two numeric arguments as input:
Extra for advanced users: Repeat one of the previous exercise that used the
for
loop, but without using thefor
loop (i.e., employing an alternative iterative method)Extra for super advanced users: Repeat the above exercise where you had to define the
simulate_correlations()
custom function, but now adding a third numeric argument as input, calledr
, which defines the true correlation between the two normally distributed variables. To generate correlated variables, you can use themvrnorm()
function from theMASS
package