Exercises - Basics of Programming… With Some Data Science
Basics of R for Data Science
Write a `for` loop that runs 20 iterations, each time generating and printing a single random number from a distribution of your choice (e.g., `rbinom()`, `rnorm()`, `rgamma()`, `rpois()`, `rlogis()`).

Write a `for` loop that runs 20 iterations, each time: 1) drawing a single number from a standard normal distribution (with `rnorm()`); 2) `if` the number is negative, print it, `else` print the string `"positive"`.

Create a vector of 10 random numbers from a standard normal distribution (using `rnorm()`), then use the `ifelse()` function to convert the vector into a character one where each element is `"neg"` if the number is negative, `"pos"` otherwise.

Use a `for` loop to address the following task: We aim to examine the sampling variability of the correlation between two normally distributed but unrelated variables (i.e., their underlying true correlation is \(r = 0\)), when the sample size is \(N = 30\):
- run thousands of iterations, each time generating two independent random variables (using `rnorm()`) with the above characteristics;
- for each iteration, compute the estimated correlation coefficient with `cor()` and store it as a number in a vector;
- after completing the iterations, visualize the distribution of estimated correlation coefficients with `hist()` or any other plotting method; also, compute the `median()` and the 95% confidence interval using the quantile method (i.e., with `quantile()`, setting the argument `probs = c(.025, .975)`).
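
One possible way to approach the simulation task above is sketched here. The number of iterations (`n_sim = 5000`), the seed, and the object name `r_est` are assumptions chosen for illustration, not part of the exercise text.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility

N     <- 30                       # sample size in each iteration
n_sim <- 5000                     # number of iterations (assumed value)
r_est <- rep(NA_real_, n_sim)     # vector that will hold the estimates

for (i in seq_len(n_sim)) {
  x <- rnorm(N)                   # two independent standard normal variables,
  y <- rnorm(N)                   # so the true correlation is 0
  r_est[i] <- cor(x, y)           # store the estimated correlation
}

# Distribution of the estimated correlations
hist(r_est, breaks = 40,
     main = "Simulated correlations (true r = 0, N = 30)",
     xlab = "Estimated r")

# Median and 2.5% / 97.5% quantiles of the simulated estimates
median(r_est)
quantile(r_est, probs = c(.025, .975))
```

Pre-allocating `r_est` before the loop avoids growing the vector at every iteration, which tends to matter once `n_sim` gets large.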
Write a custom function called `describe_sign()` that takes a number as input and returns `"negative"` if the input is below zero, `"zero"` if it is exactly zero, or `"positive"` if the input is greater than zero.

Write a custom function called `simulate_correlations()` that does the following (this is a bit advanced but fun 🙂):
- takes two numeric arguments as input: `N` and `n_sim`;
- initializes a vector that has `n_sim` elements, all filled with `NA`;
- runs a `for` loop `n_sim` times;
- inside the loop, at each iteration it generates two independent normally distributed variables each with `N` observations, computes the correlation coefficient between them, and stores it in the appropriate position of the previously initialized vector;
- returns the vector filled with simulated correlation coefficients.
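
A minimal sketch of how the two functions described above might look. The function names and the arguments `N` and `n_sim` come from the exercises; the internal name `r_est` is an assumption, and other designs are equally valid.

```r
# Returns a label describing the sign of a single number
describe_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# Simulates n_sim correlation coefficients between two independent
# standard normal variables with N observations each
simulate_correlations <- function(N, n_sim) {
  r_est <- rep(NA_real_, n_sim)            # vector of NAs to be filled in
  for (i in seq_len(n_sim)) {
    r_est[i] <- cor(rnorm(N), rnorm(N))    # store estimate at position i
  }
  r_est                                    # last expression is returned
}

# Example calls
describe_sign(-2)
sims <- simulate_correlations(N = 30, n_sim = 1000)
quantile(sims, probs = c(.025, .975))
```

Note that `describe_sign()` as written expects a single number; applying it to a longer vector would require something like `sapply()` or a vectorized rewrite.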
Extra for advanced users: Repeat one of the previous exercises that used the `for` loop, but without using the `for` loop (i.e., employing an alternative iterative method).

Extra for super advanced users: Repeat the above exercise where you had to define the `simulate_correlations()` custom function, but now adding a third numeric argument as input, called `r`, which defines the true correlation between the two normally distributed variables. To generate correlated variables, you can use the `mvrnorm()` function from the `MASS` package.
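
As a starting point for the two extra exercises, the sketch below shows the two building blocks in isolation rather than a full solution: `replicate()` as one loop-free way to iterate, and `MASS::mvrnorm()` for drawing a correlated pair of variables. The values `r = 0.5` and `N = 30`, the seed, and the object names `Sigma`, `xy`, and `r_est` are assumptions for illustration; unit variances are also assumed, so that the off-diagonal of the covariance matrix equals the correlation.

```r
set.seed(1)     # arbitrary seed, only for reproducibility

N <- 30         # sample size (assumed value)
r <- 0.5        # true correlation (assumed value)

# Loop-free iteration: replicate() evaluates the expression 1000 times
# and collects the results in a numeric vector
r_est <- replicate(1000, cor(rnorm(N), rnorm(N)))

# Correlated variables: with unit variances, the covariance matrix
# has 1 on the diagonal and r off the diagonal
Sigma <- matrix(c(1, r,
                  r, 1), nrow = 2)

# mvrnorm() returns an N x 2 matrix, one column per variable
xy <- MASS::mvrnorm(n = N, mu = c(0, 0), Sigma = Sigma)

cor(xy[, 1], xy[, 2])   # sample estimate, varies around r
```

Calling `MASS::mvrnorm()` with the `::` prefix avoids attaching the whole package, which some users prefer because `MASS` masks function names from other packages.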