Exercises - Programming Like a Data Scientist
Basics of R for Data Science
This set of exercises includes practicing and integrating various skills: working with dataframes and lists, applying conditional logic and iterative programming. Some of these exercises are pretty advanced for the purposes of the present course!
Download this workspace and import it using the load() function.
The dataframe df1 has all its columns stored as character, but many are mostly numeric and should be treated as such. Your task is to determine which columns can reasonably be coerced to numeric. Use a combination of iterative and conditional programming to complete the following steps:
- Iterate over all columns of df1 (using a for loop is recommended);
- For each column, calculate the percentage of observations that remain valid (i.e., do not become NA) when coerced to numeric using as.numeric();
- If the percentage exceeds 80%, coerce the column to numeric;
- Keep track of the columns that were coerced to numeric;
- After processing all columns, compute a correlation matrix that includes only the columns that were coerced to numeric (use the cor() function; set the argument use="pairwise.complete" to avoid losing too many observations to listwise deletion);
- For fun: use the corrplot() function from the corrplot package to obtain a more colorful correlation matrix.
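The steps listed for df1 can be sketched on a small made-up dataframe. This is only one possible approach, not the official solution: df_toy and its columns are hypothetical stand-ins (the real df1 comes from the workspace), and the variable names threshold, coerced, and cor_mat are illustrative choices.

```r
# Hypothetical stand-in for df1: every column is stored as character
df_toy <- data.frame(
  id     = c("a", "b", "c", "d", "e", "f"),
  height = c("1.70", "1.82", "n/a", "1.65", "1.90", "1.75"),
  weight = c("65", "80", "72", "58", "91", "70"),
  notes  = c("ok", "12", "late", "ok", "3", "ok"),
  stringsAsFactors = FALSE
)

threshold <- 80          # minimum percentage of values that must survive coercion
coerced <- character(0)  # names of the columns that end up numeric

for (col in names(df_toy)) {
  # as.numeric() warns when it introduces NAs; that is expected here
  converted <- suppressWarnings(as.numeric(df_toy[[col]]))
  pct_valid <- 100 * mean(!is.na(converted))
  if (pct_valid > threshold) {
    df_toy[[col]] <- converted
    coerced <- c(coerced, col)
  }
}

print(coerced)

# drop = FALSE keeps a dataframe even if only one column was coerced
cor_mat <- cor(df_toy[, coerced, drop = FALSE], use = "pairwise.complete.obs")
print(cor_mat)

# Optional plot, only if the corrplot package is installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat)
}
```

Coercing into a temporary vector first means the original column is overwritten only when it passes the threshold, so columns that are mostly text stay untouched.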
Extra for advanced users: The dataframe dfWide1 contains data for treated subjects at three times ("T0", "T1", and "T2") in wide format, but it needs to be reshaped to long format for data analysis. Your task is to reshape it to long format using the base R reshape() function (first of all, have a look at the documentation of reshape()).
Extra for advanced users: Repeat the previous task using the convenient pivot_longer() function from the tidyr package.
Extra for super advanced users: The dataframe dfWide2 is even more complex, as it contains two variables, Acc and RT, measured at three times. Reshape dfWide2 to long format, keeping the Acc and RT values in separate columns. Use any approach you like, but try to keep the code compact if possible.
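As a rough illustration of wide-to-long reshaping with two measured variables, here is a sketch on made-up data. The dataframe df_wide, its id column, and the Acc_T0 ... RT_T2 naming scheme are assumptions; the actual column names in dfWide1 and dfWide2 may differ, so the varying and names_sep arguments would need adjusting. With a single measured variable, the same reshape() call works with one vector of column names in varying.

```r
# Hypothetical wide data: two variables (Acc, RT) measured at T0, T1, T2
df_wide <- data.frame(
  id     = 1:3,
  Acc_T0 = c(0.70, 0.65, 0.80),
  Acc_T1 = c(0.75, 0.70, 0.82),
  Acc_T2 = c(0.80, 0.72, 0.90),
  RT_T0  = c(520, 610, 480),
  RT_T1  = c(500, 590, 470),
  RT_T2  = c(480, 560, 450)
)

# Base R: one list element per measured variable, each in time order
long_base <- reshape(
  df_wide,
  direction = "long",
  idvar     = "id",
  varying   = list(c("Acc_T0", "Acc_T1", "Acc_T2"),
                   c("RT_T0", "RT_T1", "RT_T2")),
  v.names   = c("Acc", "RT"),
  timevar   = "time",
  times     = c("T0", "T1", "T2")
)
print(long_base)

# tidyr (if installed): ".value" turns the part of each name before "_"
# into separate output columns, and the part after "_" into the time column
if (requireNamespace("tidyr", quietly = TRUE)) {
  long_tidy <- tidyr::pivot_longer(
    df_wide,
    cols      = -id,
    names_to  = c(".value", "time"),
    names_sep = "_"
  )
  print(long_tidy)
}
```

Passing varying as a list, rather than a single vector, makes the mapping between wide columns and long variables explicit and avoids relying on reshape() guessing it from the column names.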