Exercises - Programming Like a Data Scientist
Basics of R for Data Science
This set of exercises lets you practice and integrate several skills: working with dataframes and lists, and applying conditional logic and iterative programming. Some of these exercises are fairly advanced for the purposes of the present course!
Download this workspace and import it using the `load()` function.
The dataframe `df1` has all its columns stored as character, but many are mostly numeric and should be treated as such. Your task is to determine which columns can reasonably be coerced to numeric. Use a combination of iterative and conditional programming to complete the following steps:
- Iterate over all columns of `df1` (using a `for` loop is recommended);
- For each column, calculate the percentage of observations that remain valid (i.e., do not become `NA`) when coerced to numeric using `as.numeric()`;
- If the percentage exceeds 80%, coerce the column to numeric;
- Keep track of the columns that were coerced to numeric;
- After processing all columns, compute a correlation matrix that includes only the columns that were coerced to numeric (use the `cor()` function; set argument `use="pairwise.complete.obs"` to avoid losing too many observations to listwise deletion);
- For fun: use the `corrplot()` function from the `corrplot` package to obtain a more colorful correlation matrix.
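The steps listed above can be sketched on a small made-up dataframe. The dataframe `toy`, its column names, and the helper variables `coerced` and `cor_mat` are hypothetical and not part of the workspace; only the 80% threshold and the functions come from the exercise. This is one possible approach, not a reference solution:

```r
# Hypothetical toy data: every column is stored as character, like df1
toy <- data.frame(
  id    = c("a", "b", "c", "d", "e", "f"),
  age   = c("21", "35", "n/a", "40", "28", "33"),
  score = c("1.5", "2.0", "3.1", "2.7", "4.4", "3.9"),
  note  = c("ok", "7", "bad", "ok", "3", "ok"),
  stringsAsFactors = FALSE
)

coerced <- character(0)  # names of the columns converted to numeric

for (col in names(toy)) {
  # as.numeric() warns when it introduces NAs; that is expected here
  converted <- suppressWarnings(as.numeric(toy[[col]]))

  # Percentage of observations that did not become NA
  pct_valid <- 100 * mean(!is.na(converted))

  if (pct_valid > 80) {
    toy[[col]] <- converted       # overwrite the character column
    coerced <- c(coerced, col)    # remember which columns were coerced
  }
}

# Correlation matrix restricted to the coerced columns
cor_mat <- cor(toy[, coerced, drop = FALSE], use = "pairwise.complete.obs")

# Optional, if the corrplot package is installed:
# corrplot::corrplot(cor_mat)
```

In this toy example only `age` and `score` pass the threshold, so `cor_mat` is a 2 x 2 matrix. Using `drop = FALSE` keeps the selection a dataframe even if a single column were coerced.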
Extra for advanced users: The dataframe `dfWide1` contains data for treated subjects at three times (`"T0"`, `"T1"`, and `"T2"`) in wide format, but it needs to be reshaped to long format for data analysis. Your task is to reshape it to long format using the basic `reshape()` function (first of all, have a look at the documentation of `reshape()`).
Extra for advanced users: Repeat the previous task using the convenient `pivot_longer()` function from the `tidyr` package.
Extra for super advanced users: The dataframe `dfWide2` is even more complex, as it contains two variables, `Acc` and `RT`, each measured at three times. Reshape `dfWide2` to long format, keeping the `Acc` and `RT` values in separate columns. Use any approach you like, but try to keep the code compact if possible.
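As a starting point for the reshaping tasks, here is a minimal sketch of wide-to-long conversion with `reshape()` on made-up data. The dataframes `wide1` and `wide2`, the `id` column, and the `Acc_T0`-style column names are assumptions for illustration; the actual column names in `dfWide1` and `dfWide2` may differ, so check them with `names()` first:

```r
# Hypothetical wide data: one row per subject, one column per time point
wide1 <- data.frame(
  id = 1:3,
  T0 = c(10, 12, 11),
  T1 = c(14, 15, 13),
  T2 = c(18, 17, 16)
)

long1 <- reshape(
  wide1,
  direction = "long",
  varying   = c("T0", "T1", "T2"),  # columns to stack
  v.names   = "value",              # name of the stacked column
  timevar   = "time",               # new column holding the time label
  times     = c("T0", "T1", "T2"),
  idvar     = "id"
)

# Hypothetical wide data with two variables measured at each time
wide2 <- data.frame(
  id     = 1:3,
  Acc_T0 = c(0.80, 0.75, 0.90),
  Acc_T1 = c(0.85, 0.80, 0.92),
  Acc_T2 = c(0.90, 0.82, 0.95),
  RT_T0  = c(650, 700, 620),
  RT_T1  = c(600, 680, 610),
  RT_T2  = c(580, 640, 590)
)

long2 <- reshape(
  wide2,
  direction = "long",
  # One vector of column names per output variable, in the order of v.names
  varying   = list(c("Acc_T0", "Acc_T1", "Acc_T2"),
                   c("RT_T0", "RT_T1", "RT_T2")),
  v.names   = c("Acc", "RT"),
  timevar   = "time",
  times     = c("T0", "T1", "T2"),
  idvar     = "id"
)
```

Passing `varying` as a list makes the mapping between input columns and output variables explicit, which avoids surprises when several variables are stacked at once. With `tidyr`, roughly equivalent calls would be `pivot_longer(wide1, cols = T0:T2, names_to = "time", values_to = "value")` and, relying on the assumed `Acc_T0`-style names, `pivot_longer(wide2, cols = -id, names_to = c(".value", "time"), names_sep = "_")`.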