Exercises - Dataframes
Basics of R for Data Science
Creating, Importing, and Indexing dataframes
1. Create a dataframe named dx with \(15\) rows and three variables including: a character variable, a numeric variable, and a logical variable. (Hint: use the data.frame() function to combine previously created vectors, each with \(15\) observations, into a single dataframe.)
2. Use the mean() function to compute the average value of the numeric variable in dx.
3. Use the mean() function to compute the average value of the numeric variable in dx, but only for rows where the logical variable has value TRUE. (Hint: use indexing with [] to select values of the numeric variable based on the values of the logical variable.)
4. Download this dataset, then import it into R as a dataframe named df.
5. Use the functions head(), str(), and summary() on df and understand what they do.
6. In df, select and display the values of the variable mathAvgTime that are greater than \(30,000\) (thirty thousand).
7. In df, select and display the values of the variable mathAcc only for rows where the variable mathAvgTime is greater than \(30,000\) (thirty thousand).
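The indexing pattern these exercises rely on can be sketched on a small toy dataframe. This is only an illustration under stated assumptions: the dataframe toy and its columns name, score, and passed are made up here and are unrelated to dx or to the downloaded dataset.

```r
# Toy data for illustration only; all names here are hypothetical
toy <- data.frame(
  name   = c("a", "b", "c", "d", "e", "f"),
  score  = c(10, 20, 30, 40, 50, 60),
  passed = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Mean of a numeric column over all rows
mean_all <- mean(toy$score)

# Mean of the numeric column, keeping only rows where the logical column is TRUE
mean_passed <- mean(toy$score[toy$passed])

# Values of a column that exceed a threshold
high_scores <- toy$score[toy$score > 35]

# Values of one column, selected by a condition on another column
high_names <- toy$name[toy$score > 35]

# Quick inspection of the dataframe
head(toy, 3)   # first three rows
str(toy)       # column types and a preview of the values
summary(toy)   # per-column summaries
```

The expression inside [] is a logical vector with one element per row, so the same idea carries over to any condition that can be computed from the columns of a dataframe.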
Working with dataframes
This section continues to use the dataframe df that was downloaded and imported in the previous exercises.
1. Use the table() function on the school variable in the df dataframe to count the number of observations for each school.
2. Select and display only the rows in df where the school variable is equal to "school4". Do this using indexing with [], then redo it using the subset() function.
3. Learn how to use the aggregate() function by reading its documentation. Then, use it to calculate the mean value of each variable in df, grouped by the school variable.
4. Repeat the previous exercise, but this time calculate the median and the standard deviation of each variable, grouped by school.
5. Understand why warnings occurred in the previous two exercises. Then, avoid these warnings by limiting the computations to some variables only (achieve this by using indexing with []).
6. Use the cor() function to compute a correlation matrix for the numerical variables in df.
7. Repeat the previous task, but round the correlation coefficients in the matrix to two decimal places using the round() function.
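As a rough guide to the functions named above, here is a minimal sketch on a toy dataframe. The dataframe toy and its columns group, x, and y are assumptions made up for this example; they do not come from df, and the calls shown are one possible way to use these functions, not the only one.

```r
# Toy data for illustration only; all names here are hypothetical
toy <- data.frame(
  group = c("g1", "g1", "g2", "g2", "g2", "g3"),
  x     = c(1, 3, 2, 4, 6, 5),
  y     = c(2, 6, 4, 8, 12, 10),
  stringsAsFactors = FALSE
)

# Number of observations per group
counts <- table(toy$group)

# Rows for one group: logical indexing with [], then the same with subset()
rows_idx <- toy[toy$group == "g2", ]
rows_sub <- subset(toy, group == "g2")

# Group-wise mean and median; only the numeric columns are passed to aggregate()
num_cols <- c("x", "y")
means   <- aggregate(toy[, num_cols], by = list(group = toy$group), FUN = mean)
medians <- aggregate(toy[, num_cols], by = list(group = toy$group), FUN = median)

# Correlation matrix of the numeric columns, rounded to two decimal places
cors <- round(cor(toy[, num_cols]), 2)
```

Selecting columns with [] before calling aggregate() or cor() keeps the summary function from being applied to the character column.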