Exercises - Dataframes
Basics of R for Data Science
Creating, Importing, and Indexing dataframes
Create a dataframe named dx with \(15\) rows and three variables: a character variable, a numeric variable, and a logical variable (Hint: use the data.frame() function to combine previously created vectors, each with \(15\) observations, into a single dataframe).
Use the mean() function to compute the average value of the numeric variable in dx.
Use the mean() function to compute the average value of the numeric variable in dx, but only for rows where the logical variable has value TRUE (Hint: use indexing with [] to select values of the numeric variable based on the values of the logical variable).
Download this dataset, then import it in R as a dataframe named df.
Use the functions head(), str(), and summary() on df and understand what they do.
In df, select and display the values of the variable mathAvgTime that are greater than \(30,000\) (thirty thousand).
In df, select and display the values of the variable mathAcc only for rows where the variable mathAvgTime is greater than \(30,000\) (thirty thousand).
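As a starting point, the exercises above could be approached along the following lines. This is only a sketch: the column names name, score, and flag, all the values, and the small toy table standing in for the downloaded dataset are assumptions made for illustration. Only mathAvgTime and mathAcc come from the exercises, and the real file will have different contents.

```r
# Three vectors of 15 observations each (values are made up for illustration)
set.seed(1)
name  <- paste0("id", 1:15)                       # character
score <- round(rnorm(15, mean = 50, sd = 10), 1)  # numeric
flag  <- rep(c(TRUE, FALSE, TRUE), times = 5)     # logical

# Combine the vectors into one dataframe
dx <- data.frame(name, score, flag, stringsAsFactors = FALSE)

mean(dx$score)           # average over all 15 rows
mean(dx$score[dx$flag])  # average over rows where flag is TRUE only

# Hypothetical stand-in for the downloaded file: write a toy CSV to a
# temporary path, then import it the same way a real CSV would be imported
toy <- data.frame(
  school      = c("school1", "school4", "school4", "school2"),
  mathAvgTime = c(25000, 31000, 42000, 29000),
  mathAcc     = c(0.80, 0.65, 0.70, 0.90)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
df <- read.csv(path)

head(df)     # first rows
str(df)      # column types and dimensions
summary(df)  # per-column summaries

# Logical indexing: the condition inside [] picks the matching elements
slow_times <- df$mathAvgTime[df$mathAvgTime > 30000]
slow_acc   <- df$mathAcc[df$mathAvgTime > 30000]
slow_times
slow_acc
```

With the real dataset, the read.csv() call would point at the location where the downloaded file was saved, assuming it is a comma-separated file.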
Working with dataframes
This section continues using the df dataframe downloaded and imported for the previous exercises.
Use the table() function on the school variable in the df dataframe to count the number of observations for each school.
Select and display only the rows in df where the school variable is equal to "school4". Do this using indexing with [], then redo it using the subset() function.
Learn how to use the aggregate() function by reading its documentation. Then, use it to calculate the mean value of each variable in df, grouped by the school variable.
Repeat the previous exercise, but this time calculate the median and the standard deviation of each variable grouped by school.
Understand why warnings occurred in the previous two exercises. Then, avoid these warnings by limiting the computations to some variables only (achieve this by using indexing with []).
Use the cor() function to compute a correlation matrix for the numerical variables in df.
Repeat the previous task, but round the correlation coefficients in the matrix to two decimal places using the round() function.
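The following sketch shows one possible way to approach these tasks. It uses a small hypothetical dataframe in place of the real df: the six rows and their values are invented, and only the column names school, mathAvgTime, and mathAcc are taken from the exercises. The helper name num_vars is also an assumption.

```r
# Hypothetical stand-in for df (the real dataset has different contents)
df <- data.frame(
  school      = c("school1", "school4", "school4", "school1", "school2", "school2"),
  mathAvgTime = c(25000, 31000, 42000, 27000, 29000, 33000),
  mathAcc     = c(0.80, 0.65, 0.70, 0.85, 0.90, 0.75),
  stringsAsFactors = FALSE
)

# Count observations per school
school_counts <- table(df$school)
school_counts

# Two equivalent ways to keep only the rows for "school4"
rows_idx <- df[df$school == "school4", ]
rows_sub <- subset(df, school == "school4")

# Restrict the computation to numeric columns with [] so that mean(),
# median() and sd() are never applied to the character column school
num_vars <- c("mathAvgTime", "mathAcc")
by_school_mean   <- aggregate(df[num_vars], by = list(school = df$school), FUN = mean)
by_school_median <- aggregate(df[num_vars], by = list(school = df$school), FUN = median)
by_school_sd     <- aggregate(df[num_vars], by = list(school = df$school), FUN = sd)
by_school_mean

# Correlation matrix of the numeric variables, rounded to two decimals
cor_mat <- round(cor(df[num_vars]), 2)
cor_mat
```

Selecting columns by name, as in df[num_vars], is one option; selecting them by position or with sapply(df, is.numeric) would work as well.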