library(RandomData)
dat <- race_stats
# change NA to 0s and to numeric
dat$fastestLapSpeed <- as.numeric(
  ifelse(dat$fastestLapSpeed >= 0, dat$fastestLapSpeed, 0)
)
Base R Descriptive Stats
First, let’s find the minimum and maximum speeds recorded during the fastest lap of each race. Since the fastest lap speed is stored as a character, we need to convert it to a numeric value and replace any missing values with 0, as the code above does.
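As a minimal sketch of the same clean-up on a toy vector, the example below converts first and then replaces the resulting NAs, which makes the intent explicit. The vector `raw_speed` is made up for illustration, and the idea that missing speeds arrive as non-numeric strings such as "\\N" is an assumption, not something confirmed by the dataset.

```r
# Hypothetical character vector standing in for dat$fastestLapSpeed
raw_speed <- c("201.5", "\\N", "198.2", NA)

# as.numeric() turns non-numeric strings into NA (with a warning, silenced here)
speed <- suppressWarnings(as.numeric(raw_speed))

# Replace every NA with 0, mirroring the clean-up step in this post
speed[is.na(speed)] <- 0

speed
```

Converting before comparing avoids relying on how character strings are ordered against numbers.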
Min and Max
min(dat$fastestLapSpeed)
[1] 0
max(dat$fastestLapSpeed)
[1] 255.014
range(dat$fastestLapSpeed)
[1] 0.000 255.014
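Because missing speeds were replaced with 0, the minimum of 0 most likely reflects that placeholder rather than a real lap. The sketch below, on made-up values (the vector `speed` is hypothetical), shows one way to look at the range of recorded speeds only.

```r
# Hypothetical speeds; 0 marks a missing value, as in the clean-up step
speed <- c(0, 191.1, 203.0, 214.3, 0, 255.0)

# Keep only laps with a recorded (positive) speed
recorded <- speed[speed > 0]

min(recorded)    # smallest recorded speed
max(recorded)    # largest recorded speed
range(recorded)  # both at once, as c(min, max)
```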
Mean
- The mean is the average value of all the numbers in a set: the sum of the values divided by how many values there are.
mean(dat$fastestLapSpeed)
[1] 201.4552
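To see what mean() is doing, here is a small sketch that computes the average by hand on a made-up vector `x` (not the race data) and compares it with the built-in function.

```r
# Made-up values for illustration
x <- c(2, 4, 6, 8)

# Mean by hand: sum of the values divided by how many there are
manual_mean <- sum(x) / length(x)

manual_mean  # 5
mean(x)      # 5
```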
Median
- The median is the middle value in a set of numbers when they are ordered from least to greatest.
median(dat$fastestLapSpeed)
[1] 203.003
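A short sketch on made-up vectors (`x_odd` and `x_even` are hypothetical) shows how the middle value is picked, including the case of an even number of values, where median() averages the two middle ones.

```r
x_odd  <- c(7, 1, 5)      # sorted: 1 5 7
x_even <- c(7, 1, 5, 3)   # sorted: 1 3 5 7

median(x_odd)   # 5, the middle value
median(x_even)  # 4, the mean of the two middle values 3 and 5

# The same idea by hand for the odd-length case
sort(x_odd)[(length(x_odd) + 1) / 2]  # 5
```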
First and Third Quartiles
- The first quartile is the value below which 25 percent of the data points fall when they are arranged in increasing order, and the third quartile is the value below which 75 percent of the data points fall.
quantile(dat$fastestLapSpeed, 0.25)
25%
191.142
quantile(dat$fastestLapSpeed, 0.75)
75%
214.339
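Both quartiles can also be requested in a single call. The sketch below uses a made-up vector `x` rather than the race data; quantile() accepts a vector of probabilities, and summary() reports the quartiles alongside the other statistics covered here.

```r
# Made-up values for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Pass several probabilities to get both quartiles in one call
q <- quantile(x, probs = c(0.25, 0.75))
q

# summary() reports min, quartiles, median, mean and max together
summary(x)
```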
IQR
- The IQR (interquartile range) is the difference between the third and first quartiles, Q3 - Q1. It describes the spread of the middle 50 percent of the data.
IQR(dat$fastestLapSpeed)
[1] 23.197
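With the quartiles found earlier, 214.339 - 191.142 = 23.197, which matches the output of IQR(). The same check can be sketched on a made-up vector `x` (hypothetical values, not the race data):

```r
# Made-up values for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# names = FALSE drops the "25%" / "75%" labels so the result is a plain number
q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)

q3 - q1  # 4
IQR(x)   # 4
```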
Standard Deviation and Variance
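- Variance is the average squared difference between each data point and the mean, which measures how spread out the values in a set are. The standard deviation is the square root of the variance, so it expresses that spread in the same units as the data. Note that var() and sd() in R compute the sample versions, dividing by n - 1 rather than n.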
sd(dat$fastestLapSpeed)
[1] 23.22281
var(dat$fastestLapSpeed)
[1] 539.2991
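The two results are linked: squaring the standard deviation above (23.22281) gives roughly 539.299, the variance. The sketch below reproduces both by hand on a made-up vector `x` (hypothetical values), using the sample formula with n - 1 in the denominator.

```r
# Made-up values for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

manual_var        # matches var(x)
sqrt(manual_var)  # matches sd(x)
```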