Chapter 6 Numerical Summaries
The numerical summary must match up with the type of variable(s).
Variable | Type of summary |
---|---|
1 Qualitative | frequency table, most common category |
1 Quantitative | mean, median, SD, IQR etc |
2 Qualitative | contingency table |
2 Quantitative | correlation, linear model |
1 Quantitative, 1 Qualitative | mean, median, SD, IQR etc across categories |
We’ll keep working with the `mtcars` dataset, so remind yourself what it looks like.
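One way to take that look, as a minimal sketch using base R only (`mtcars` ships with R, so nothing needs to be installed; the variable name `dims` is just for illustration):

```r
# mtcars is a built-in data frame: one row per car, one column per variable.
dims <- dim(mtcars)   # number of rows, then number of columns
dims

str(mtcars)           # name and storage type of every variable
head(mtcars)          # first six rows
```

`help(mtcars)` opens the documentation page describing each variable and its units.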
6.1 Frequency and contingency tables
- A frequency table summarises 1 qualitative variable.
- A contingency table summarises 2 qualitative variables.
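Both tables can be sketched with the base R `table` command. Treating `cyl` and `gear` as qualitative is an assumption made here for illustration: they are stored as numbers in `mtcars`, but take only a few distinct values.

```r
# Frequency table: number of cars for each cylinder count.
cyl_counts <- table(mtcars$cyl)
cyl_counts

# Most common category of the qualitative variable.
most_common_cyl <- names(cyl_counts)[which.max(cyl_counts)]
most_common_cyl

# Contingency table: cylinders (rows) against gears (columns).
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)
cyl_by_gear
```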
6.2 Mean and median
- The mean and median measure centre for quantitative variables.
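As a short sketch, both measures of centre for `mpg` (chosen here only as an example of a quantitative variable) come from the base R commands `mean` and `median`:

```r
# Centre of the miles-per-gallon variable.
mpg_mean <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

mpg_mean
mpg_median
```

Comparing the two values gives a first hint about skewness: a mean that sits above the median suggests a longer right tail.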
6.3 Standard deviation (SD)
The standard deviation measures spread for quantitative variables.
The `sd` command calculates the sample standard deviation. The squared SD is the variance.
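A minimal sketch of the sample SD and its link to the variance, using `gear` as the example variable (the names `gear_sd` and `gear_var` are illustrative only):

```r
# Sample standard deviation (divides by N - 1).
gear_sd <- sd(mtcars$gear)
gear_sd

# Sample variance; squaring the SD gives the same number.
gear_var <- var(mtcars$gear)
gear_var
gear_sd^2
```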
- The `popsd` command calculates the population standard deviation, but requires the `multicon` package.
#install.packages("multicon") # a package only needs to be installed once.
library(multicon)
popsd(mtcars$gear)
# Longer way: rescale the sample SD (divisor N - 1) to the population SD (divisor N)
N = length(mtcars$gear)
sd(mtcars$gear)*sqrt((N-1)/N)
- Note: When we model a population by the box model [Section 8 and following], we will require the population SD.
6.4 Interquartile range (IQR)
- The quickest method is to use `IQR`.
- There are many different methods of working out the quartiles. We can use the `quantile` command, and then work out the IQR.
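Both routes can be sketched as follows, again with `mpg` as the example variable. The last line uses `type = 6` purely to illustrate that another of R's documented quartile conventions can be requested; it is not a recommendation.

```r
# Quickest method: the IQR command.
mpg_iqr <- IQR(mtcars$mpg)
mpg_iqr

# Longer method: get the quantiles, then subtract Q1 from Q3.
q <- quantile(mtcars$mpg)   # 0%, 25%, 50%, 75% and 100% by default
q
iqr_by_hand <- unname(q["75%"] - q["25%"])
iqr_by_hand

# A different quartile convention (illustrative choice of type).
quantile(mtcars$mpg, probs = c(0.25, 0.75), type = 6)
```

With default settings the two methods agree, because `IQR` relies on the same default quartile convention as `quantile`.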
What is the 50% quantile equivalent to?
6.5 Summary
- The numerical summaries for quantitative variables can all be produced with `summary`, which is an expanded version of the 5 number summary. Sometimes these values will differ from quartiles worked out by hand or by `quantile` with a non-default `type`, as there are different conventions for calculating quartiles; `summary` also rounds the values it prints.
- We can consider a subset of the data. Here, we choose the mpg of cars which have a weight greater than or equal to 3 (`wt` is recorded in units of 1000 lbs).
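The two points above can be sketched as follows. Logical indexing with `[ ]` is one of several ways to subset in R, and the name `heavy_mpg` is illustrative only.

```r
# Summary of one quantitative variable: min, quartiles, median, mean, max.
mpg_summary <- summary(mtcars$mpg)
mpg_summary

# mpg values for cars whose weight (wt) is at least 3.
heavy_mpg <- mtcars$mpg[mtcars$wt >= 3]
heavy_mpg

# The same summary, restricted to that subset.
summary(heavy_mpg)
```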
Here we take all the data from the `mtcars` dataset for a specific number of cylinders, e.g. 6.
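A minimal sketch of that row subset, keeping every column (the name `six_cyl` is illustrative only):

```r
# Keep only the rows where cyl equals 6; the empty slot after the
# comma means "all columns".
six_cyl <- mtcars[mtcars$cyl == 6, ]
six_cyl

# Any earlier summary can now be applied to the subset.
summary(six_cyl$mpg)
```

The base R call `subset(mtcars, cyl == 6)` is an alternative way of writing the same selection.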