Chapter 6 Numerical Summaries
The numerical summary must match up with the type of variable(s).
| Variable | Type of summary |
|---|---|
| 1 Qualitative | frequency table, most common category |
| 1 Quantitative | mean, median, SD, IQR etc |
| 2 Qualitative | contingency table |
| 2 Quantitative | correlation, linear model |
| 1 Quantitative, 1 Qualitative | mean, median, SD, IQR etc across categories |
We’ll keep working with the mtcars dataset, so remind yourself again what it looks like.
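
A quick way to re-familiarise yourself with the data is sketched below. It uses only base R; `mtcars` ships with the built-in `datasets` package, so nothing needs to be installed. Note that variables such as `cyl`, `gear` and `am` are stored as numbers, even though they can be treated as qualitative.

```r
# mtcars is built into R, so it can be used straight away.
str(mtcars)    # structure: 32 observations of 11 numeric variables
head(mtcars)   # first six rows
dim(mtcars)    # number of rows and columns
```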
6.1 Frequency and contingency tables
- A frequency table summarises 1 qualitative variable.
- A contingency table summarises 2 qualitative variables.
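
As a minimal sketch of both kinds of table, the base R `table` command can be applied to one or two variables. The choice of `cyl` and `gear` as the qualitative variables, and the names `cyl_table` and `cyl_gear_table`, are illustrative assumptions rather than the chapter's own example.

```r
# Frequency table for one qualitative variable: number of cylinders
cyl_table <- table(mtcars$cyl)
cyl_table

# Most common category (the label with the largest count)
names(cyl_table)[which.max(cyl_table)]

# Contingency table for two qualitative variables: cylinders by gears
cyl_gear_table <- table(cyl = mtcars$cyl, gear = mtcars$gear)
cyl_gear_table
```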
6.2 Mean and median
- The mean and median measure centre for quantitative variables.
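
A short sketch using the base R `mean` and `median` commands follows. The choice of `mpg` as the quantitative variable is an assumption for illustration. The last line shows one possible way to get a summary "across categories" (1 quantitative and 1 qualitative variable) with `tapply`.

```r
# Centre of one quantitative variable
mean(mtcars$mpg)
median(mtcars$mpg)

# Mean mpg within each cylinder category
tapply(mtcars$mpg, mtcars$cyl, mean)
```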
6.3 Standard deviation (SD)
The standard deviation measures spread for quantitative variables.
- The `sd` command calculates the sample standard deviation. The squared SD is the variance.
- The `popsd` command calculates the population standard deviation, but requires the `multicon` package.
# install.packages("multicon") # a package only needs to be installed once.
library(multicon)
popsd(mtcars$gear)
# Longer way
N = length(mtcars$gear)
sd(mtcars$gear)*sqrt((N-1)/N)

- Note: When we model a population by the box model [Section 8 and following], we will require the population SD.
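
The relationship between the sample SD, the variance and the population SD can be checked in base R without any extra package. This is a sketch under the usual definitions (the sample SD divides by N - 1, the population SD divides by N); the names `x`, `sample_sd` and `pop_sd` are illustrative.

```r
x <- mtcars$gear
N <- length(x)

sample_sd <- sd(x)                         # divides by N - 1
pop_sd    <- sqrt(mean((x - mean(x))^2))   # divides by N

# The squared sample SD is the sample variance
all.equal(sample_sd^2, var(x))

# The population SD agrees with the "longer way" conversion
all.equal(pop_sd, sample_sd * sqrt((N - 1) / N))
```

Both comparisons use `all.equal` rather than `==`, since the two sides are computed differently and may differ by floating-point rounding.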
6.4 Interquartile range (IQR)
- The quickest method is to use `IQR`.
- There are lots of different methods of working out the quartiles. We can use the `quantile` command, and then work out the IQR.
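
Both routes can be sketched as follows, again assuming `mpg` as the variable of interest. The `type` argument of `quantile` (values 1 to 9, default 7) selects the convention used for the quartiles, and `IQR` accepts the same argument.

```r
# Quickest: IQR() directly
IQR(mtcars$mpg)

# Via quantile(): third quartile minus first quartile
q <- quantile(mtcars$mpg, probs = c(0.25, 0.75))
q
unname(q[2] - q[1])

# A different quartile convention can give a different answer
quantile(mtcars$mpg, probs = c(0.25, 0.75), type = 6)
IQR(mtcars$mpg, type = 6)
```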
What is the 50% quantile equivalent to?
6.5 Summary
- The numerical summaries for quantitative variables can all be produced with `summary`, which is an expanded version of the 5 number summary. Sometimes these values will differ from those produced by `quantile`, as there are different conventions for calculating quartiles.
- We can consider a subset of the data. Here, we choose the mpg of cars which have a weight greater than or equal to 3 (the `wt` variable is measured in units of 1000 lbs).
- Here we take all the data from the mtcars dataset for a specific number of cylinders, e.g. 6.
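
The summary and the two subsets described above can be sketched with base R indexing. The names `heavy_mpg` and `six_cyl` are illustrative, and `subset` is shown only as one alternative way of writing the same selection.

```r
# Expanded 5 number summary (adds the mean)
summary(mtcars$mpg)

# mpg of cars with weight >= 3 (wt is in units of 1000 lbs)
heavy_mpg <- mtcars$mpg[mtcars$wt >= 3]
summary(heavy_mpg)

# All columns for the 6-cylinder cars
six_cyl <- mtcars[mtcars$cyl == 6, ]
six_cyl

# The same rows selected with subset()
subset(mtcars, cyl == 6)
```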