3.5 Linear models
We can see that age appears to be associated with the number of DNMs in both males and females, but we need a way to ask whether this association is statistically meaningful.
We can do this with a linear model. This model fits a line to the plots that we just made, and asks if the slope is significantly different from 0 (i.e., if there’s a significant increase in DNM count as age increases).
If this is a statistical test, what’s the null hypothesis?
The null hypothesis for this linear model is that the slope is 0 – i.e., that there’s no association between parental age and the number of DNMs from that parent.
If the slope is significantly different from 0, we can reject the null hypothesis.
We’ll fit a linear model using R’s lm function. Run the following code block to open a manual describing the function.
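In R, a function's help page can be opened from the console with the `?` operator or with `help()`. A minimal sketch of that call for lm:

```r
# Open the documentation page for lm (equivalent to help("lm"))
?lm
```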
lm takes two main arguments:
- The formula or equation it’s evaluating
- A table of data
The formula must be in the format response variable ~ predictor variable(s), where each variable is the name of a column in our data table.
Is our predictor variable the parental age or the number of DNMs?
The predictor variable is parental age. We expect the number of DNMs to change as a consequence of parental age.
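To make the formula syntax concrete, here is a minimal sketch of an lm call. The data frame `toy` and its column names `parent_age` and `n_dnm` are hypothetical, and the values are made up purely for illustration; they are not the tutorial's dataset, so substitute your own table and column names.

```r
# Toy data, invented for illustration only (not the tutorial's dataset).
# The column names parent_age and n_dnm are hypothetical.
toy <- data.frame(
  parent_age = c(20, 25, 30, 35, 40),
  n_dnm      = c(12, 20, 25, 34, 39)
)

# Formula is response ~ predictor: DNM count as a function of parental age
fit <- lm(n_dnm ~ parent_age, data = toy)

# Coefficient table: estimate, standard error, t value, and p-value
summary(fit)

# The slope's p-value tests the null hypothesis that the slope is 0
slope_p <- coef(summary(fit))["parent_age", "Pr(>|t|)"]
```

The row of the coefficient table named after the predictor holds the estimated slope and its p-value; a small p-value there is what would let us reject the null hypothesis of no association.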