3.6 Fitting a linear model for DNMs
Run the following code to fit a model for the effect of age on paternal DNMs.
# fit linear model for paternal DNMs
fit_pat <- lm(formula = n_paternal_dnm ~ Father_age,
              data = dnm_by_age)
# print results of model
summary(fit_pat)
##
## Call:
## lm(formula = n_paternal_dnm ~ Father_age, data = dnm_by_age)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.785  -5.683  -0.581   5.071  31.639
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.58819    1.70402   6.214 1.34e-09 ***
## Father_age   1.34849    0.05359  25.161  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.426 on 388 degrees of freedom
## Multiple R-squared: 0.62, Adjusted R-squared: 0.619
## F-statistic: 633.1 on 1 and 388 DF, p-value: < 2.2e-16
How do you interpret results from a linear model?
For our purposes, the only part of the results you need to look at is the Father_age line, directly under (Intercept) in the Coefficients section:
           Estimate Std. Error t value Pr(>|t|)
Father_age  1.34849    0.05359  25.161  < 2e-16 ***
- The fourth column, Pr(>|t|), is the p-value.
  Because this p-value is < 2e-16, we can reject the null hypothesis (that the slope is zero) and say that there is an association between paternal age and the number of paternal DNMs.
- The first column, Estimate, is the slope, or coefficient.
  Linear regression fits a line to our plot of paternal age vs. number of DNMs. The coefficient estimate is the slope of that line.
  The slope for paternal age given by this linear model is 1.34849. We can interpret this number this way: for every additional year of paternal age, we expect 1.35 additional paternal DNMs in the child.
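The slope and p-value read off the printed table can also be pulled out of a model object in code. The sketch below is only an illustration: it does not use dnm_by_age, but simulates a small stand-in data frame (the hypothetical `toy`, with made-up values) so that it runs on its own, and then indexes the coefficient table returned by `coef(summary(...))`.

```r
# Simulated stand-in for dnm_by_age: these values are made up for
# illustration and are NOT the real DNM data
set.seed(1)
toy <- data.frame(Father_age = runif(200, min = 18, max = 60))
toy$n_paternal_dnm <- 10 + 1.3 * toy$Father_age + rnorm(200, sd = 8)

# fit the same kind of model as in the text
fit_toy <- lm(formula = n_paternal_dnm ~ Father_age, data = toy)

# coefficient table as a matrix: one row per term, one column per statistic
coef_table <- coef(summary(fit_toy))

# index by row name (the term) and column name (the statistic)
slope   <- coef_table["Father_age", "Estimate"]
p_value <- coef_table["Father_age", "Pr(>|t|)"]

print(slope)
print(p_value)
```

With the real model, the same indexing applied to `coef(summary(fit_pat))` should return the 1.34849 estimate shown in the printed output.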
Modify your code to assess the relationship between maternal age and number of maternal DNMs. Is this relationship significant? How many maternal DNMs do we expect for every additional year of maternal age?
# fit linear model for maternal DNMs
fit_mat <- lm(formula = n_maternal_dnm ~ Mother_age,
              data = dnm_by_age)
# print results of model
summary(fit_mat)
##
## Call:
## lm(formula = n_maternal_dnm ~ Mother_age, data = dnm_by_age)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.8683 -3.1044 -0.2329  2.2394 17.5379
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.51442    0.98193   2.561   0.0108 *
## Mother_age   0.37846    0.03509  10.785   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.503 on 388 degrees of freedom
## Multiple R-squared: 0.2307, Adjusted R-squared: 0.2287
## F-statistic: 116.3 on 1 and 388 DF, p-value: < 2.2e-16
The p-value is <2e-16 and the Mother_age slope is 0.37846.
This relationship is significant, and we expect 0.38 more maternal DNMs for every additional year of maternal age.
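As a worked example of what the two fitted lines imply, the sketch below plugs an age into intercept + slope × age using the coefficients printed in the two summary() outputs. The age of 30 is an arbitrary choice for illustration, and the variable names are hypothetical.

```r
# Coefficients copied from the two summary() outputs above
pat_intercept <- 10.58819
pat_slope     <- 1.34849
mat_intercept <- 2.51442
mat_slope     <- 0.37846

# arbitrary parental age, chosen only for illustration
age <- 30

# expected number of DNMs on each fitted line: intercept + slope * age
expected_pat <- pat_intercept + pat_slope * age   # 51.04289
expected_mat <- mat_intercept + mat_slope * age   # 13.86822

# expected difference in DNMs between parents ten years apart in age
diff_pat_10yr <- pat_slope * 10                   # 13.4849
diff_mat_10yr <- mat_slope * 10                   # 3.7846

print(c(paternal = expected_pat, maternal = expected_mat))
print(c(paternal = diff_pat_10yr, maternal = diff_mat_10yr))
```

With the fitted model objects available, `predict(fit_pat, newdata = data.frame(Father_age = 30))` should give the same paternal value without copying coefficients by hand.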