3.6 Fitting a linear model for DNMs

Run the following code to fit a model for the effect of age on paternal DNMs.

# fit linear model for paternal DNMs
fit_pat <- lm(formula = n_paternal_dnm ~ Father_age,
              data = dnm_by_age)

# print results of model
summary(fit_pat)
## 
## Call:
## lm(formula = n_paternal_dnm ~ Father_age, data = dnm_by_age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.785  -5.683  -0.581   5.071  31.639 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.58819    1.70402   6.214 1.34e-09 ***
## Father_age   1.34849    0.05359  25.161  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.426 on 388 degrees of freedom
## Multiple R-squared:   0.62,  Adjusted R-squared:  0.619 
## F-statistic: 633.1 on 1 and 388 DF,  p-value: < 2.2e-16

How do you interpret results from a linear model?

For our purposes, the only part of the results you need to look at is the Father_age row, directly below (Intercept), in the Coefficients section:

            Estimate Std. Error t value Pr(>|t|)
Father_age   1.34849    0.05359  25.161  < 2e-16 ***
  • The fourth column, Pr(>|t|), is the p-value.

Because this p-value is < 2e-16, we can reject the null hypothesis (that the slope for paternal age is zero) and say that there is an association between paternal age and the number of paternal DNMs.

  • The first column, Estimate, is the slope, or coefficient.

Linear regression fits a line to our plot of paternal age vs. number of DNMs. The coefficient estimate is the slope of that line.

The slope for paternal age given by this linear model is 1.34849. In other words, for every additional year of paternal age, we expect about 1.35 additional paternal DNMs in the child.
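To make the slope and intercept concrete, the fitted line can be evaluated by hand. The sketch below copies the two estimates from the summary(fit_pat) output above; the helper predict_paternal_dnm and the example ages (30 and 40) are hypothetical, chosen only for illustration.

```r
# Coefficients copied from the summary(fit_pat) output above
intercept <- 10.58819
slope     <- 1.34849

# Hypothetical helper: expected number of paternal DNMs at a given paternal age,
# i.e. the height of the fitted line at that age
predict_paternal_dnm <- function(father_age) {
  intercept + slope * father_age
}

pred_30 <- predict_paternal_dnm(30)  # 10.58819 + 1.34849 * 30 = 51.04289
pred_40 <- predict_paternal_dnm(40)  # 10.58819 + 1.34849 * 40 = 64.52779

# Ten extra years of paternal age add 10 * slope expected DNMs
pred_40 - pred_30
```

If the fit_pat object is still in your session, predict(fit_pat, newdata = data.frame(Father_age = c(30, 40))) should give the same values up to rounding of the copied coefficients.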



Modify your code to assess the relationship between maternal age and number of maternal DNMs. Is this relationship significant? How many maternal DNMs do we expect for every additional year of maternal age?

# fit linear model for maternal DNMs
fit_mat <- lm(formula = n_maternal_dnm ~ Mother_age,
              data = dnm_by_age)

# print results of model
summary(fit_mat)
## 
## Call:
## lm(formula = n_maternal_dnm ~ Mother_age, data = dnm_by_age)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.8683 -3.1044 -0.2329  2.2394 17.5379 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.51442    0.98193   2.561   0.0108 *  
## Mother_age   0.37846    0.03509  10.785   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.503 on 388 degrees of freedom
## Multiple R-squared:  0.2307, Adjusted R-squared:  0.2287 
## F-statistic: 116.3 on 1 and 388 DF,  p-value: < 2.2e-16

The p-value is <2e-16 and the Mother_age slope is 0.37846.

This relationship is significant, and we expect 0.38 more maternal DNMs for every additional year of maternal age.
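Rather than reading the slope and p-value off the printed summary, you can pull them out of the model object. The sketch below is one way to do that, not the only one. Because it has to run on its own, it fits the model to a small simulated data frame (toy, which is not the real dnm_by_age data), so its numbers will not match the output above. The simulation parameters are arbitrary assumptions.

```r
# Simulated stand-in for dnm_by_age (NOT the real data); only the column
# names are taken from the tutorial
set.seed(1)
toy <- data.frame(Mother_age = runif(100, min = 18, max = 45))
toy$n_maternal_dnm <- 2.5 + 0.4 * toy$Mother_age + rnorm(100, sd = 4.5)

# fit the same kind of model as fit_mat
fit_toy <- lm(formula = n_maternal_dnm ~ Mother_age,
              data = toy)

# coef(summary(...)) returns the Coefficients table as a numeric matrix
coef_table <- coef(summary(fit_toy))

# index by row name (the predictor) and column name (the statistic)
slope   <- coef_table["Mother_age", "Estimate"]
p_value <- coef_table["Mother_age", "Pr(>|t|)"]

# TRUE if the association is significant at the conventional 0.05 level
p_value < 0.05
```

With the real model, replacing fit_toy with fit_mat (or fit_pat, using the "Father_age" row) extracts the same values shown in the printed Coefficients table.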