11.10 Differential gene expression
Using this dataset, we can fit regression models to ask about differences in gene expression between conditions. For example, let’s say we’re interested in whether the ACE2 gene – the receptor bound by the SARS-CoV-2 virus – exhibits differences in expression between males and females.
First we subset the data to the relevant gene and tissue (ACE2 and lung) for our test:
subset <- gtex %>%
# filter for tissue and gene of interest
filter(Tissue == "Lung" & Gene_Name == "ACE2")
head(subset)
## Sample Age Sex Death_Hardy Tissue Gene_ID Gene_Name Counts
## 1 GTEX-111YS 60-69 M 0 Lung ENSG00000130234.10 ACE2 640
## 2 GTEX-1128S 60-69 F 2 Lung ENSG00000130234.10 ACE2 64
## 3 GTEX-11DXX 60-69 F 0 Lung ENSG00000130234.10 ACE2 236
## 4 GTEX-11DXZ 50-59 M 0 Lung ENSG00000130234.10 ACE2 174
## 5 GTEX-11EMC 60-69 F 2 Lung ENSG00000130234.10 ACE2 162
## 6 GTEX-11EQ9 30-39 M 2 Lung ENSG00000130234.10 ACE2 85
Then we can fit a model to our data. Note that we’re using the glm.nb
function, which uses a negative binomial distribution – good for modeling discrete (i.e., non-continuous) data, such as sequencing read counts.
# fit model and print summary
model <- glm.nb(formula = Counts ~ factor(Sex),
data = subset)
summary(model)
##
## Call:
## glm.nb(formula = Counts ~ factor(Sex), data = subset, init.theta = 0.9820746913,
## link = log)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.5530 0.1709 32.494 < 2e-16 ***
## factor(Sex)M -0.7907 0.2121 -3.727 0.000194 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.9821) family taken to be 1)
##
## Null deviance: 129.67 on 99 degrees of freedom
## Residual deviance: 115.01 on 98 degrees of freedom
## AIC: 1214.5
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.982
## Std. Err.: 0.123
##
## 2 x log-likelihood: -1208.475
Does sex impact ACE2 expression?
The p-value is significant (p = 0.000194
), suggesting that sex does affect ACE2 expression. Based on the coefficient of -0.7907
, it looks like males tend to have ~0.8 fewer ACE2 transcripts in lung tissue.