11.10 Differential gene expression

Using this dataset, we can fit regression models to ask about differences in gene expression between conditions. For example, let’s say we’re interested in whether the ACE2 gene – the receptor bound by the SARS-CoV-2 virus – exhibits differences in expression between males and females.

First we subset the data to the relevant gene and tissue (ACE2 and lung) for our test:

subset <- gtex %>%
  # filter for tissue and gene of interest
  filter(Tissue == "Lung" & Gene_Name == "ACE2")

head(subset)
##       Sample   Age Sex Death_Hardy Tissue            Gene_ID Gene_Name Counts
## 1 GTEX-111YS 60-69   M           0   Lung ENSG00000130234.10      ACE2    640
## 2 GTEX-1128S 60-69   F           2   Lung ENSG00000130234.10      ACE2     64
## 3 GTEX-11DXX 60-69   F           0   Lung ENSG00000130234.10      ACE2    236
## 4 GTEX-11DXZ 50-59   M           0   Lung ENSG00000130234.10      ACE2    174
## 5 GTEX-11EMC 60-69   F           2   Lung ENSG00000130234.10      ACE2    162
## 6 GTEX-11EQ9 30-39   M           2   Lung ENSG00000130234.10      ACE2     85

Then we can fit a model to our data. Note that we’re using the glm.nb function, which uses a negative binomial distribution – good for modeling discrete (i.e., non-continuous) data, such as sequencing read counts.

# fit model and print summary
model <- glm.nb(formula = Counts ~ factor(Sex),
                data = subset)
summary(model)
## 
## Call:
## glm.nb(formula = Counts ~ factor(Sex), data = subset, init.theta = 0.9820746913, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0205  -0.9991  -0.5441  -0.0219   4.9984  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    5.5530     0.1709  32.494  < 2e-16 ***
## factor(Sex)M  -0.7907     0.2121  -3.727 0.000194 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.9821) family taken to be 1)
## 
##     Null deviance: 129.67  on 99  degrees of freedom
## Residual deviance: 115.01  on 98  degrees of freedom
## AIC: 1214.5
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.982 
##           Std. Err.:  0.123 
## 
##  2 x log-likelihood:  -1208.475

Does sex impact ACE2 expression?

The p-value is significant (p = 0.000194), suggesting that sex does affect ACE2 expression. Based on the coefficient of -0.7907, it looks like males tend to have ~0.8 fewer ACE2 transcripts in lung tissue.