11.12 Homework

11.12.0.1 Learning Objectives

  • Become familiar with the data available in the GTEx dataset
  • Interpret linear models in the context of gene expression

11.12.0.2 Assignment

Fit a regression model (or multiple!) to the GTEx data we downloaded in class and test for differential expression between two (or more) conditions. For example, you could look at another gene, compare between tissues, age groups, or Hardy classifications, etc.

Based on the results, does the condition affect expression of your gene?


Solution

As an example, we’ll test for differences in ACE2 expression between liver and lung tissue. First we subset the GTEx data to just the ACE2 gene:

subset <- gtex %>%
  # filter gene of interest
  filter(Gene_Name == "ACE2")

head(subset)
##       Sample   Age Sex Death_Hardy Tissue            Gene_ID Gene_Name Counts
## 1 GTEX-111YS 60-69   M           0   Lung ENSG00000130234.10      ACE2    640
## 2 GTEX-1128S 60-69   F           2   Lung ENSG00000130234.10      ACE2     64
## 3 GTEX-11DXX 60-69   F           0   Lung ENSG00000130234.10      ACE2    236
## 4 GTEX-11DXZ 50-59   M           0  Liver ENSG00000130234.10      ACE2    211
## 5 GTEX-11DXZ 50-59   M           0   Lung ENSG00000130234.10      ACE2    174
## 6 GTEX-11EMC 60-69   F           2   Lung ENSG00000130234.10      ACE2    162

Then we fit a model with glm.nb:

# fit model and print summary
model <- glm.nb(formula = Counts ~ factor(Tissue),
                data = subset)
summary(model)
## 
## Call:
## glm.nb(formula = Counts ~ factor(Tissue), data = subset, init.theta = 1.034483188, 
##     link = log)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          4.6651     0.1370   34.05  < 2e-16 ***
## factor(Tissue)Lung   0.4491     0.1688    2.66  0.00781 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.0345) family taken to be 1)
## 
##     Null deviance: 180.46  on 151  degrees of freedom
## Residual deviance: 173.75  on 150  degrees of freedom
## AIC: 1819
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.034 
##           Std. Err.:  0.106 
## 
##  2 x log-likelihood:  -1812.977

The p-value is significant (p = 0.00781), suggesting that ACE2 expression does differ between liver and lung.

The coefficient estimate, 0.4491, indicates that RNA-seq counts for ACE2 are \(0.45\) transcripts higher in lung than in liver.