11.12 Homework
11.12.0.1 Learning Objectives
- Become familiar with the data available in the GTEx dataset
- Interpret linear models in the context of gene expression
11.12.0.2 Assignment
Fit a regression model (or multiple!) to the GTEx data we downloaded in class and test for differential expression between two (or more) conditions. For example, you could look at another gene, compare between tissues, age groups, or Hardy classifications, etc.
Based on the results, does the condition affect expression of your gene?
Solution
As an example, we’ll test for differences in ACE2 expression between liver and lung tissue. First we subset the GTEx data to just the ACE2 gene:
## Sample Age Sex Death_Hardy Tissue Gene_ID Gene_Name Counts
## 1 GTEX-111YS 60-69 M 0 Lung ENSG00000130234.10 ACE2 640
## 2 GTEX-1128S 60-69 F 2 Lung ENSG00000130234.10 ACE2 64
## 3 GTEX-11DXX 60-69 F 0 Lung ENSG00000130234.10 ACE2 236
## 4 GTEX-11DXZ 50-59 M 0 Liver ENSG00000130234.10 ACE2 211
## 5 GTEX-11DXZ 50-59 M 0 Lung ENSG00000130234.10 ACE2 174
## 6 GTEX-11EMC 60-69 F 2 Lung ENSG00000130234.10 ACE2 162
Then we fit a model with glm.nb
:
# fit model and print summary
model <- glm.nb(formula = Counts ~ factor(Tissue),
data = subset)
summary(model)
##
## Call:
## glm.nb(formula = Counts ~ factor(Tissue), data = subset, init.theta = 1.034483188,
## link = log)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.6651 0.1370 34.05 < 2e-16 ***
## factor(Tissue)Lung 0.4491 0.1688 2.66 0.00781 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(1.0345) family taken to be 1)
##
## Null deviance: 180.46 on 151 degrees of freedom
## Residual deviance: 173.75 on 150 degrees of freedom
## AIC: 1819
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 1.034
## Std. Err.: 0.106
##
## 2 x log-likelihood: -1812.977
The p-value is significant (p = 0.00781
), suggesting that ACE2 expression does differ between liver and lung.
The coefficient estimate, 0.4491
, indicates that RNA-seq counts for ACE2 are \(0.45\) transcripts higher in lung than in liver.