11.9 Data
The GTEx Portal provides links for downloading curated and summarized forms of its data, including giant matrices that encode the expression of every gene across all samples and tissues.
For ease of manipulation in R, we've subset this data to 150 samples, kept only highly expressed genes, and kept only data from liver and lung tissue.
##       Sample   Age Sex Death_Hardy Tissue            Gene_ID Gene_Name Counts
## 1 GTEX-111YS 60-69   M           0   Lung ENSG00000187634.11    SAMD11     59
## 2 GTEX-111YS 60-69   M           0   Lung ENSG00000188976.10     NOC2L   2789
## 3 GTEX-111YS 60-69   M           0   Lung ENSG00000187961.13    KLHL17    716
## 4 GTEX-111YS 60-69   M           0   Lung ENSG00000187583.10   PLEKHN1     47
## 5 GTEX-111YS 60-69   M           0   Lung  ENSG00000187642.9     PERM1     23
## 6 GTEX-111YS 60-69   M           0   Lung ENSG00000188290.10      HES4    534
The columns of this dataframe are:
Sample
: Individual sequenced
Age
: Individual's age range
Sex
: Individual's sex
Death_Hardy
: Individual's cause of death, measured on the Hardy Scale
Tissue
: Tissue measured
Gene_ID
: Ensembl gene ID
Gene_Name
: The common gene name
Counts
: Expression level for the gene
- Ex: GTEX-111YS has 59 sequencing reads that mapped to the SAMD11 gene
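To make the column descriptions concrete, here is a minimal sketch that rebuilds the six rows printed above as a small data frame and looks up the SAMD11 value for GTEX-111YS. The object names `gtex_head` and `samd11_counts` are hypothetical and chosen only for this illustration; the values are copied from the output shown above.

```r
# Rebuild the six rows printed above (values copied from that output).
# `gtex_head` is a hypothetical name used only for this illustration.
gtex_head <- data.frame(
  Sample      = rep("GTEX-111YS", 6),
  Age         = rep("60-69", 6),
  Sex         = rep("M", 6),
  Death_Hardy = rep(0, 6),
  Tissue      = rep("Lung", 6),
  Gene_ID     = c("ENSG00000187634.11", "ENSG00000188976.10",
                  "ENSG00000187961.13", "ENSG00000187583.10",
                  "ENSG00000187642.9", "ENSG00000188290.10"),
  Gene_Name   = c("SAMD11", "NOC2L", "KLHL17", "PLEKHN1", "PERM1", "HES4"),
  Counts      = c(59, 2789, 716, 47, 23, 534),
  stringsAsFactors = FALSE
)

# Each row is one gene measured in one tissue of one individual, so the
# expression level is found by matching on Sample and Gene_Name.
samd11_counts <- gtex_head$Counts[
  gtex_head$Sample == "GTEX-111YS" & gtex_head$Gene_Name == "SAMD11"
]
samd11_counts
```

Because the table is in this "long" layout, the same logical indexing works for any combination of sample, tissue, and gene.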
Data normalization
The expression levels in this table have been normalized to account for factors such as sequencing variation between samples (for example, when more sequencing data was collected from one individual than another).
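As a rough illustration of why this matters, the sketch below applies one common approach, counts per million (CPM), to made-up numbers. This is an assumption for illustration only: it is not necessarily the normalization applied to this table, and `raw_counts`, `library_sizes`, and `cpm` are hypothetical names.

```r
# Toy raw counts (made-up numbers): rows are genes, columns are samples.
# sample2 was "sequenced" twice as deeply as sample1.
raw_counts <- matrix(
  c(10,  20,
    30,  60,
    60, 120),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# Total number of reads collected per sample (the library size)
library_sizes <- colSums(raw_counts)

# Counts per million: divide each column by its library size,
# then rescale so every sample sums to one million.
cpm <- sweep(raw_counts, 2, library_sizes, FUN = "/") * 1e6
cpm
```

Before scaling, every gene appears twice as highly expressed in `sample2` simply because more reads were collected; after scaling, the two samples have identical values, which is the kind of between-sample difference normalization is meant to remove.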