11.2 The Genotype-Tissue Expression project

In 2010, the NIH launched the Genotype-Tissue Expression (GTEx) project to build the first large-scale dataset of gene expression. The final version of this dataset (v8) was released in 2020.

GTEx is currently the most comprehensive gene expression dataset in existence. It involved sequencing whole genomes (DNA-seq) as well as transcriptomes (RNA-seq) from 948 recently deceased individuals, with up to 54 tissues sampled throughout their bodies.

One of the main motivations for GTEx was to better understand the genetic control of gene expression: how does genetic variation contribute to variation in the amount, splicing, and tissue specificity of expressed RNA?

Fig. 2 (source). Summary of individuals sequenced by GTEx.