8.2 Data

GWAS requires information on both genotype and phenotype in the same individuals.
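
As a minimal sketch of what "the same individuals" means in practice, the snippet below keeps only the samples that appear in both a genotype table and a phenotype table. The sample IDs, values, and the `match_samples` helper are hypothetical and exist only for illustration; they are not taken from the dataset used in this chapter.

```python
def match_samples(genotypes, phenotypes):
    """Keep only individuals present in both tables, in a shared order."""
    # Individuals missing either data type cannot contribute to the analysis.
    shared = sorted(set(genotypes) & set(phenotypes))
    geno = [genotypes[s] for s in shared]
    pheno = [phenotypes[s] for s in shared]
    return shared, geno, pheno


# Hypothetical toy data: sample ID -> allele count (0, 1, or 2) at one variant
genotypes = {"S1": 0, "S2": 1, "S3": 2, "S4": 1}
# Hypothetical toy data: sample ID -> phenotype measurement
phenotypes = {"S2": 5.1, "S3": 6.3, "S5": 4.8}

samples, geno, pheno = match_samples(genotypes, phenotypes)
print(samples)
```

Only the individuals with both a genotype and a phenotype are retained, and the two lists come back in the same sample order so that each genotype lines up with the right phenotype.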

The genotype data we’re using are real data from the Yoruba population in the 1000 Genomes Project, but the phenotype data are simulated.


Why can’t we use real phenotype data?

The combination of genotype and phenotype data poses a privacy risk, so real datasets that pair the two are often stored in controlled-access databases such as dbGaP.

Although these data are still available to researchers who want to work with them, access usually requires submitting an application that explains what you intend to do with them.