7.2 GWAS is just linear regression

At its core, a GWAS involves fitting linear models to test for relationships between variants and phenotypes using data from large samples of individuals.

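As with the linear models we covered in the DNM module, a GWAS fits a line to a set of points. In this case, each point is one individual in the dataset: its x value is that individual's genotype at the variant of interest (commonly coded as 0, 1, or 2 copies of one allele), and its y value is that individual's phenotype.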
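To make the idea concrete, here is a minimal sketch of the regression for a single variant. Everything in it is an illustrative assumption rather than real data: the sample size, the allele frequency, the effect size, and the 0/1/2 genotype coding are all simulated, and `scipy.stats.linregress` is just one convenient way to fit the line.

```python
import numpy as np
from scipy import stats

# Simulated data (all values are assumptions for illustration).
rng = np.random.default_rng(0)
n_individuals = 1000
allele_freq = 0.3      # hypothetical frequency of the allele being counted
true_effect = 0.5      # hypothetical change in phenotype per allele copy

# Genotype = number of copies (0, 1, or 2) each individual carries.
genotypes = rng.binomial(2, allele_freq, size=n_individuals)

# Phenotype = linear effect of genotype plus random noise.
phenotypes = true_effect * genotypes + rng.normal(0, 1, size=n_individuals)

# Fit phenotype ~ genotype. The slope estimates the per-allele effect,
# and the p-value tests the null hypothesis that the slope is zero.
fit = stats.linregress(genotypes, phenotypes)
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, p = {fit.pvalue:.2e}")
```

The slope is the quantity of interest here: it estimates how much the phenotype changes, on average, with each additional copy of the allele.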

Fig. 2 (source). GWAS fits a linear model for every variant, where the x axis is genotype and the y axis is a phenotype.

Because there are so many variants in the genome and we perform a separate statistical test for each one, we often end up fitting millions of linear models for a GWAS.
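The "one model per variant" idea can be sketched as a simple loop. This is a toy example under stated assumptions: the genotype matrix is simulated, the number of variants is tiny compared with a real GWAS, and `causal_index` is a hypothetical variant we give a true effect so that there is something to find. Real analyses typically rely on specialised software and more elaborate models.

```python
import numpy as np
from scipy import stats

# Simulated data (all values are assumptions for illustration).
rng = np.random.default_rng(1)
n_individuals, n_variants = 500, 200
causal_index = 10      # hypothetical variant with a real effect

# One column per variant, each with its own allele frequency.
allele_freqs = rng.uniform(0.1, 0.5, size=n_variants)
genotypes = rng.binomial(2, allele_freqs, size=(n_individuals, n_variants))

# Only the causal variant influences the phenotype; the rest is noise.
phenotypes = 0.8 * genotypes[:, causal_index] + rng.normal(size=n_individuals)

# Fit a separate linear model for every variant and keep its p-value.
p_values = np.empty(n_variants)
for j in range(n_variants):
    p_values[j] = stats.linregress(genotypes[:, j], phenotypes).pvalue

top_variant = int(np.argmin(p_values))
print(f"variant with the smallest p-value: {top_variant}")
```

Each pass through the loop is an independent regression, which is why the number of fitted models grows directly with the number of variants tested.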