8.11 GWAS for one variant

Under the hood, GWAS is just linear regression: a simple statistical model that assesses evidence of a relationship between two variables. We can perform this linear regression by hand, using data from the first SNP in the VCF.

In our model, we’ll be asking whether there’s a relationship between an individual’s genotype (their dosage of the SNP) and phenotype (their \(\mathrm{IC_{50}}\) for GS451).
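To make the model concrete, here is a minimal sketch of a least-squares fit of phenotype on genotype, written in plain Python. The dosages and \(\mathrm{IC_{50}}\) values are made up for illustration and are not the module's data; the variable names are likewise hypothetical.

```python
# Hypothetical toy data (NOT the module's data): each individual's dosage
# (0, 1, or 2 copies of the alternate allele) and a made-up IC50 value.
dosage = [0, 0, 1, 1, 2, 2]
ic50 = [1.0, 3.0, 3.0, 5.0, 5.0, 7.0]

n = len(dosage)
mean_x = sum(dosage) / n
mean_y = sum(ic50) / n

# Least-squares slope = sum of cross-deviations / sum of squared deviations of x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, ic50))
sxx = sum((x - mean_x) ** 2 for x in dosage)

slope = sxy / sxx                      # change in IC50 per extra allele copy
intercept = mean_y - slope * mean_x    # predicted IC50 at dosage 0

print(slope, intercept)  # 2.0 2.0
```

In this toy example, each additional copy of the allele is associated with an increase of 2 in \(\mathrm{IC_{50}}\). A real analysis would also report a standard error and p-value for the slope, which is the evidence a GWAS uses to judge each variant.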

Why did we merge our genotype and phenotype data?

When we fit linear models in the DNM module, we needed our variables (age and number of DNMs) to be separate columns of the same table.

Similarly, now that our variables are genotype and phenotype, they need to be in the same dataframe.
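The idea of putting both variables in one table can be sketched as a join on sample ID. This is only an illustration in plain Python: the sample IDs, values, and the choice of an inner join (keeping only individuals present in both tables) are assumptions, not a description of how the module's merge was done.

```python
# Hypothetical tables keyed by sample ID (all names and values are made up).
genotypes = {"sample_1": 0, "sample_2": 1, "sample_3": 2}          # dosage
phenotypes = {"sample_1": 2.1, "sample_2": 3.9, "sample_4": 5.0}   # IC50

# Keep only samples found in both tables (an inner join on sample ID),
# so each row holds one individual's dosage and IC50 side by side.
shared_samples = sorted(genotypes.keys() & phenotypes.keys())
merged = [
    {"sample": s, "dosage": genotypes[s], "ic50": phenotypes[s]}
    for s in shared_samples
]

for row in merged:
    print(row)
```

Individuals missing from either table (here `sample_3` and `sample_4`) drop out, since a regression needs both a genotype and a phenotype for every row.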