8.11 GWAS for one variant

Under the hood, GWAS is just linear regression: a simple statistical model that assesses evidence for a relationship between two variables, fit once for every variant. We can perform this regression by hand, using data from the first SNP in the VCF.
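For reference, here is the standard simple linear regression model, written for one variant. Here x_i is individual i's genotype, y_i is their phenotype, and n is the number of individuals. The symbols are generic notation, not names used elsewhere in this course. The test of association asks whether the slope β₁ differs from 0.

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i $$

$$ \hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x $$

$$ \mathrm{SE}(\hat\beta_1) = \sqrt{\frac{\sum_i \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2 / (n-2)}{\sum_i (x_i - \bar x)^2}}, \qquad t = \frac{\hat\beta_1}{\mathrm{SE}(\hat\beta_1)} $$

Under the null hypothesis β₁ = 0, t follows a t distribution with n − 2 degrees of freedom, which gives the p-value for the variant.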

In our model, we'll ask whether there's a relationship between an individual's genotype (their dosage of the SNP, i.e., the number of copies of the alternate allele they carry) and their phenotype (their IC50 for GS451).
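To make "by hand" concrete, below is a minimal sketch in plain Python that regresses IC50 on dosage using the textbook least-squares formulas. The function name `fit_one_variant` and the six-individual dataset are hypothetical. The values are made up for illustration and do not come from the course VCF or phenotype file. The sketch assumes dosages are coded as 0, 1 or 2.

```python
import math

def fit_one_variant(dosage, ic50):
    """Hypothetical helper: ordinary least-squares fit of IC50 on genotype dosage.

    Returns (intercept, slope, standard error of slope, t statistic).
    """
    n = len(dosage)
    if n != len(ic50) or n < 3:
        raise ValueError("need paired genotype/phenotype values for at least 3 individuals")
    x_bar = sum(dosage) / n
    y_bar = sum(ic50) / n
    sxx = sum((x - x_bar) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("every individual has the same dosage; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosage, ic50))
    slope = sxy / sxx                    # effect of one extra allele copy on IC50
    intercept = y_bar - slope * x_bar    # expected IC50 at dosage 0
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dosage, ic50))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit (rss == 0) leaves t undefined, so report NaN in that case
    t_stat = slope / se_slope if se_slope > 0 else float("nan")
    return intercept, slope, se_slope, t_stat

# Hypothetical toy data: six individuals with made-up IC50 values
dosage = [0, 0, 1, 1, 2, 2]
ic50 = [1.0, 1.2, 1.9, 2.1, 3.0, 2.8]
intercept, slope, se, t = fit_one_variant(dosage, ic50)
print(f"slope={slope:.2f}, se={se:.3f}, t={t:.1f}")  # prints slope=0.90, se=0.061, t=14.7
```

The slope is the estimated change in IC50 per additional copy of the alternate allele. To turn t into a two-sided p-value you would compare it against a t distribution with n − 2 degrees of freedom, for example with `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available. In practice a statistics library's regression function does all of this for you.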


Why did we merge our genotype and phenotype data?

When we fit linear models in the DNM module, we needed our two variables (age and number of DNMs) to be separate columns of the same table.

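Similarly, now that our variables are genotype and phenotype, they need to be columns of the same dataframe, with one row per individual, so that each person's genotype is paired with that same person's phenotype.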
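The idea behind the merge can be sketched in plain Python as an inner join on sample ID. The function name `merge_by_sample`, the sample IDs and all values below are hypothetical. With a dataframe library you would typically do the same thing with a merge on the shared ID column, such as pandas `DataFrame.merge(..., on=..., how="inner")`.

```python
def merge_by_sample(genotypes, phenotypes):
    """Hypothetical helper: inner-join two {sample_id: value} mappings.

    Returns rows of (sample_id, dosage, ic50) for samples present in BOTH inputs,
    sorted by sample ID so the output order is deterministic.
    """
    shared = sorted(genotypes.keys() & phenotypes.keys())
    return [(sample, genotypes[sample], phenotypes[sample]) for sample in shared]

# Hypothetical inputs: dosages parsed from a VCF, IC50 values from a phenotype table.
# Note the two inputs list samples in different orders and don't fully overlap.
genotypes = {"sample_A": 0, "sample_B": 1, "sample_C": 2, "sample_D": 1}
phenotypes = {"sample_B": 2.1, "sample_A": 1.0, "sample_C": 2.9, "sample_E": 1.5}

merged = merge_by_sample(genotypes, phenotypes)
# sample_D (no phenotype) and sample_E (no genotype) are dropped
dosage = [row[1] for row in merged]
ic50 = [row[2] for row in merged]
print(merged)
```

Joining on an explicit ID, rather than trusting that two files list individuals in the same order, is what guarantees each genotype sits next to the right phenotype. The inner join also drops individuals missing either measurement, since they can't contribute to the regression.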