8.19 Conclusion

We used genotype data from the 1000 Genomes Project, together with simulated phenotype data, to perform a genome-wide association study for variants associated with drug \(\mathrm{IC_{50}}\).

  • Using linear regression, we first ran the association test “by hand” on a single variant from the VCF, fitting a linear model to ask whether there is a significant relationship between genotype and phenotype.

  • We then used PLINK to perform this test on every SNP in the genome.

  • We followed up on the top SNP from our GWAS by plotting boxplots of phenotype stratified by genotype.
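
As a compact reminder of what the single-variant test computes, here is a minimal sketch in Python. It runs on synthetic data, not the chapter's VCF or phenotype file, and the function name `simple_linear_regression`, the 0/1/2 genotype coding, and the simulated effect size are all assumptions made for illustration. It fits phenotype against genotype by ordinary least squares, reports a t statistic for the slope, and then summarizes phenotype by genotype group, which is the same stratification the boxplots display.

```python
import math
import random


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, t_statistic), where the t statistic tests
    whether the slope differs from zero. Assumes x is not constant and
    the fit is not perfect (otherwise the standard error is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the slope's standard error (n - 2 df)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / se_slope


random.seed(0)
# Hypothetical data: genotypes coded as 0, 1, or 2 copies of an allele,
# and a phenotype simulated with a true per-allele effect of 1.5.
genotypes = [random.choice([0, 1, 2]) for _ in range(200)]
phenotypes = [5.0 + 1.5 * g + random.gauss(0, 1) for g in genotypes]

slope, intercept, t_stat = simple_linear_regression(genotypes, phenotypes)
print(f"slope={slope:.3f} intercept={intercept:.3f} t={t_stat:.2f}")

# Phenotype stratified by genotype: group sizes and means
by_genotype = {
    g: [p for gi, p in zip(genotypes, phenotypes) if gi == g] for g in (0, 1, 2)
}
for g, values in by_genotype.items():
    print(f"genotype {g}: n={len(values)}, mean={sum(values) / len(values):.3f}")
```

A genome-wide scan repeats this fit once per variant, which is the step we handed off to PLINK rather than looping ourselves.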