8.14 GWAS for multiple SNPs

A GWAS performs the same linear regression we just did, once for every SNP in the dataset. We could write a for loop to do this in R ourselves, but it would be slow because the full VCF contains 256,896 variants.
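To make the "one regression per SNP" idea concrete, here is a minimal sketch of such a loop. It runs on a small simulated dataset, not the course VCF: the object names (`genotypes`, `phenotype`, `results`) are hypothetical, genotypes are assumed to be coded as 0, 1, or 2 copies of an allele, and `lm()` is assumed to stand in for the regression from the previous section.

```r
# Simulated toy data (hypothetical; NOT the course dataset)
set.seed(1)
n_samples <- 100
n_snps <- 5

# Genotypes coded as 0, 1, or 2 copies of the alternate allele
genotypes <- matrix(rbinom(n_samples * n_snps, size = 2, prob = 0.3),
                    nrow = n_samples, ncol = n_snps)
colnames(genotypes) <- paste0("snp", seq_len(n_snps))

# A random quantitative phenotype with no true genetic effect
phenotype <- rnorm(n_samples)

# Fit one linear regression per SNP and store the slope and its p-value
results <- data.frame(snp = colnames(genotypes),
                      beta = NA_real_,
                      p_value = NA_real_)
for (i in seq_len(n_snps)) {
  fit <- lm(phenotype ~ genotypes[, i])
  coefs <- summary(fit)$coefficients
  results$beta[i] <- coefs[2, "Estimate"]
  results$p_value[i] <- coefs[2, "Pr(>|t|)"]
}
results
```

With five SNPs this finishes instantly. Repeating the same model fit hundreds of thousands of times in an R loop is where the cost adds up, which is the motivation for dedicated GWAS software.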

Because GWAS is such a common approach, researchers have developed software to standardize this process and make it extremely efficient. The most popular software package for GWAS is called PLINK, which is preloaded into your Cloud session.

PLINK is a “command line” tool, so we can use it either from the Terminal tab in Posit Cloud or through the system() command within R. For this class, we’ll use the latter approach.


The system() command

The command line is a text interface that takes in commands for your computer’s operating system to run. RStudio and Posit Cloud provide a more interactive interface for writing and running code that you’d otherwise have to run from the command line.

The system() command tells R to run a snippet of command line code for you, so you don’t have to leave the R environment.
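As a small illustration of how system() behaves, the sketch below runs a harmless shell command in place of PLINK. It assumes a Unix-like shell where `echo` is available, as in a Posit Cloud session. The variable name `out` is only for illustration.

```r
# Run a shell command; its output is printed to the R console
system("echo Hello from the command line")

# With intern = TRUE, the output is returned as a character vector
# instead of being printed, so it can be stored and reused in R
out <- system("echo Hello from the command line", intern = TRUE)
out
```

The string passed to system() is exactly what you would otherwise type into the Terminal tab.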