9.7 The genetic_diff function
We’ll compute \(\textrm{F}_{ST}\) using vcfR’s genetic_diff function. (Technically, this function calculates \(\textrm{G}_{ST}\), a generalization of \(\textrm{F}_{ST}\) that accommodates loci with more than two alleles. When a locus is biallelic, \(\textrm{F}_{ST} = \textrm{G}_{ST}\).)
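As a point of reference, Nei's \(\textrm{G}_{ST}\) is commonly written in the form below. The symbols \(H_T\) and \(H_S\) are introduced here for illustration and are not defined elsewhere in this section; the estimator that genetic_diff reports may apply additional weighting or corrections.

\[
\textrm{G}_{ST} = \frac{H_T - H_S}{H_T}
\]

Here \(H_T\) is the expected heterozygosity of the pooled (total) population and \(H_S\) is the average expected heterozygosity within subpopulations. Because both quantities are computed from allele frequencies, the expression applies to any number of alleles at a locus.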
genetic_diff requires:
- A vcfR object (in our case, vcf)
- A factor indicating populations
“Factor indicating populations”
The second argument to genetic_diff needs to be a vector (i.e., an ordered sequence) of population labels for the samples in the VCF.
These labels must be stored as a factor, an R data type that limits a variable to a fixed set of values (its levels). In our case, these values are the specific population labels in our dataset. We’ll be using the superpopulation groupings for this calculation.
We can use our metadata table to generate a vector of superpopulation labels. Since the superpopulation IDs are in the superpop column of that dataframe, we can convert the column from character to factor values with the as.factor function.
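A minimal sketch of this conversion is given here. It assumes a small stand-in for the metadata table: the rows are hypothetical, and the real table has one row per sample in the VCF. The names metadata, superpop, and pop_labels follow the text; the sample column is an assumption.

```r
# Stand-in for the tutorial's metadata table (hypothetical rows, for illustration only)
metadata <- data.frame(
  sample   = paste0("sample", 1:10),
  superpop = c(rep("EUR", 6), "AFR", "AMR", "EAS", "SAS"),
  stringsAsFactors = FALSE
)

# Convert the character column of superpopulation IDs to a factor
pop_labels <- as.factor(metadata$superpop)

# Preview the first few labels along with the factor's levels
head(pop_labels)
```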
## [1] EUR EUR EUR EUR EUR EUR
## Levels: AFR AMR EAS EUR SAS
Previewing pop_labels shows us that there are five “levels” in this vector, where each level is a superpopulation name.
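The levels can also be inspected directly. This is a small sketch using base R only, with hypothetical labels standing in for the real superpop column:

```r
# Hypothetical labels standing in for the real superpop column
pop_labels <- as.factor(c("EUR", "EUR", "AFR", "SAS", "EAS", "AMR", "AFR"))

levels(pop_labels)   # the distinct population names, sorted alphabetically
nlevels(pop_labels)  # how many distinct populations there are
table(pop_labels)    # number of samples assigned to each population
```

Checking the per-population counts with table is a quick way to spot a mislabeled or missing group before passing the factor to genetic_diff.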