9.7 The genetic_diff function

We’ll compute \(\textrm{F}_{ST}\) using vcfR’s genetic_diff function. (Technically, this function calculates \(\textrm{G}_{ST}\), a generalization of \(\textrm{F}_{ST}\) to loci that have more than two alleles. When a locus is biallelic, \(\textrm{F}_{ST} = \textrm{G}_{ST}\).)
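To build some intuition for what this statistic measures, here is a minimal sketch of the textbook definition \(\textrm{G}_{ST} = (H_T - H_S) / H_T\) at a single biallelic locus. The allele frequencies and the variable names are made up for illustration, and we assume two equally sized populations. genetic_diff may differ in detail (for example, in how it weights populations by sample size), so treat this as a conceptual illustration rather than a reproduction of vcfR's calculation.

```r
# Toy biallelic locus: frequency of one allele in two equally sized populations
# (hypothetical values, not taken from our dataset)
p <- c(pop1 = 0.2, pop2 = 0.8)

# Expected heterozygosity within each population, then averaged (H_S)
h_within <- 2 * p * (1 - p)
H_S <- mean(h_within)

# Expected heterozygosity of the pooled population (H_T)
p_bar <- mean(p)
H_T <- 2 * p_bar * (1 - p_bar)

# G_ST: the share of total heterozygosity explained by between-population differences
G_ST <- (H_T - H_S) / H_T
G_ST   # approximately 0.36
```

If both populations had the same allele frequency, H_S would equal H_T and the statistic would be 0.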

?genetic_diff

genetic_diff requires:

  1. vcfR object (in our case, vcf)
  2. Factor indicating populations

“Factor indicating populations”

The second argument to genetic_diff needs to be a vector (i.e., an ordered series of values) of population labels, with one label for each sample in the VCF.

These labels must be stored as a factor, an R data type that restricts a variable to a fixed set of allowed values. In our case, these values are the specific population labels in our dataset. We’ll be using the superpopulation groupings for this calculation.
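If factors are new to you, the short sketch below shows how they behave, using a toy character vector (labels_chr and the other names here are hypothetical and not part of our dataset). It uses only base R functions.

```r
# A toy character vector of population labels (hypothetical)
labels_chr <- c("EUR", "AFR", "EUR", "EAS")

# Converting to a factor records the set of distinct values as "levels"
labels_fct <- as.factor(labels_chr)
levels(labels_fct)     # the allowed values, sorted alphabetically
nlevels(labels_fct)    # how many distinct populations there are
table(labels_fct)      # number of samples per population

# Assigning a value that is not one of the levels produces NA (with a warning)
labels_bad <- labels_fct
suppressWarnings(labels_bad[1] <- "XYZ")
is.na(labels_bad[1])
```

This restriction is what makes factors useful for grouping: a function that receives one can rely on every sample belonging to one of a known set of populations.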


We can use our metadata table to generate a vector of superpopulation labels. Since the superpopulation IDs are in the superpop column of that dataframe, we can convert the column from character to factor values with the as.factor function.

pop_labels <- as.factor(metadata$superpop)
head(pop_labels)
## [1] EUR EUR EUR EUR EUR EUR
## Levels: AFR AMR EAS EUR SAS

Previewing pop_labels shows us that there are five “levels” in this vector, where each level is a superpopulation name.
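Before passing the labels to genetic_diff, it may be worth confirming that there is one label per sample and that the labels follow the order of the samples in the VCF. The sketch below does this with toy stand-ins: vcf_samples, toy_metadata, and its sample column are hypothetical names, and we are assuming the labels are matched to the VCF's sample columns by position. With the real objects, the sample IDs could be read from the VCF with colnames(vcf@gt)[-1] (the first column of that matrix is FORMAT).

```r
# Toy stand-ins (hypothetical) for the VCF sample IDs and the metadata table
vcf_samples <- c("S1", "S2", "S3", "S4")
toy_metadata <- data.frame(
  sample   = c("S3", "S1", "S4", "S2"),   # deliberately in a different order
  superpop = c("EAS", "AFR", "EAS", "AFR"),
  stringsAsFactors = FALSE
)

# Reorder the metadata rows so they follow the VCF sample order
idx <- match(vcf_samples, toy_metadata$sample)
stopifnot(!anyNA(idx))                    # every VCF sample has a metadata row
toy_labels <- as.factor(toy_metadata$superpop[idx])

# One label per sample, plus a count of samples per population
stopifnot(length(toy_labels) == length(vcf_samples))
table(toy_labels)

# With the real objects, the call would then look something like:
# gst <- genetic_diff(vcf, pops = pop_labels, method = "nei")
```

If the metadata rows are already in the same order as the VCF samples, the match step changes nothing, but the length check is still a cheap safeguard.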