9.8 Calculating FST

Run genetic_diff on the VCF:

# calculate Gst for every variant, grouping samples by pop_labels
gst_results <- genetic_diff(vcf, pop_labels) %>%
  # order dataframe by descending Gst value
  arrange(desc(Gst))
# preview highest gst variants
head(gst_results)
##   CHROM      POS    Hs_AFR     Hs_AMR    Hs_EAS      Hs_EUR      Hs_SAS
## 1 chr21 17753762 0.3537087 0.20174987 0.1326531 0.029319019 0.040063399
## 2 chr21 18668817 0.1477089 0.38831400 0.1377374 0.424382716 0.414679179
## 3 chr21 15620159 0.4997750 0.09318240 0.0000000 0.007905014 0.000000000
## 4 chr21 16235733 0.4994938 0.09318240 0.0000000 0.007905014 0.002042899
## 5 chr21 22780904 0.4992826 0.09836474 0.0000000 0.001982159 0.000000000
## 6 chr21 22786927 0.4991001 0.09318240 0.0000000 0.001982159 0.026231489
##          Ht n_AFR n_AMR n_EAS n_EUR n_SAS       Gst     Htmax    Gstmax
## 1 0.3650242  1320   694  1008  1008   978 0.5572530 0.8286973 0.8049790
## 2 0.4847713  1320   694  1008  1008   978 0.4082388 0.8484665 0.6618973
## 3 0.2439190  1320   694  1008  1008   978 0.4004813 0.8289905 0.8235999
## 4 0.2341095  1320   694  1008  1008   978 0.3739730 0.8290489 0.8232205
## 5 0.2323592  1320   694  1008  1008   978 0.3732539 0.8288159 0.8242912
## 6 0.2346916  1320   694  1008  1008   978 0.3609202 0.8297041 0.8192288
##    Gprimest
## 1 0.6922578
## 2 0.6167706
## 3 0.4862571
## 4 0.4542805
## 5 0.4528180
## 6 0.4405610
# preview lowest gst variants
tail(gst_results)
##      CHROM      POS      Hs_AFR      Hs_AMR      Hs_EAS      Hs_EUR      Hs_SAS
## 9743 chr21 45527242 0.001514004 0.000000000 0.000000000 0.001982159 0.002042899
## 9744 chr21 46135735 0.001514004 0.000000000 0.001982159 0.000000000 0.002042899
## 9745 chr21 10718788 0.001514004 0.000000000 0.001982159 0.001982159 0.000000000
## 9746 chr21 43949497 0.001514004 0.002877692 0.001982159 0.000000000 0.002042899
## 9747 chr21 33087300 0.003025712 0.005747079 0.005934666 0.003960380 0.004081616
## 9748 chr21  7948042 0.499885216 0.499995847 0.499968506 0.499992126 0.500000000
##               Ht n_AFR n_AMR n_EAS n_EUR n_SAS          Gst     Htmax    Gstmax
## 9743 0.001197365  1320   694  1008  1008   978 3.251980e-04 0.7924231 0.9984895
## 9744 0.001197365  1320   694  1008  1008   978 3.251980e-04 0.7924231 0.9984895
## 9745 0.001197365  1320   694  1008  1008   978 3.150481e-04 0.7924255 0.9984895
## 9746 0.001596168  1320   694  1008  1008   978 2.547455e-04 0.7924784 0.9979864
## 9747 0.004383322  1320   694  1008  1008   978 1.475473e-04 0.7930368 0.9944736
## 9748 0.499976954  1320   694  1008  1008   978 3.141686e-05 0.8960702 0.4420513
##          Gprimest
## 9743 3.256899e-04
## 9744 3.256899e-04
## 9745 3.155247e-04
## 9746 2.552595e-04
## 9747 1.483672e-04
## 9748 7.107064e-05

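genetic_diff outputs a table of \(\textrm{G}_{ST}\) results, where each row corresponds to one variant from the input VCF. Because we sorted the table in descending order, the previews above show both extremes: our \(\textrm{G}_{ST}\) values range from \(0.557\) at the top of the table down to \(3.1 \times 10^{-5}\) at the bottom.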
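
To see how the columns of this table relate to one another, the following sketch recomputes \(\textrm{G}_{ST}\) by hand for the top-ranked variant (chr21:17753762), using only numbers copied from the output above. It assumes that the mean within-population heterozygosity is the average of the Hs_ columns weighted by the n_ columns, and that Gprimest is Gst divided by Gstmax. Both are assumptions about how the columns are derived, chosen because they reproduce the printed values; the variable names are ours, not part of vcfR.

```r
# Values copied from row 1 of head(gst_results)
hs <- c(AFR = 0.3537087, AMR = 0.20174987, EAS = 0.1326531,
        EUR = 0.029319019, SAS = 0.040063399)   # Hs_ columns
n  <- c(AFR = 1320, AMR = 694, EAS = 1008,
        EUR = 1008, SAS = 978)                  # n_ columns
ht     <- 0.3650242                             # Ht column
gstmax <- 0.8049790                             # Gstmax column

# Assumption: within-population heterozygosity is averaged
# across populations, weighted by the n_ columns
hs_mean <- sum(n * hs) / sum(n)

# Gst: the share of total heterozygosity that is not
# explained by heterozygosity within populations
gst <- (ht - hs_mean) / ht

# Assumption: Gprimest rescales Gst by its maximum attainable value
gprimest <- gst / gstmax

print(round(c(Gst = gst, Gprimest = gprimest), 4))
```

The results agree with the Gst (0.5572530) and Gprimest (0.6922578) columns reported for this variant. This makes the ranking easier to interpret: heterozygosity is high within AFR but low within EUR and SAS, so much of the total heterozygosity at this site reflects differences between populations.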