9.6 Data (for FST)

We’ll calculate \(\mathrm{F_{ST}}\) using genotype data from the 1000 Genomes Project. Read in the VCF using the vcfR package:

# load vcfR (skip if it is already attached) and read the genotype data
library(vcfR)
vcf <- read.vcfR(file = "random_variable_sites.vcf.gz")
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 19
##   header_line: 20
##   variant count: 9748
##   column count: 2513
## Meta line 19 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 9748
##   Character matrix gt cols: 2513
##   skip: 0
##   nrows: 9748
##   row_num: 0
## Processed variant: 9748
## All variants processed
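
The column count in the output above covers more than the samples. As a rough sanity check, and assuming the file follows the standard VCF layout (eight fixed columns from CHROM to INFO, plus one FORMAT column), we can work out how many samples the file holds. The variable names here are only illustrative:

```r
# Value copied from the read.vcfR() output above
n_columns <- 2513

# Assumption: standard VCF layout, 8 fixed columns (CHROM..INFO) plus FORMAT
n_fixed <- 8 + 1

# Remaining columns are one per sample
n_samples <- n_columns - n_fixed
n_samples

# With the real object loaded, this should agree with ncol(vcf@gt) - 1,
# since the gt matrix keeps FORMAT as its first column.
```

Under that assumption the file holds 2504 samples, which is the number of metadata rows we would expect to match against.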

We’ll also read in a metadata table that records which population each sample belongs to.

# read metadata
metadata <- read.table("integrated_call_samples.txt",
                       header = TRUE)

head(metadata)
##    sample pop superpop    sex
## 1 HG00096 GBR      EUR   male
## 2 HG00097 GBR      EUR female
## 3 HG00099 GBR      EUR female
## 4 HG00100 GBR      EUR female
## 5 HG00101 GBR      EUR   male
## 6 HG00102 GBR      EUR female
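
Before grouping genotypes by population, it can help to confirm that the metadata rows line up with the sample columns of the VCF. The sketch below is a minimal illustration on a toy copy of the six rows printed above. The names `toy_metadata` and `toy_vcf_samples` are hypothetical, and the reversed sample order is made up to stand in for `colnames(vcf@gt)[-1]`:

```r
# Toy copy of the six rows shown by head(metadata); the real table is larger
toy_metadata <- data.frame(
  sample   = c("HG00096", "HG00097", "HG00099", "HG00100", "HG00101", "HG00102"),
  pop      = "GBR",
  superpop = "EUR",
  sex      = c("male", "female", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical sample order standing in for colnames(vcf@gt)[-1]
toy_vcf_samples <- rev(toy_metadata$sample)

# Every VCF sample should have a matching metadata row
all_present <- all(toy_vcf_samples %in% toy_metadata$sample)
all_present

# Reorder the metadata rows so they follow the VCF column order
toy_metadata <- toy_metadata[match(toy_vcf_samples, toy_metadata$sample), ]

# Count samples per population
pop_counts <- table(toy_metadata$pop)
pop_counts
```

On the real data, the same steps would use `metadata` and the sample names taken from the `vcf` object in place of the toy objects.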