9.5 The FST statistic
\(\mathbf{F_{ST}}\) is a statistic that quantifies differences in allele frequencies between populations at one variable site.
The version of \(\mathrm{F_{ST}}\) that we’ll calculate today compares genotypic variance within subpopulations (“S”) against that of the total population (“T”). One way to conceptualize this is as the deficiency of heterozygotes observed across subpopulations, relative to the proportion that would be expected under random mating (i.e., no population structure).
We calculate this by taking the difference between the following two quantities and dividing it by \(H_T\):
- \(\mathbf{H_T}\): The expected frequency of heterozygotes when individuals across all subpopulations are pooled
- \(\mathbf{mean(H_S)}\): The mean expected frequency of heterozygotes, calculated within each subpopulation and then averaged

In both cases, \(H = 2pq\), where \(p\) and \(q\) are the frequencies of the two alleles at a site.
\[ \mathrm{F_{ST}} = \frac{H_T - \mathrm{mean}(H_S)}{H_T} \]
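The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes a biallelic site, an unweighted mean of \(H_S\) across subpopulations, and input given as per-subpopulation allele counts. The function names `expected_het` and `fst` and the example counts are made up for this sketch.

```python
def expected_het(p):
    """Expected heterozygosity H = 2pq at a biallelic site, where q = 1 - p."""
    return 2.0 * p * (1.0 - p)


def fst(allele_counts):
    """Estimate F_ST at one site.

    allele_counts: list of (alt_allele_count, total_allele_count) tuples,
    one per subpopulation (hypothetical input format for this sketch).
    """
    # H_S: expected heterozygosity within each subpopulation, then averaged
    # (unweighted mean, which is an assumption of this sketch).
    sub_freqs = [alt / total for alt, total in allele_counts]
    mean_hs = sum(expected_het(p) for p in sub_freqs) / len(sub_freqs)

    # H_T: expected heterozygosity after pooling all subpopulations.
    pooled_alt = sum(alt for alt, _ in allele_counts)
    pooled_total = sum(total for _, total in allele_counts)
    h_t = expected_het(pooled_alt / pooled_total)

    if h_t == 0:
        # The site is not variable in the pooled sample, so F_ST is undefined.
        return float("nan")
    return (h_t - mean_hs) / h_t


# Two subpopulations of 50 diploid individuals (100 alleles) each,
# with alternate allele frequencies of 0.2 and 0.8.
print(round(fst([(20, 100), (80, 100)]), 3))  # 0.36
```

In this example, each subpopulation has \(H_S = 2(0.2)(0.8) = 0.32\), while the pooled allele frequency is 0.5, giving \(H_T = 0.5\) and \(\mathrm{F_{ST}} = (0.5 - 0.32) / 0.5 = 0.36\).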
\(\mathrm{F_{ST}}\) ranges from 0 to 1:
- \(\mathrm{F_{ST}} = 0\): No population structure (separating the subpopulations doesn’t affect heterozygosity estimates)
- \(\mathrm{F_{ST}} = 1\): Subpopulations are completely differentiated (e.g., one subpopulation carries only one allele, while the other subpopulation carries only the other)
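
As a quick check on these two extremes, consider a hypothetical pair of equally sized subpopulations (the allele frequencies here are chosen purely for illustration). If both subpopulations have \(p = 0.5\), then separating them changes nothing:

\[ \mathrm{mean}(H_S) = 2(0.5)(0.5) = 0.5, \quad H_T = 2(0.5)(0.5) = 0.5, \quad \mathrm{F_{ST}} = \frac{0.5 - 0.5}{0.5} = 0 \]

If instead one subpopulation has \(p = 1\) and the other has \(p = 0\), no heterozygotes are expected within either subpopulation, but the pooled allele frequency is still 0.5:

\[ \mathrm{mean}(H_S) = \frac{2(1)(0) + 2(0)(1)}{2} = 0, \quad H_T = 2(0.5)(0.5) = 0.5, \quad \mathrm{F_{ST}} = \frac{0.5 - 0}{0.5} = 1 \]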
See this Nature Reviews Genetics article for a more thorough discussion of the use and interpretation of \(\mathrm{F_{ST}}\) and related statistics.