9.24 Conclusion

In this lab, we used three approaches to identify selection in multi-population sequencing data.

  • Using genotype data from the 1000 Genomes Project, we calculated FST, a measure of how different a variant’s allele frequency is between populations.
    • We confirmed in the GGV Browser that the top FST variant shows strong population-specific differences in allele frequency.

  • We then calculated the population branch statistic (PBS), which measures allele frequency change specific to one population's branch, to identify candidate variants under selection in a human population on Flores Island, Indonesia.
    • One of the top PBS hits was in the fatty acid desaturase gene cluster (FADS).

  • Finally, we discussed extended haplotype homozygosity (EHH) and related statistics, which detect long haplotypes that result from a selective sweep.
    • Using the PopHuman browser, we saw that the LCT locus – the most famous example of selection in humans – exhibits both elevated EHH and reduced genetic diversity (\(\pi\)).
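
As a compact recap of how the first two statistics relate, here is a minimal Python sketch for a single biallelic variant. It assumes a simple equal-weight Wright's FST computed directly from allele frequencies, which may differ from the estimator used earlier in the lab, and the standard PBS definition built from log-transformed pairwise FST values. The function names and the example allele frequencies are hypothetical, chosen only for illustration.

```python
import math


def fst(p1, p2):
    """Equal-weight Wright's FST for one biallelic variant.

    p1 and p2 are the allele frequencies in two populations.
    This is an illustrative estimator, not necessarily the one used in the lab.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # variant is fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


def branch_length(f):
    """Transform FST into an additive divergence time: T = -ln(1 - FST)."""
    # Cap FST just below 1 (an assumption made here) to avoid log(0).
    return -math.log(1 - min(f, 0.999999))


def pbs(p_focal, p_sister, p_outgroup):
    """Population branch statistic for the focal population."""
    t_fs = branch_length(fst(p_focal, p_sister))
    t_fo = branch_length(fst(p_focal, p_outgroup))
    t_so = branch_length(fst(p_sister, p_outgroup))
    return (t_fs + t_fo - t_so) / 2


# Hypothetical frequencies: the allele is common only in the focal population.
print(fst(0.9, 0.1))
print(pbs(0.9, 0.1, 0.1))
```

PBS is large only when the focal population differs from both comparison populations while those two remain similar to each other, which is why it points to change on the focal branch rather than to general divergence.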