4.14 Conclusion

In this lab, we used genotype data from the 1000 Genomes Project to ask whether there is linkage disequilibrium between two SNPs on chr21.

  • Using genotype data from the VCF, we used the table function to count how often each combination of alleles at the two SNPs is observed.

  • We used the data in the table to calculate three LD statistics:
    • \(\mathbf{D}\): the deviation of the observed haplotype frequency from the haplotype frequency expected if the alleles at the two SNPs were independent
    • \(\mathbf{D'}\): a normalization of \(D\) that ranges from \(-1\) to \(1\)
    • \(\mathbf{r^2}\): how well the allele at one locus predicts the allele at the other locus

  • We used LDlink to visualize how blocks of LD define haplotypes.
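
As a compact recap of the three statistics listed above, here is a minimal Python sketch that computes \(D\), \(D'\), and \(r^2\) from counts of the four two-locus haplotypes. The function name ld_stats, the allele labels A/a and B/b, and the counts are all hypothetical and chosen only for illustration; they are not taken from the chr21 data used in the lab. The sketch assumes the standard definitions: \(D = p_{AB} - p_A p_B\), \(D' = D / D_{max}\), and \(r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))\).

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab

    # Observed haplotype frequency and the two allele frequencies.
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were independent.
    D = p_AB - p_A * p_B

    # D_max is the largest magnitude D can take given the allele
    # frequencies; which bound applies depends on the sign of D.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    # Both statistics are undefined when either locus is monomorphic,
    # so this sketch reports NaN in that case.
    D_prime = D / d_max if d_max > 0 else float("nan")
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D ** 2 / denom if denom > 0 else float("nan")

    return D, D_prime, r2


# Hypothetical haplotype counts (not from the lab's data).
D, D_prime, r2 = ld_stats(n_AB=60, n_Ab=10, n_aB=10, n_ab=20)
print(f"D = {D:.4f}, D' = {D_prime:.4f}, r^2 = {r2:.4f}")
```

Because \(D_{max}\) is always positive here, \(D'\) keeps the sign of \(D\), which matches the \(-1\) to \(1\) range described above. Two SNPs whose alleles always travel together, for example counts of 50, 0, 0, 50, give \(D' = 1\) and \(r^2 = 1\).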