4.14 Conclusion
In this lab, we used genotype data from the 1000 Genomes Project to ask whether there is linkage disequilibrium between two SNPs on chr21.
- Using data from the VCF, we used `table` to count how often we observe each combination of alleles at these SNPs.
- We used the data in the table to calculate three LD statistics:
  - \(\mathbf{D}\): the deviation of the observed haplotype frequency from the haplotype frequency expected if the two loci were independent
  - \(\mathbf{D'}\): a normalization of \(D\) that ranges from \(-1\) to \(1\)
  - \(\mathbf{r^2}\): how well the allele at one locus predicts the allele at another locus
- We used LDlink to visualize how blocks of LD define haplotypes.
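
As a compact recap of the three statistics, here is a minimal Python sketch, not the lab's own code. It assumes phased haplotypes are already available as pairs of alleles (one allele per SNP); the function name `ld_stats`, its arguments, and the toy haplotypes are all hypothetical. `Counter` stands in for the counting step, and \(D'\) is computed with a sign-preserving normalization so that it falls between \(-1\) and \(1\), as described above.

```python
from collections import Counter


def ld_stats(haplotypes, allele_a, allele_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, one per
                phased haplotype (hypothetical input format).
    allele_a, allele_b: the alleles at SNP 1 and SNP 2 used as the reference
                        haplotype "AB".
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)  # how often each allele combination occurs

    # Allele frequencies at each SNP and the observed AB haplotype frequency.
    p_a = sum(c for (x, _), c in counts.items() if x == allele_a) / n
    p_b = sum(c for (_, y), c in counts.items() if y == allele_b) / n
    p_ab = counts[(allele_a, allele_b)] / n

    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were independent.
    d = p_ab - p_a * p_b

    # D': D divided by the largest magnitude it could take given the
    # allele frequencies (assumption: the sign of D is kept).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    return d, d_prime, r2


# Toy data (made up): the alleles at the two SNPs always travel together.
toy = [("A", "G")] * 5 + [("C", "T")] * 5
print(ld_stats(toy, "A", "G"))  # (0.25, 1.0, 1.0)
```

In the toy data every haplotype is either A-G or C-T, so knowing the allele at one SNP fully determines the allele at the other, and both \(D'\) and \(r^2\) reach 1.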