4.15 Homework
4.15.0.2 Assignment
We’ve subset the VCF from class to show haplotypes for two different pair of SNPs:
chr21:15005329
andchr21:15007704
chr21:15336586
andchr21:15336794
# read data for first set of SNPs
hw1 <- read.table("snp_haplotypes_hw1.txt", header = TRUE)
# read data for second set of SNPs
hw2 <- read.table("snp_haplotypes_hw2.txt", header = TRUE)
Assignment: Using the code from class, calculate \(D\), \(D'\), and \(r^2\) for these sets of SNPs. Which alleles are segregating together? What does each LD statistic indicate? (Feel free to check your work on LDpair.)
Solution for first set of SNPs
First use table
to count the occurences of the four haplotypes.
##
## A G
## C 747 2508
## T 2 1751
All four possible haplotypes exist in this population.
\(\mathbf{D = h_{12} - p_1*p_2}\)
## [1] 0.05195286
\(D\) is non-zero, which suggests that these SNPs might be in LD.
\(\mathbf{D' = \frac{D}{\mathrm{min}(p_1 (1-p_2), p_2(1-p_1))}}\) (because \(D > 0\))
First we determine the denominator by calculating which of \(p_1 (1-p_2)\) and \(p_2(1-p_1)\) is smaller:
## [1] 0.5527516
## [1] 0.05235222
\(p_2(1-p_1)\) is smaller, so we use it for the denominator. \(D'\) is:
## [1] 0.9923717
\(D' = 0.99\), which indicates very high LD (almost no recombination has occurred between alleles on this haplotype).
\(\mathbf{r^2 = \frac{D^2}{p_1 (1-p_1) q_1 (1-q_1)}}\)
## [1] 0.09327254
However, \(r^2 = 0.09\) (linkage equilibrium)! This is because one of the haplotypes, A
T
, is very rare – there are only two copies in the population. \(r^2\) tells us that the counts of the A
T
haplotype are so low that an A
at SNP1 doesn’t do a great job of predicting when SNP2 is T
.
Solution for second set of SNPs
First use table
to count the occurences of the four haplotypes.
##
## A G
## A 3522 0
## G 0 1486
The only haplotypes that exist in this population are A
C
and G
G
.
\(\mathbf{D = h_{12} - p_1*p_2}\)
## [1] 0.2086794
\(D\) is non-zero, which suggests that these SNPs might be in LD.
\(\mathbf{D' = \frac{D}{\mathrm{min}(p_1 (1-q_1), (1-p_1)q_1 )}}\) (because \(D > 0\))
First we determine the denominator by calculating which of \(p_1 (1-p_1)\) and \(p_2 (1-p_2)\) is smaller:
## [1] 0.2086794
## [1] 0.2086794
The two values are exactly the same, so we can use either for the denominator. \(D'\) is:
## [1] 1
\(D' = 1\)! These SNPs are in maximum LD (no recombination has occured between them).
\(\mathbf{r^2 = \frac{D^2}{p_1 (1-p_1) p_2 (1-p_2)}}\)
## [1] 1
\(r^2 = 1\)! These SNPs are in maximum LD (everyone who carries an A
at SNP1 has an A
at SNP2, and everyone with a G
at SNP1 has a G
at SNP2).