4.8 Calculating \(D\)
We can re-run our table code to find the probabilities we need for calculating \(D\).
##
##        C    T
##   A 2655  801
##   G  170 1382
\[ D = h_{12} - p_1 p_2 \]
What are \(h_{12}\), \(p_1\), and \(p_2\)?
\(h_{12}\) is the probability of seeing the A C haplotype. This is equal to the number of A C haplotypes over the number of total haplotypes:
\[\frac{2655}{2655 + 170 + 801 + 1382} = \frac{2655}{5008}\]
\(p_1\) is the probability that SNP1 is A. We can get this by adding across the first row of the table (i.e., adding the number of A C and A T haplotypes):
\[\frac{2655 + 801}{5008}\]
\(p_2\) is the probability that SNP2 is C. We can get this by adding down the first column of the table (i.e., adding the number of A C and G C haplotypes):
\[\frac{2655 + 170}{5008}\]
(Note that the denominator is always 5008 – the total number of haplotypes in our dataframe.)
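The same three probabilities can also be read off the table programmatically instead of being typed by hand. The sketch below is a minimal illustration, assuming the counts are re-entered as a matrix called `hap_counts`. That name, along with `hap_freqs` and the `_tab` suffixes, is hypothetical and does not come from the earlier table code.

```r
# haplotype counts copied from the table above
# (`hap_counts` is a hypothetical name, not an object from the earlier code)
hap_counts <- matrix(c(2655,  801,
                        170, 1382),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(SNP1 = c("A", "G"),
                                     SNP2 = c("C", "T")))

# convert counts to proportions of all haplotypes
hap_freqs <- hap_counts / sum(hap_counts)

h12_tab <- hap_freqs["A", "C"]         # frequency of the A C haplotype
p1_tab  <- rowSums(hap_freqs)[["A"]]   # frequency of A at SNP1 (first row)
p2_tab  <- colSums(hap_freqs)[["C"]]   # frequency of C at SNP2 (first column)
```

Here `rowSums()` and `colSums()` do the "adding across a row" and "adding down a column" steps described above, and `sum(hap_counts)` gives the shared denominator.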
Now we can plug in the corresponding probabilities to calculate D:
# define our probabilities of interest
h <- 2655 / 5008
p1 <- (2655 + 801) / 5008
p2 <- (2655 + 170) / 5008
# calculate D
D <- h - (p1 * p2)
D
## [1] 0.1408705
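\(D \approx 0.14\), which is non-zero, suggesting that these SNPs are in LD: the A C haplotype is more common than we would expect if the alleles at the two SNPs were independent.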
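As a sanity check, for two biallelic SNPs \(D\) can equivalently be written as the product of the two "diagonal" haplotype frequencies (A C and G T) minus the product of the two "off-diagonal" ones (A T and G C). The sketch below re-enters the four counts from the table by hand and compares the two forms. The variable names are illustrative only.

```r
# haplotype counts from the table (names are illustrative)
n_AC <- 2655; n_AT <- 801
n_GC <- 170;  n_GT <- 1382
n_total <- n_AC + n_AT + n_GC + n_GT

# D as defined above: h12 - p1 * p2
D_from_h <- n_AC / n_total -
  ((n_AC + n_AT) / n_total) * ((n_AC + n_GC) / n_total)

# equivalent form: freq(A C) * freq(G T) - freq(A T) * freq(G C)
D_cross <- (n_AC / n_total) * (n_GT / n_total) -
  (n_AT / n_total) * (n_GC / n_total)

# the two versions should agree up to floating-point error
all.equal(D_from_h, D_cross)
```

If the two values disagree, the most likely cause is a count copied into the wrong cell of the table.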