4.8 Calculating \(D\)
We can re-run our table code to find the probabilities we need for calculating \(D\).
##
## C T
## A 2655 801
## G 170 1382
\[ D = h_{12} - p_1 p_2 \]
What are \(h_{12}\), \(p_1\), and \(p_2\)?
\(h_{12}\) is the probability of seeing the AC haplotype. This is equal to the number of AC haplotypes over the total number of haplotypes:
\[\frac{2655}{2655 + 170 + 801 + 1382} = \frac{2655}{5008}\]
\(p_1\) is the probability that SNP1 is A. We can get this by adding across the first row of the table (i.e., adding the number of AC and AT haplotypes):
\[\frac{2655 + 801}{5008}\]
\(p_2\) is the probability that SNP2 is C. We can get this by adding down the first column of the table (i.e., adding the number of AC and GC haplotypes):
\[\frac{2655 + 170}{5008}\]
(Note that the denominator is always 5008 – the total number of haplotypes in our dataframe.)
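Rather than copying each count by hand, the same three probabilities can be read off a 2x2 matrix of haplotype counts. The sketch below is one possible way to do this; the names `hap_counts`, `n_hap`, `h_AC`, `p_A`, and `p_C` are our own placeholders, and the counts are typed in from the table above rather than taken from the original data object.

```r
# Haplotype counts copied from the table above
# (rows: SNP1 allele, columns: SNP2 allele); names are illustrative
hap_counts <- matrix(
  c(2655,  801,
     170, 1382),
  nrow = 2, byrow = TRUE,
  dimnames = list(SNP1 = c("A", "G"), SNP2 = c("C", "T"))
)

# total number of haplotypes (the shared denominator)
n_hap <- sum(hap_counts)

# probability of the AC haplotype
h_AC <- hap_counts["A", "C"] / n_hap

# probability that SNP1 is A: sum across the first row
p_A <- sum(hap_counts["A", ]) / n_hap

# probability that SNP2 is C: sum down the first column
p_C <- sum(hap_counts[, "C"]) / n_hap
```

Indexing by allele name instead of by position makes it harder to mix up which row or column is being summed.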
Now we can plug in the corresponding probabilities to calculate \(D\):
# define our probabilities of interest
h <- 2655 / 5008
p1 <- (2655 + 801) / 5008
p2 <- (2655 + 170) / 5008
# calculate D
D <- h - (p1 * p2)
D
## [1] 0.1408705
\(D \approx 0.14\), which is non-zero, suggesting that these two SNPs are in LD.
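One way to see why a non-zero \(D\) points to LD is to compare the observed number of AC haplotypes with the number we would expect if the two SNPs were independent. Under independence, the AC haplotype frequency would simply be \(p_1 p_2\). The sketch below is illustrative only; `n_hap`, `expected_AC`, `observed_AC`, and `excess_AC` are names we made up for this example.

```r
# total haplotypes and allele frequencies, as defined in the text
n_hap <- 5008
p1 <- (2655 + 801) / n_hap
p2 <- (2655 + 170) / n_hap

# AC count we would expect if SNP1 and SNP2 were independent
expected_AC <- p1 * p2 * n_hap

# AC count we actually observe in the table
observed_AC <- 2655

# excess of AC haplotypes over the independence expectation
excess_AC <- observed_AC - expected_AC

# dividing the excess by the total recovers D
excess_AC / n_hap
```

Because the observed count is larger than the independence expectation, \(D\) is positive: the A and C alleles occur together more often than chance alone would predict.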