4.8 Calculating \(D\)

We can re-run our table code to find the probabilities we need for calculating \(D\).

table(haplotypes$snp1_allele, haplotypes$snp2_allele)

##    
##        C    T
##   A 2655  801
##   G  170 1382

\[ D = h_{12} - p_1 p_2 \]

What are \(h_{12}\), \(p_1\), and \(p_2\)?

\(h_{12}\) is the probability of seeing the A C haplotype. This is equal to the number of A C haplotypes over the number of total haplotypes:

\[\frac{2655}{2655 + 170 + 801 + 1382} = \frac{2655}{5008}\]

\(p_1\) is the probability that SNP1 is A. We can get this by adding across the first row of the table (i.e., adding the number of A C and A T haplotypes):

\[\frac{2655 + 801}{5008}\]

\(p_2\) is the probability that SNP2 is C. We can get this by adding down the first column of the table (i.e., adding the number of A C and G C haplotypes):

\[\frac{2655 + 170}{5008}\]

(Note that the denominator is always 5008 – the total number of haplotypes in our dataframe.)
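The row and column sums described above can also be computed rather than added by hand. The following is a minimal sketch, assuming we re-enter the four counts from the table into a matrix (the names `counts` and `freqs` are illustrative and not part of the original code):

```r
# Re-enter the haplotype counts from the table above.
# Rows are SNP1 alleles (A, G); columns are SNP2 alleles (C, T).
counts <- matrix(c(2655, 170, 801, 1382), nrow = 2,
                 dimnames = list(snp1 = c("A", "G"), snp2 = c("C", "T")))

# Convert counts to proportions of all haplotypes
freqs <- prop.table(counts)

# Probability of the A C haplotype
h <- freqs["A", "C"]

# Marginal probabilities: SNP1 is A (row sum), SNP2 is C (column sum)
p1 <- rowSums(freqs)[["A"]]
p2 <- colSums(freqs)[["C"]]

c(h = h, p1 = p1, p2 = p2)
```

Because `prop.table()` divides every cell by the grand total, the denominator of 5008 is handled in one step.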

Now we can plug in the corresponding probabilities to calculate \(D\):

# define our probabilities of interest
h <- 2655 / 5008
p1 <- (2655 + 801) / 5008
p2 <- (2655 + 170) / 5008

# calculate D
D <- h - (p1 * p2)
D

## [1] 0.1408705
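To avoid typing the counts into each formula, the same calculation can be wrapped in a small function. This is only a sketch: `calc_D` is a hypothetical helper, not something defined elsewhere in this chapter, and it assumes a 2x2 table with SNP1 alleles as rows and SNP2 alleles as columns, with the haplotype of interest in the top-left cell.

```r
# Hypothetical helper: compute D from a 2x2 table of haplotype counts.
# Assumes rows are SNP1 alleles, columns are SNP2 alleles, and the
# haplotype of interest (here A C) is in the first row and first column.
calc_D <- function(counts) {
  total <- sum(counts)                 # total number of haplotypes
  h12 <- counts[1, 1] / total          # frequency of the A C haplotype
  p1 <- sum(counts[1, ]) / total       # frequency of A at SNP1 (first row)
  p2 <- sum(counts[, 1]) / total       # frequency of C at SNP2 (first column)
  h12 - p1 * p2
}

# Counts re-entered from the table above
counts <- matrix(c(2655, 170, 801, 1382), nrow = 2,
                 dimnames = list(snp1 = c("A", "G"), snp2 = c("C", "T")))

D_check <- calc_D(counts)
D_check
```

The value should match the `D` calculated by hand above. Since the output of `table()` can be indexed and summed in the same way as a matrix, passing `table(haplotypes$snp1_allele, haplotypes$snp2_allele)` to the function should give the same result.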

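\(D \approx 0.14\). If the two SNPs were independent, we would expect \(h_{12} = p_1 p_2\) and therefore \(D = 0\), so this non-zero value suggests that these SNPs are in LD.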