10.5 The \(D\) statistic

Introgression creates an imbalance between the numbers of ABBA and BABA trees if only one of the human populations has admixed with Neanderthals. The \(\textbf{D}\) statistic quantifies this imbalance:

\[ D = \frac{\textrm{# BABA sites} - \textrm{# ABBA sites}}{\textrm{# BABA sites} + \textrm{# ABBA sites}} \]

\(D > 0\) is evidence for Neanderthal gene flow into the H2 population, while \(D < 0\) is evidence for gene flow into H1.
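As a minimal sketch of the arithmetic, the formula above can be written as a short Python function. The function name `d_statistic` and the site counts are hypothetical, chosen only for illustration; the sign convention (BABA minus ABBA) follows the definition in this section.

```python
def d_statistic(n_baba: int, n_abba: int) -> float:
    """Return D = (BABA - ABBA) / (BABA + ABBA).

    Uses the sign convention of this section. The counts are the
    numbers of BABA and ABBA sites observed across the genome.
    """
    total = n_baba + n_abba
    if total == 0:
        # With no informative sites the ratio is undefined.
        raise ValueError("D is undefined when there are no ABBA or BABA sites")
    return (n_baba - n_abba) / total


# Hypothetical counts (not real data): an excess of BABA sites gives D > 0.
print(d_statistic(n_baba=550, n_abba=450))  # 0.1

# Equal counts give D = 0, i.e. no imbalance between the two site patterns.
print(d_statistic(n_baba=500, n_abba=500))  # 0.0
```

Because \(D\) is a normalized difference, it always lies between \(-1\) and \(1\). This sketch computes only the point estimate; it does not assess whether \(D\) differs significantly from zero.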

Choice of populations for the \(D\) statistic

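The choice of populations strongly affects how the \(D\) statistic should be interpreted, because \(D\) only measures the difference in Neanderthal allele sharing between H1 and H2.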

When assessing archaic introgression, H2 is typically set to a human population without archaic admixture (e.g., a population from Africa). If H2 were instead a European population that does carry introgressed sequence, then comparing it with another admixed population would give similar ABBA and BABA counts, and we would not expect a significant \(D\) statistic.