10.10 Computing the D statistic

Let’s compute \(D\) for four of the individuals we have data for: French, Sardinian, Vindija (Neanderthal), and chimpanzee.

d_result <- d(data = snps,
              # provide population names to calculate D between
              W = "French", X = "Sardinian", Y = "Vindija", Z = "Chimp")

d_result

##        W         X       Y     Z      D stderr Zscore  BABA  ABBA  nsnps
## 1 French Sardinian Vindija Chimp 0.0038 0.0074  0.511 10974 10891 487843

How do we interpret these results?

The last three columns give the number of BABA and ABBA sites and the total number of variants analyzed (nsnps). First, note that the ABBA and BABA sites together make up only a small fraction of all variants (21,865 of 487,843, or about 4.5%): most variants conform to the species-level tree.

The numbers of ABBA and BABA variants are also similar (10,891 and 10,974), which suggests that the discordant trees in these four populations result primarily from incomplete lineage sorting (ILS) rather than introgression.
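
As a rough sanity check, the counts in the table can be plugged in by hand. The sketch below assumes the sign convention \(D = \frac{\textrm{BABA} - \textrm{ABBA}}{\textrm{BABA} + \textrm{ABBA}}\), which is the one consistent with the positive value reported above; the variable names are made up for illustration, and the value from d() is computed internally rather than from these printed counts, so small differences are possible.

```r
# Counts copied by hand from the d() output above.
baba  <- 10974
abba  <- 10891
nsnps <- 487843

# Share of all analyzed variants that are ABBA or BABA sites (roughly 4.5%).
discordant_fraction <- (baba + abba) / nsnps

# Assumed sign convention: D = (BABA - ABBA) / (BABA + ABBA).
d_from_counts <- (baba - abba) / (baba + abba)

print(round(discordant_fraction, 3))
print(round(d_from_counts, 4))
```

The difference between the two counts is only 83 sites out of 21,865 discordant ones, which is why the resulting \(D\) is so close to zero.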

The middle columns give the estimate of \(D\), its standard error, and the Z score (which is equal to \(\frac{D}{\textrm{stderr}}\)).
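
The relationship between these three columns can be checked directly from the printed values. This is only a sketch: the inputs are the rounded numbers from the table, so the result will not match the reported Z score exactly. The approximate 95% interval is an addition for illustration and assumes the estimate is roughly normally distributed.

```r
# Values copied (already rounded) from the d() output above.
d_est   <- 0.0038
std_err <- 0.0074

# Z score as defined in the text: D divided by its standard error.
z_score <- d_est / std_err

# Approximate 95% interval, assuming a normal approximation
# (an assumption of this sketch, not something d() reports).
ci <- d_est + c(-1, 1) * qnorm(0.975) * std_err

print(round(z_score, 2))
print(round(ci, 4))
```

The recomputed Z score is about 0.51, in line with the 0.511 in the table once rounding of the inputs is taken into account, and the approximate interval spans zero.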