10.14 \(f_{4}\) statistic

The \(f_{4}\) statistic (not to be confused with the \(\mathrm{F_{ST}}\) from the previous week) is very similar to the \(D\) statistic. Its main advantage is that its value is proportional to the length of the branch separating two pairs of populations, here the pair \((W, X)\) and the pair \((Y, Z)\).
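
For reference, a common way to write the statistic is sketched below. The lowercase symbols \(w, x, y, z\) are notation introduced here for illustration (they are not used elsewhere in this chapter) and stand for the allele frequencies at a single SNP in populations \(W, X, Y, Z\):

\[
f_{4}(W, X; Y, Z) = \mathrm{E}\left[(w - x)(y - z)\right]
\]

The expectation is taken over SNPs. If the four populations are related by an unrooted tree that groups \((W, X)\) against \((Y, Z)\), the two frequency differences arise on separate branches and are uncorrelated, so the expected value is zero. Drift shared between, say, \(W\) and \(Y\) makes the two differences covary, and the expected product then grows with the length of the shared branch.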

Compute the \(f_{4}\) statistic for all populations using the code below:

f4_result <- f4(data = snps,
                W = pops, X = "Yoruba", Y = "Vindija", Z = "Chimp") %>%
  # convert the Z-score into a two-sided p-value
  mutate(p = 2 * pnorm(-abs(Zscore)))

f4_result

##             W      X       Y     Z        f4   stderr Zscore  BABA  ABBA  nsnps
## 1      French Yoruba Vindija Chimp  0.001965 0.000437  4.501 15802 14844 487753
## 2   Sardinian Yoruba Vindija Chimp  0.001798 0.000427  4.209 15729 14852 487646
## 3         Han Yoruba Vindija Chimp  0.001746 0.000418  4.178 15780 14928 487925
## 4      Papuan Yoruba Vindija Chimp  0.002890 0.000417  6.924 16131 14721 487694
## 5 Khomani_San Yoruba Vindija Chimp  0.000436 0.000415  1.051 16168 15955 487564
## 6       Mbuti Yoruba Vindija Chimp -0.000030 0.000410 -0.074 15751 15766 487642
## 7       Dinka Yoruba Vindija Chimp -0.000057 0.000380 -0.151 15131 15159 487667
##              p
## 1 6.763451e-06
## 2 2.565034e-05
## 3 2.940837e-05
## 4 4.390659e-12
## 5 2.932586e-01
## 6 9.410104e-01
## 7 8.799757e-01

Note that the p-values are the same as when we calculated the \(D\) statistic, but the actual \(f_4\) values are different.
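
A quick arithmetic check on the French row helps show why this happens. It assumes that \(D\) is normalized in the usual way, by the total number of informative sites, and that \(f_{4}\) is normalized by the total number of SNPs; the counts are taken from the table above:

\[
f_{4} \approx \frac{\mathrm{BABA} - \mathrm{ABBA}}{\mathrm{nsnps}} = \frac{15802 - 14844}{487753} = \frac{958}{487753} \approx 0.00196
\]

\[
D \approx \frac{\mathrm{BABA} - \mathrm{ABBA}}{\mathrm{BABA} + \mathrm{ABBA}} = \frac{958}{30646} \approx 0.0313
\]

Under these assumptions the two statistics share the same numerator and differ only in the denominator, so they measure the same excess of BABA over ABBA sites on different scales. The first value agrees with the reported \(f_{4}\) of 0.001965 up to a small difference, presumably because the printed counts are rounded.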