10.21 Homework
10.21.0.1 Learning Objectives
- Practice calculating introgression statistics in
admixr
- Interpret the biological significance of region-specific values of the \(f_4\)-ratio
10.21.0.2 Assignment
Follow these steps to create your own genome stratifications for calculating the \(f_4\) ratio statistic.
- Go to the UCSC Table Browser, where you can find a wide selection of annotations for the human genome.
- Make sure you set the
assembly:
drop-down box toFeb. 2009 (GRCh37/hg19)
. - Use the
group:
andtrack:
menus to select any set of genomic regions.- You can click the
data format description
button and scroll to theDescription
section to find out what each annotation represents.
- You can click the
- Under the
Retrieve and display data
section, set theoutput format:
to BED. - Enter an
output filename:
(ex:all_genes.bed
). - Click
get output
to download the file. - In Posit Cloud, upload your file using the
Upload
button in theFiles
panel (bottom right). - Run the code block below to reformat the BED file. The code matches the UCSC’s chromosome naming format with the format used in the
snps
data:
# fill in blank with the name of your bed file
system(command = "sed -i 's/chr//g' ________")
# get the path to your bed file
bed <- file.path("________")
Compute the \(f_4\) ratio statistic within and outside of the genomic intervals. Repeat for another set of genome annotations to contrast Neanderthal ancestry in different genomic elements.
Solution
Download tracklist of haploinsufficient genes (Phenotype and Literature
-> Haploinsufficiency
).
# get the path to the `regions.bed` file
bed <- file.path("haploinsufficient.bed")
# option 1: KEEP only these regions for analysis
new_snps_keep <- filter_bed(snps, bed)
# option 2: REMOVE these regions from analysis
new_snps_remove <- filter_bed(snps, bed, remove = TRUE)
Re-calculate the \(f_4\)-ratio:
# f4-ratio with the regions kept
f4_keep <- f4ratio(data = new_snps_keep,
X = pops, A = "Altai", B = "Vindija", C = "Yoruba", O = "Chimp") %>%
# convert z score to pvalue
mutate(p = 2 * pnorm(-abs(Zscore)))
f4_keep
## A B X C O alpha stderr Zscore p
## 1 Altai Vindija French Yoruba Chimp 0.030580 0.009385 3.258 1.122004e-03
## 2 Altai Vindija Sardinian Yoruba Chimp 0.018057 0.009117 1.981 4.759127e-02
## 3 Altai Vindija Han Yoruba Chimp 0.023127 0.009617 2.405 1.617247e-02
## 4 Altai Vindija Papuan Yoruba Chimp 0.036462 0.008964 4.068 4.741838e-05
## 5 Altai Vindija Khomani_San Yoruba Chimp -0.001223 0.008774 -0.139 8.894501e-01
## 6 Altai Vindija Mbuti Yoruba Chimp -0.015781 0.009100 -1.734 8.291808e-02
## 7 Altai Vindija Dinka Yoruba Chimp -0.004929 0.008235 -0.599 5.491729e-01
# f4-ratio with the regions removed
f4_remove <- f4ratio(data = new_snps_remove,
X = pops, A = "Altai", B = "Vindija", C = "Yoruba", O = "Chimp") %>%
# convert z score to pvalue
mutate(p = 2 * pnorm(-abs(Zscore)))
f4_remove
## A B X C O alpha stderr Zscore p
## 1 Altai Vindija French Yoruba Chimp 0.016554 0.007929 2.088 3.679783e-02
## 2 Altai Vindija Sardinian Yoruba Chimp 0.027980 0.007826 3.575 3.502279e-04
## 3 Altai Vindija Han Yoruba Chimp 0.020285 0.007348 2.761 5.762468e-03
## 4 Altai Vindija Papuan Yoruba Chimp 0.036136 0.007550 4.786 1.701381e-06
## 5 Altai Vindija Khomani_San Yoruba Chimp 0.005713 0.007593 0.752 4.520511e-01
## 6 Altai Vindija Mbuti Yoruba Chimp 0.009803 0.007233 1.355 1.754176e-01
## 7 Altai Vindija Dinka Yoruba Chimp -0.000663 0.006798 -0.097 9.227264e-01
Some of the alpha values for each population change when excluding/restricting to haploinsufficient genes, but their standard error ranges still overlap between the two \(f_4\)-ratio calculations, so they likely aren’t truly different.