12.13 Homework
12.13.0.1 Learning Objectives
- Practice building and interpreting phylogenies in R
12.13.0.2 Assignment
hw_accessions.txt
and hw_aligned.fa
provide the alignments and metadata for 24 human SARS-CoV-2 sequences, plus the bat RaTG13 coronavirus strain.
Use this data to repeat the analysis we did in class (compute distance matrix, construct neighbor joining tree, perform bootstrapping for confidence values) to build a phylogeny of SARS-CoV-2 strains. You should use RaTG13 (accession: MN996532
) as an outgroup.
Solution
Read in accession data:
<- read.table("hw_accessions.txt", sep = "\t", header = TRUE)
accessions # make vectors of accessions IDs and full names
<- accessions$id
ids <- accessions$name names
Construct distance matrix:
# read fasta file
<- read.dna("hw_aligned.fa", format = "fasta", as.matrix = TRUE)
dna # compute pairwise distance matrix
<- dist.dna(dna, model = "TN93", as.matrix = TRUE) D
Build neighbor joining tree:
# make the tree
<- nj(D)
tree # manually root tree at RaTG13 coronavirus
<- root(tree, which(ids == "MN996532"))
tree # ladderize tree
<- ladderize(tree) tree
Bootstrap to determine tree uncertainty:
set.seed(123)
<- boot.phylo(tree, dna,
myBoots function(x) ladderize(root(nj(dist.dna(x,
model = "TN93")),
which(ids == "MN996532"))),
rooted = TRUE)
##
Running bootstraps: 100 / 100
## Calculating bootstrap values... done.
# replace "NA" with zero
is.na(myBoots)] <- 0 myBoots[
Plot the bootstrapped tree:
ggtree(tree, branch.length = "none") +
theme_tree2() +
geom_tiplab(label = names) +
geom_nodelab(label = myBoots, geom = "label", fill = "#deebf7") +
xlim(0, 25)