6.15 Annotate with population labels

Our last step is to add columns to our PCA data frame with information about each individual’s population.

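To do this, we merge pca_results with our metadata table. The merge function combines two tables by matching rows on a column of your choice. Here we name that column for each table with by.x = and by.y =. Because it is called "sample" in both tables, by = "sample" would give the same result.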

# merge pca_results and metadata
pca_results <- merge(pca_results, metadata,
                     # specify columns to merge on
                     by.x = "sample", by.y = "sample")

head(pca_results)
##    sample      PC1       PC2        PC3 pop superpop    sex
## 1 HG00096 3.060014 -5.822356 -1.2683268 GBR      EUR   male
## 2 HG00097 2.839200 -6.278675  0.8609691 GBR      EUR female
## 3 HG00099 1.803619 -5.171999  0.4033319 GBR      EUR female
## 4 HG00100 3.160473 -4.504760  1.8926507 GBR      EUR female
## 5 HG00101 4.035908 -4.545304  0.9407191 GBR      EUR   male
## 6 HG00102 3.608347 -4.668695  0.7327117 GBR      EUR female
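
One thing to keep in mind is that, by default, merge keeps only the rows whose key appears in both tables, so a sample without a metadata entry would silently drop out of the result. The sketch below illustrates this on two small made-up tables; toy_pca and toy_meta, and the values in them, are hypothetical stand-ins rather than our real data.

```r
# Toy stand-ins for pca_results and metadata (made-up values)
toy_pca <- data.frame(sample = c("S1", "S2", "S3"),
                      PC1 = c(0.5, -1.2, 2.0))
toy_meta <- data.frame(sample = c("S1", "S2"),
                       pop = c("GBR", "YRI"))

# default merge: keeps only samples present in both tables
inner <- merge(toy_pca, toy_meta, by = "sample")

# all.x = TRUE: keeps every row of the first table,
# filling metadata columns with NA where there is no match
left <- merge(toy_pca, toy_meta, by = "sample", all.x = TRUE)

# list the samples that have no metadata
missing_meta <- as.character(left$sample[is.na(left$pop)])

nrow(inner)
nrow(left)
missing_meta
```

For our own data, comparing nrow(pca_results) before and after the merge is a quick way to check that no individuals were lost.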