6.12 Reformatting data for PCA
We’re using R’s prcomp function to perform PCA on our genotype data. This function takes a matrix where the rows are the data objects (i.e., individuals) and the columns are the associated measurements (i.e., variants).
First, run the code below to subset our data to just the genotypes:
# subset to just genotype columns
gt_matrix <- common[, 7:2510] %>%
  as.matrix()
# view first 10 columns of matrix
head(gt_matrix[, 1:10])
##                HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103 HG00104
## chr21_10467583       0       0       1       0       0       1       0       0
## chr21_10605468       1       0       0       0       1       0       0       0
## chr21_10616824       1       0       0       0       1       0       0       0
## chr21_10666513       0       0       0       0       0       0       0       0
## chr21_10700275       0       0       0       0       0       0       0       0
## chr21_10728655       1       0       0       0       0       0       0       0
##                HG00105 HG00106
## chr21_10467583       1       0
## chr21_10605468       0       0
## chr21_10616824       0       0
## chr21_10666513       0       0
## chr21_10700275       1       0
## chr21_10728655       0       0
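Before moving on, it can help to confirm that the subset really is a numeric matrix: if a non-numeric column is included by mistake, as.matrix coerces every value to character. The sketch below shows the check on a small made-up data frame. The object toy_common and its column names are hypothetical stand-ins for common, not part of the real dataset.

```r
# Hypothetical stand-in for `common`: 2 annotation columns, then 3 genotype columns
toy_common <- data.frame(
  chr  = c("chr21", "chr21"),
  pos  = c(100L, 200L),
  ind1 = c(0, 1),
  ind2 = c(1, 0),
  ind3 = c(0, 0),
  row.names = c("chr21_100", "chr21_200")
)

# keep only the genotype columns and convert to a matrix
toy_gt <- as.matrix(toy_common[, 3:5])

# rows are variants, columns are individuals
dim(toy_gt)

# should be TRUE; FALSE would mean a character column was included
is.numeric(toy_gt)
```

The same two checks, dim and is.numeric, can be run on gt_matrix itself.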
We then transpose the matrix with R’s t function so that the rows are individuals and the columns are variants:
# transpose (i.e., rotate)
gt_matrix_T <- t(gt_matrix)
# view first 10 columns of transposed matrix
head(gt_matrix_T[, 1:10])
##         chr21_10467583 chr21_10605468 chr21_10616824 chr21_10666513
## HG00096              0              1              1              0
## HG00097              0              0              0              0
## HG00099              1              0              0              0
## HG00100              0              0              0              0
## HG00101              0              1              1              0
## HG00102              1              0              0              0
##         chr21_10700275 chr21_10728655 chr21_10732526 chr21_12976114
## HG00096              0              1              0              0
## HG00097              0              0              0              0
## HG00099              0              0              0              0
## HG00100              0              0              0              0
## HG00101              0              0              1              0
## HG00102              0              0              0              0
##         chr21_12977074 chr21_13065545
## HG00096              0              0
## HG00097              0              0
## HG00099              0              0
## HG00100              0              0
## HG00101              0              0
## HG00102              0              0
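To see why the orientation matters, here is a minimal sketch on a made-up matrix. The object toy and its row and column names are hypothetical and used only for illustration. It shows that t swaps the dimensions, and that prcomp returns one row of scores per row of its input, so individuals need to be in the rows.

```r
# Toy genotype matrix: 4 variants (rows) x 3 individuals (columns).
# All names and values here are made up for illustration.
toy <- matrix(
  c(0, 1, 0,
    1, 0, 0,
    0, 0, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("var", 1:4), paste0("ind", 1:3))
)

# transpose so that individuals are rows and variants are columns
toy_T <- t(toy)

dim(toy)    # 4 rows (variants), 3 columns (individuals)
dim(toy_T)  # 3 rows (individuals), 4 columns (variants)

# prcomp treats each row as one observation, so the scores in $x
# have one row per individual
toy_pca <- prcomp(toy_T)
rownames(toy_pca$x)
```

Running prcomp on the untransposed matrix would instead produce one row of scores per variant, which is not what we want here.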