8.8 Counting allele dosage

We often want to encode genotypes as 0, 1, or 2, which you can think of as the dosage (the number of copies) of the minor allele. This encoding corresponds to an additive model, in which each additional copy of the minor allele shifts the phenotype by the same amount, so the phenotype of the heterozygote is assumed to fall midway between those of the two homozygotes.
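As a rough illustration of what the additive assumption implies, the sketch below computes the expected phenotype for each dosage. The baseline and beta values are made up purely for illustration and do not come from any dataset.

```r
# Hypothetical values, chosen only to illustrate the additive model
baseline <- 10  # expected phenotype with 0 copies of the minor allele
beta <- 2       # change in phenotype per copy of the minor allele

dosage <- c(0, 1, 2)
expected_phenotype <- baseline + beta * dosage
names(expected_phenotype) <- c("A/A", "A/G", "G/G")

# A/A = 10, A/G = 12, G/G = 14: the heterozygote sits at the midpoint
expected_phenotype
```

Under this encoding, a single slope (beta) describes the effect of the SNP, which is why the 0/1/2 dosage is convenient as a predictor in a regression.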

We can use the table function on the gt_GT_alleles column to quickly check how many individuals have each genotype.

# tabulate genotype counts
table(test_snp_gt$gt_GT_alleles)
## 
## A/A A/G 
##  66  20
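One thing to keep in mind is that table leaves out missing values by default, so uncalled genotypes would not show up in the counts above. The sketch below uses a small made-up vector (toy_gt is hypothetical, not the course data) to show how the useNA argument makes them visible.

```r
# Toy genotype vector with one missing call (hypothetical data)
toy_gt <- c("A/A", "A/G", "A/A", NA, "A/G", "A/A")

# Default behavior: NA values are left out of the counts
default_counts <- table(toy_gt)

# useNA = "ifany" adds an NA category when missing values are present
counts_with_na <- table(toy_gt, useNA = "ifany")

default_counts
counts_with_na
```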

Now we’ll use the mutate function to create a new column of the dataframe that counts the dosage of the minor allele (i.e., how many G’s each person has at that SNP):

# convert genotypes to counts (i.e., dosage) of minor allele
test_snp_gt <- test_snp_gt %>%
  # count the number of Gs in each genotype string
  mutate(dosage = str_count(gt_GT_alleles, "G")) %>%
  # drop rows that have a missing value in any column
  drop_na()

head(test_snp_gt)
## # A tibble: 6 × 6
##   ChromKey    POS Indiv gt_GT gt_GT_alleles dosage
##      <int>  <int> <chr> <chr> <chr>          <int>
## 1        1 558185 1001  0/0   A/A                0
## 2        1 558185 1002  0/0   A/A                0
## 3        1 558185 1003  0/0   A/A                0
## 4        1 558185 1004  0/1   A/G                1
## 5        1 558185 1005  0/0   A/A                0
## 6        1 558185 1006  0/1   A/G                1
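This SNP has no G/G homozygotes, so a dosage of 2 never appears in our data. To see how str_count handles that case, as well as a missing genotype, here is a minimal sketch on a made-up vector. The toy_gt values are hypothetical, and the example assumes the stringr package is installed.

```r
library(stringr)

# Toy genotypes covering all three dosages plus a missing call (hypothetical data)
toy_gt <- c("A/A", "A/G", "G/G", NA)

# Count how many times "G" appears in each genotype string
toy_dosage <- str_count(toy_gt, "G")

toy_dosage
## [1]  0  1  2 NA
```

The missing genotype stays NA rather than becoming 0, which is why the pipeline above follows mutate with drop_na.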

Checking our work with table

If we run table on the dosage column, we should get the same breakdown of genotypes as we got from the gt_GT_alleles column.

# make sure we get the same genotype counts
table(test_snp_gt$dosage)
## 
##  0  1 
## 66 20
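A slightly stricter check is to cross-tabulate the two columns, which confirms not just that the totals match but that every genotype maps to a single dosage value. The sketch below does this on a small made-up data frame (toy is hypothetical); the same table call could be applied to the gt_GT_alleles and dosage columns of test_snp_gt.

```r
# Toy data frame standing in for the genotype table (hypothetical data)
toy <- data.frame(
  gt_GT_alleles = c("A/A", "A/G", "A/A", "G/G", "A/G"),
  dosage = c(0L, 1L, 0L, 2L, 1L)
)

# Rows are genotypes, columns are dosage values
cross_tab <- table(toy$gt_GT_alleles, toy$dosage)
cross_tab

# Each genotype should have counts in exactly one dosage column
stopifnot(all(rowSums(cross_tab > 0) == 1))
```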