8.18 Top GWAS SNP
One common future direction for GWAS studies is following up on the top SNP(s). Read in top_snp.vcf
, a VCF of just the top SNP in the dataset, so that we can plot boxplots of the top SNP genotype stratified by phenotype:
# extract genotypes of the top SNP
<- vcfR2tidy(read.vcfR("top_snp.vcf")) top_snp
## Scanning file to determine attributes.
## File attributes:
## meta lines: 27
## header_line: 28
## variant count: 1
## column count: 185
##
Meta line 27 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 1
## Character matrix gt cols: 185
## skip: 0
## nrows: 1
## row_num: 0
##
Processed variant: 1
## All variants processed
## Extracting gt element GT
<- top_snp$gt %>%
top_snp_gt drop_na()
# merge with phenotype data
<- merge(top_snp_gt, phenotypes,
gwas_data by.x = "Indiv", by.y = "IID")
# plot boxplots
ggplot(data = gwas_data) +
geom_boxplot(aes(x = gt_GT_alleles,
y = GS451_IC50))
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
Other potential follow-up directions include:
- Investigating the genomic environment in the UCSC Genome Browser
- Looking at nearby haplotype structure with LDproxy
- Note that the genotype data we’re using come from the Yoruba population
- Using the Geography of Genetic Variants browser to find the global allele frequencies of the variant
- Search for SNP in a phenotype database to see if there are other associations with it