  • Published with bookdown & the OTTR template

    Style adapted from: rstudio4edu-book (CC-BY 2.0)

Human Genome Variation Lab

4.11 LDlink

LDlink is a web application that allows you to compute and visualize linkage disequilibrium using data from the 1000 Genomes Project (the same dataset we’ve been using for this module).

Go to LDlink’s LDpair tool, which computes \(D'\) and \(r^2\) between pairs of SNPs. Using either the rsIDs or the chromosome and position of the two SNPs we looked at today, check our calculations for \(D'\) and \(r^2\). Make sure you:

  • Select All Populations, since we didn’t subset our data by population.
  • If using SNP positions, note that our data was aligned to the GRCh38 reference genome.

Fig. 5. LDpair results for the two SNPs from this class.

We can see that the \(D'\) and \(r^2\) statistics, as well as the 2x2 haplotype table, are very similar to what we calculated by hand! (The values aren’t identical because our data come from a slightly different genotyping dataset than the one LDlink uses.)
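
One way to make this comparison concrete is to recompute the statistics directly from the haplotype counts that LDpair displays. The following is a minimal sketch rather than the code used earlier in this module: the function name `ld_stats` is hypothetical, the counts are made-up placeholders (not the values for our two SNPs), and \(D'\) is reported as a magnitude, which is an assumption about the sign convention.

```r
# Compute D, D', and r^2 from a 2x2 matrix of haplotype counts.
# Rows are the alleles at SNP 1 (A, a); columns are the alleles at SNP 2 (B, b).
# `ld_stats` is a hypothetical helper written for illustration only.
ld_stats <- function(hap_counts) {
  stopifnot(all(dim(hap_counts) == c(2, 2)))

  # Convert counts to haplotype frequencies
  hap_freqs <- hap_counts / sum(hap_counts)

  p_AB <- hap_freqs[1, 1]       # frequency of the A-B haplotype
  p_A  <- sum(hap_freqs[1, ])   # frequency of allele A at SNP 1
  p_B  <- sum(hap_freqs[, 1])   # frequency of allele B at SNP 2

  # D: observed haplotype frequency minus the frequency expected
  # if the two SNPs were independent
  D <- p_AB - p_A * p_B

  # D': D scaled by the largest value it could take given the allele frequencies
  if (D >= 0) {
    D_max <- min(p_A * (1 - p_B), (1 - p_A) * p_B)
  } else {
    D_max <- min(p_A * p_B, (1 - p_A) * (1 - p_B))
  }
  D_prime <- abs(D) / D_max     # assumption: report D' as a magnitude

  # r^2: squared correlation between the alleles at the two SNPs
  r2 <- D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))

  c(D = D, D_prime = D_prime, r2 = r2)
}

# Made-up counts for illustration; replace them with the counts from LDpair
example_counts <- matrix(c(40, 10,
                           10, 40),
                         nrow = 2, byrow = TRUE)
ld_stats(example_counts)
```

For these made-up counts the function returns \(D = 0.15\), \(D' = 0.6\), and \(r^2 = 0.36\). Entering the four haplotype counts from the LDpair output instead should reproduce the \(D'\) and \(r^2\) values that LDpair reports, up to rounding.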


All illustrations CC-BY.
All other materials CC-BY unless noted otherwise.