10.7 Data

The admixr package provides real example data from 10 human individuals, which can be acquired by running its download_data() function:

# load admixr, then download the example data into the current directory
library(admixr)
prefix <- download_data(dirname = ".")

We now have a directory called snps that contains four files:

  • snps.geno: Genotype of each individual (column) at each SNP (row)
    • Represented as counts of the alternative allele (0, 1, 2), with 9 marking a missing genotype
  • snps.ind: Population IDs for each individual
  • snps.snp: SNP IDs, positions, and alleles
  • regions.bed: A file of genomic regions (not required for basic admixr analysis)

EIGENSTRAT format

Together, the three .geno, .ind, and .snp files constitute the EIGENSTRAT format. This is simply a way of representing genotype data, similar to VCF. In fact, several software packages exist to convert between VCF and EIGENSTRAT.
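
To make the layout of the three files concrete, here is a minimal sketch that writes a toy dataset to a temporary directory and reads it back using only base R. Everything in it is an assumption made for illustration: the toy_prefix variable, the sample and SNP names, the column names, and the genotype values are invented and are not the contents of the downloaded snps files. It is also not how admixr itself loads the data.

```r
# Toy EIGENSTRAT-style dataset: 3 SNPs (rows) x 2 individuals (columns).
# All names and values below are invented for illustration.
toy_dir <- tempfile("toy_snps")
dir.create(toy_dir)
toy_prefix <- file.path(toy_dir, "toy")

# .ind: one line per individual (sample ID, sex, population label)
writeLines(c("ind1 U PopA",
             "ind2 U PopB"),
           paste0(toy_prefix, ".ind"))

# .snp: one line per SNP
# (SNP ID, chromosome, genetic position, physical position, two alleles)
writeLines(c("rs1 1 0.0 1000 A G",
             "rs2 1 0.0 2000 C T",
             "rs3 2 0.0 1500 G A"),
           paste0(toy_prefix, ".snp"))

# .geno: one line per SNP, one character per individual (9 = missing)
writeLines(c("02", "19", "21"), paste0(toy_prefix, ".geno"))

# Read the two whitespace-delimited tables; explicit column classes keep
# allele letters such as "T" from being interpreted as logical values.
ind <- read.table(paste0(toy_prefix, ".ind"),
                  col.names = c("id", "sex", "label"),
                  colClasses = "character")
snp <- read.table(paste0(toy_prefix, ".snp"),
                  col.names = c("id", "chrom", "gen", "pos", "ref", "alt"),
                  colClasses = c("character", "character", "numeric",
                                 "integer", "character", "character"))

# Split each .geno line into single characters to build a SNP x individual matrix.
geno_lines <- readLines(paste0(toy_prefix, ".geno"))
geno <- do.call(rbind, lapply(strsplit(geno_lines, ""), as.integer))
geno[geno == 9] <- NA                    # recode the missing-data marker
dimnames(geno) <- list(snp$id, ind$id)   # rows = SNPs, columns = individuals

geno
```

The three files only make sense together: line i of the .geno file corresponds to line i of the .snp file, and character j on each .geno line corresponds to line j of the .ind file.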