Pipeline

pipeline.png

Ancestry estimation

We use LASER to perform principal component analysis (PCA) on the genotypes of each sample and to place each sample into a reference PCA space constructed from a set of reference individuals [14]. We built the reference coordinates from 938 samples of the Human Genome Diversity Project (HGDP) [15] and labeled them with the ancestry categories proposed by the GWAS Catalog [16], which are also used in the PGS Catalog.
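
To illustrate the general idea of placing samples into a fixed reference PCA space, here is a minimal NumPy sketch. It is not LASER's implementation: LASER's own placement procedure is more involved and is not reproduced here. The function names, the toy genotype matrices, and the k-nearest-neighbor vote used to suggest a label are all assumptions made for this example.

```python
from collections import Counter

import numpy as np


def build_reference_space(ref_genotypes, n_components=2):
    """Fit PCA on reference genotypes (n_ref x n_snps, allele counts 0/1/2)."""
    mean = ref_genotypes.mean(axis=0)
    centered = ref_genotypes - mean
    # Rows of vt are the principal axes of the centered reference matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T           # shape: (n_snps, n_components)
    ref_coords = centered @ loadings         # reference coordinates
    return mean, loadings, ref_coords


def project(sample_genotypes, mean, loadings):
    """Project one or more samples into the existing reference space."""
    return (np.atleast_2d(sample_genotypes) - mean) @ loadings


def nearest_labels(coords, ref_coords, ref_labels, k=3):
    """Hypothetical labeling step: majority vote among the k nearest references."""
    labels = []
    for point in coords:
        dist = np.linalg.norm(ref_coords - point, axis=1)
        nearest = np.argsort(dist)[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        labels.append(votes.most_common(1)[0][0])
    return labels
```

The key design point is that the reference space is fitted once and then held fixed, so study samples never influence the axes they are placed on. With a real reference panel, `ref_genotypes` would hold the HGDP samples and `ref_labels` their ancestry categories.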