TY - JOUR
T1 - Imputation Performance in Latin American Populations
T2 - Improving Rare Variants Representation With the Inclusion of Native American Genomes
AU - Jiménez-Kaufmann, Andrés
AU - Chong, Amanda Y.
AU - Cortés, Adrián
AU - Quinto-Cortés, Consuelo D.
AU - Fernandez-Valverde, Selene L.
AU - Ferreyra-Reyes, Leticia
AU - Cruz-Hervert, Luis Pablo
AU - Medina-Muñoz, Santiago G.
AU - Sohail, Mashaal
AU - Palma-Martinez, María J.
AU - Delgado-Sánchez, Guadalupe
AU - Mongua-Rodríguez, Norma
AU - Mentzer, Alexander J.
AU - Hill, Adrian V.S.
AU - Moreno-Macías, Hortensia
AU - Huerta-Chagoya, Alicia
AU - Aguilar-Salinas, Carlos A.
AU - Torres, Michael
AU - Kim, Hie Lim
AU - Kalsi, Namrata
AU - Schuster, Stephan C.
AU - Tusié-Luna, Teresa
AU - Del-Vecchyo, Diego Ortega
AU - García-García, Lourdes
AU - Moreno-Estrada, Andrés
N1 - Publisher Copyright:
Copyright © 2022 Jiménez-Kaufmann, Chong, Cortés, Quinto-Cortés, Fernandez-Valverde, Ferreyra-Reyes, Cruz-Hervert, Medina-Muñoz, Sohail, Palma-Martinez, Delgado-Sánchez, Mongua-Rodríguez, Mentzer, Hill, Moreno-Macías, Huerta-Chagoya, Aguilar-Salinas, Torres, Kim, Kalsi, Schuster, Tusié-Luna, Del-Vecchyo, García-García and Moreno-Estrada.
PY - 2022/1/3
Y1 - 2022/1/3
N2 - Current Genome-Wide Association Studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Due to the complex demographic history of Latin America and the lack of balanced representation of Native American genomes in current imputation panels, locally relevant disease variants are likely to be missed, limiting the scope and impact of biomedical research in these populations. Therefore, better representation of diversity in genomic databases is a scientific imperative. Here, we expand the 1000 Genomes reference panel (1KGP) with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly for low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a larger number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to match the imputation performance of variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current reference panels and highlights the contribution of our work to reducing it while complementing efforts to improve global equity in genomic research.
KW - GWAS
KW - Imputation
KW - Latin Americans
KW - Native American ancestry
KW - reference panels
KW - underrepresented populations
UR - http://www.scopus.com/inward/record.url?scp=85123123221&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123123221&partnerID=8YFLogxK
U2 - 10.3389/fgene.2021.719791
DO - 10.3389/fgene.2021.719791
M3 - Article
AN - SCOPUS:85123123221
SN - 1664-8021
VL - 12
JO - Frontiers in Genetics
JF - Frontiers in Genetics
M1 - 719791
ER -