Genomic Ancestry Inference of Admixed Population by Identifying Approximate Boundaries of Ancestry Change

Document Type: Original Article


1 Faculty of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Babol, Iran

2 Department of Molecular and Cell Biology, Faculty of Science, University of Mazandaran, Babolsar, Iran

3 School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW), Sydney, Australia

4 UNSW Data Science Hub, University of New South Wales (UNSW), Sydney, Australia


Admixture is a common phenomenon in human populations, resulting from the mating of individuals from two or more previously isolated populations. It produces mosaic genomes in which each DNA segment originates from one of the ancestral populations. Local ancestry inference (LAI) methods identify the ancestry of each segment, which can provide insight into the history of admixture in a population. Many LAI methods, however, require parameters that may be difficult to obtain, which can hinder their use. In this paper, we present a novel method, identifying approximate boundaries of ancestry change (IABAC), which locates approximate boundaries of ancestry change in admixed haplotypes and then determines the ancestry between consecutive boundaries. Unlike many LAI methods, our method does not rely on many statistical or biological parameters and is therefore more robust to variations in admixture patterns. We evaluate our method on human data and show that it is more accurate than existing methods for ancestry detection. Our results suggest that IABAC is a promising method for identifying ancestry boundaries in admixed haplotypes. It could be used to study the history of admixture in human populations and to identify genetic variants associated with different ancestral populations.

