Genomic Ancestry Inference of Admixed Population by Identifying Approximate Boundaries of Ancestry Change

Document Type : Original Article

Authors

1 Faculty of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Babol, Iran

2 Department of Molecular and Cell Biology, Faculty of Science, University of Mazandaran, Babolsar, Iran

3 School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW), Sydney, Australia

4 UNSW Data Science Hub, University of New South Wales (UNSW), Sydney, Australia

Abstract

Admixture is a common phenomenon in human populations, resulting from the mating of individuals from two or more previously isolated populations. It produces chromosomes that are mosaics of DNA segments, each inherited from one of the ancestral populations. Local ancestry inference (LAI) methods identify the ancestry of each segment, which can provide insights into the admixture history of a population. Many LAI methods require parameters that may be difficult to obtain, which can hamper their use. In this paper, we present a novel method for identifying approximate boundaries of ancestry change (IABAC) in admixed haplotypes and then determining the ancestry between boundaries. Unlike many LAI methods, our method does not rely on many statistical or biological parameters and is therefore more robust to variations in admixture patterns. We evaluate our method on human data and show that it is more accurate than existing methods for ancestry detection. Our results suggest that IABAC is a promising new method for identifying ancestry boundaries in admixed haplotypes. The method could be used to study the history of admixture in human populations and to identify genetic variants that are associated with different ancestral populations.
