Machine Learning overview for biogeographical ancestry prediction - a PLS-DA approach

Published: October 25, 2022


      Biogeographical ancestry (BGA) of a trace or a person/skeleton refers to the biologically determined component of ethnicity, which itself comprises both biological and cultural elements. Many people are now interested in researching their genealogy, and the ability to distinguish biogeographic information about populations and subgroups through DNA analysis plays an essential role in several fields, including forensics. For example, when no reference profiles of perpetrators or database matches are available for comparison, inferring the biogeographic origin of perpetrators or victims in unsolved cases is advantageous for investigative and intelligence purposes. Current approaches to BGA estimation from SNP data are generally based on principal component analysis (PCA) and the STRUCTURE software. The present study provides an alternative method that combines multivariate data analysis and machine learning strategies to assess how well various commercial panels discriminate the BGA of unknown samples. Two multivariate techniques, Partial Least Squares-Discriminant Analysis (PLS-DA) and XGBoost, were applied to datasets from the 1000 Genomes Project, the Simons Genome Diversity Project, and the Human Genome Diversity Project, which include African, American, Asian, European, and Oceanic individuals, and their discriminatory power was compared.
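To make the PLS-DA idea concrete, the following is a minimal sketch and not the authors' pipeline. It assumes NumPy only, uses purely synthetic genotypes (0/1/2 allele counts) for three hypothetical groups instead of the real reference datasets, and fits a PLS regression of one-hot group indicators on the SNP matrix. All names here (`fit_plsda`, `predict_plsda`, the group and SNP counts) are illustrative assumptions.

```python
import numpy as np

def fit_plsda(X, labels, n_classes, n_components):
    """PLS-DA sketch: PLS regression of one-hot class indicators on X."""
    Y = np.eye(n_classes)[labels]               # one-hot encode the groups
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean             # centred working copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of the X-Y cross-product
        u, s, vt = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        w = u[:, 0]
        t = Xk @ w                              # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                       # X loadings
        q = Yk.T @ t / tt                       # Y loadings
        Xk = Xk - np.outer(t, p)                # deflate X
        Yk = Yk - np.outer(t, q)                # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    B = W @ np.linalg.inv(P.T @ W) @ Q.T        # regression coefficients
    return x_mean, y_mean, B

def predict_plsda(model, X):
    """Assign each sample to the class with the largest predicted indicator."""
    x_mean, y_mean, B = model
    return np.argmax((X - x_mean) @ B + y_mean, axis=1)

# Synthetic data (assumption): 3 hypothetical groups, 50 SNPs, 60 samples each
rng = np.random.default_rng(0)
n_groups, n_snps, n_per_group = 3, 50, 60
freqs = rng.uniform(0.05, 0.95, size=(n_groups, n_snps))  # allele frequencies
labels = np.repeat(np.arange(n_groups), n_per_group)
X = rng.binomial(2, freqs[labels]).astype(float)          # genotypes 0/1/2

# Simple hold-out split: 120 training samples, 60 test samples
perm = rng.permutation(len(labels))
train, test = perm[:120], perm[120:]

model = fit_plsda(X[train], labels[train], n_groups, n_components=2)
B = model[2]
accuracy = float(np.mean(predict_plsda(model, X[test]) == labels[test]))
print(f"hold-out accuracy: {accuracy:.2f}")
```

In practice the number of latent components would be chosen by cross-validation, and a comparison with XGBoost would train both models on the same split of real reference genotypes.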



