Research Article | Volume 8, P102-104, December 2022

# Evaluating population structure of Ecuador for forensic STR markers

Published: September 29, 2022

## Abstract

The continuous admixture events among Europeans, Native Americans, and Africans occurred differently throughout the Ecuadorian territory, creating a diversified genetic composition. Therefore, to evaluate how the genetic diversity is partitioned along the country for 15 STRs, 842 admixed-population samples were analyzed. We also evaluated the effect of applying an adjustment for population structure when estimating LRs using a national database. The results showed that to accurately assess forensic evidence, the use of a national database may be justified with the application of an appropriate adjustment for population structure.

## 1. Introduction

Previous studies in the Ecuadorian population showed evidence of genetic substructure [Gaviria et al., 2013; Flores-Espinoza et al., 2021]. In most studies, to account for substructure, the territory was divided into three natural regions based on geographical criteria. However, few studies have investigated genetic variation within regions, or across other regional boundaries that take demographic information into account. To capture the greatest genetic differentiation, we investigated an alternative division of the territory that takes into account both geographic and demographic data. We also investigated whether applying an FST adjustment yields likelihood ratios (LRs) that are closer to those expected given the substructure observed at the national and regional levels.

## 2. Materials and methods

Samples were collected from 842 unrelated individuals from the Ecuadorian admixed population, with written informed consent. DNA samples were genotyped with one of two marker sets: (i) 15 autosomal STRs included in the AmpFLSTR™ Identifiler™ Kit (Applied Biosystems, Foster City, California, USA); (ii) 21 autosomal STRs included in the PowerPlex® 21 System (Promega Corporation). Samples were grouped in two ways: into three geographic regions (Fig. 1A), and according to the altitude of the cities (Fig. 1B). Amplified products were separated and detected on a 3130 Genetic Analyzer (Applied Biosystems). Genotypes were determined using the GeneMapper software V3.2 (Applied Biosystems).
Allele frequencies, Hardy-Weinberg equilibrium (HWE), F-statistics, Analysis of Molecular Variance (AMOVA) and probabilities of non-differentiation (p-values) were calculated using the Arlequin software v.3.5.2.2 [Excoffier and Schneider, 2005]. For multiple tests, the significance level of 0.05 was adjusted by applying Bonferroni's correction. LRs were calculated using the Familias 3 software [Kling et al., 2014].
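The Bonferroni adjustment mentioned above can be sketched in a few lines of Python. This is only an illustration: the article does not state how many tests were corrected for, so the count of 15 (one HWE test per Identifiler locus) and the p-values in the example are assumptions, not data from the study.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical per-locus HWE p-values (NOT taken from the article):
# four loci below 0.05, eleven clearly non-significant.
example_p_values = [0.010, 0.030, 0.040, 0.045] + [0.50] * 11

# Assumed number of tests: 15 loci.
print(bonferroni_alpha(0.05, 15))
print(any(significant_after_bonferroni(example_p_values)))
```

With these made-up values, several loci fall below the nominal 0.05 level but none remains significant once the threshold is divided by the number of tests.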

## 3. Results and discussion

The results showed no significant deviations from HWE after applying Bonferroni's correction, although four loci showed p-values below the uncorrected threshold of 0.05. The average observed heterozygosity (0.757) was lower than that expected for a population in HWE (0.769). This excess of homozygotes may reflect population substructure. To evaluate the degree of substructure, samples were separated into three natural regions. However, the proportion of heterozygotes remained lower than expected under HWE, and was even lower for the Pacific coast. To capture the greatest genetic differentiation, AMOVA was performed for each of the two divisions of the territory. The results showed low FST values for both divisions. However, the second division presented the higher variation among populations and the lower variation within populations (Fig. 1C).
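As a rough illustration, the heterozygote deficit can be summarized with a fixation index computed from the two averages quoted above. This value is not reported in the article; it is derived here only from the averaged observed ($H_O$) and expected ($H_E$) heterozygosities, so it should be read as an approximation:

$F = 1 - \frac{H_O}{H_E} = 1 - \frac{0.757}{0.769} \approx 0.016$

A small positive value of this kind is the pattern expected when subpopulations with different allele frequencies are pooled, although sampling effects can contribute to it as well.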
In the subsequent analyses of LRs, the FST value obtained for the second division (0.004) was used as an adjustment for population substructure. We estimated LRs using the allele frequencies of the national database. For comparison, we also estimated the LRs using: (i) the allele frequencies of three regional databases; (ii) the allele frequencies of the national database with the Balding and Nichols adjustment formula applied for θ = 0.004. To compare the LRs, we calculated the log-ratio of the LRs from the national and regional databases, as proposed by Gill et al. [2003]:
$d=\log_{10}\left(LR_{\mathrm{National}}/LR_{\mathrm{Regional}}\right)$
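The statistic d can be computed directly from paired LRs. The following is a minimal Python sketch; the function names and the example LR pairs are hypothetical and are not taken from the study. By the definition above, d > 0 means the national database gives a larger LR than the regional one, and d < 0 means the national value is the more conservative of the two.

```python
import math


def log_ratio(lr_national: float, lr_regional: float) -> float:
    """d = log10(LR_National / LR_Regional) for one case."""
    if lr_national <= 0 or lr_regional <= 0:
        raise ValueError("likelihood ratios must be positive")
    return math.log10(lr_national / lr_regional)


def fraction_conservative(lr_pairs) -> float:
    """Share of cases where the national LR is below the regional LR (d < 0)."""
    d_values = [log_ratio(nat, reg) for nat, reg in lr_pairs]
    return sum(d < 0 for d in d_values) / len(d_values)


# Hypothetical (national, regional) LR pairs, for illustration only.
example_pairs = [(10.0, 100.0), (100.0, 10.0), (5.0, 50.0), (1.0, 2.0)]

print([round(log_ratio(nat, reg), 3) for nat, reg in example_pairs])
print(fraction_conservative(example_pairs))
```

Working on the log10 scale makes over- and under-statement symmetric around zero, so a tenfold difference in either direction has the same magnitude.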

When the Balding and Nichols adjustment formula was used, the national database produced conservative values in more than 50% of the cases. This means that using the national database without an appropriate θ correction would overstate the strength of the evidence against a defendant in more than 50% of cases. The difference in the distribution of d is shown in Fig. 2. Additionally, when cases from the Pacific coast were analyzed, the highest values of d were generated when the adjustment for substructure was used.
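The article does not reproduce the adjustment formula, and its LRs were computed in Familias 3. As a hedged sketch of the underlying idea only, the code below uses the widely cited Balding and Nichols genotype match probabilities for a simple single-source comparison, with θ = 0.004 as in the text. The allele frequencies are hypothetical, and the single-source setting is an assumption about the kind of case, not a description of the authors' calculations.

```python
def bn_homozygote(p: float, theta: float) -> float:
    """Balding-Nichols match probability for a homozygous genotype AA."""
    numerator = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return numerator / ((1 + theta) * (1 + 2 * theta))


def bn_heterozygote(p: float, q: float, theta: float) -> float:
    """Balding-Nichols match probability for a heterozygous genotype AB."""
    numerator = 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q)
    return numerator / ((1 + theta) * (1 + 2 * theta))


THETA = 0.004            # FST value reported in the article
P_A, P_B = 0.10, 0.20    # hypothetical allele frequencies (assumption)

# theta = 0 recovers the unadjusted Hardy-Weinberg proportions p^2 and 2pq.
unadjusted_hom = bn_homozygote(P_A, 0.0)
unadjusted_het = bn_heterozygote(P_A, P_B, 0.0)
adjusted_hom = bn_homozygote(P_A, THETA)
adjusted_het = bn_heterozygote(P_A, P_B, THETA)

# In a single-source match the LR is the reciprocal of the match probability.
print(1 / unadjusted_hom, 1 / adjusted_hom)
print(1 / unadjusted_het, 1 / adjusted_het)
```

For the example frequencies used here, the adjustment raises the match probability and therefore lowers the LR, which is the sense in which the adjusted values are more conservative.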

## 4. Conclusion

We demonstrate that applying an FST adjustment yields more conservative values that are closer to the expected ones, given the substructure observed at the national and regional levels. We conclude that, to accurately assess forensic evidence, the use of a national database may be justified provided that an appropriate adjustment for population structure is applied. However, the available database does not perfectly reflect demographic events, and a broader study of the admixed population of Ecuador would be important to reveal regional differences and levels of population substructure.

## Funding

RF was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brazil (CAPES) – Finance Code 001. GB was supported by MED.GBF.20.07 funded by DGIV from Universidad de Las Américas; Quito, Ecuador. LG was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq (ref. 306342/2019-7), Brazil and by Fundação de Amparo à Pesquisa do Rio de Janeiro (FAPERJ), Brazil (CNE-2022 and E-26/211.369/2021).

None.

## References

• Gaviria A., Zambrano A.K., Morejon G., Galarza J., Aguirre V., Vela M., Builes J.J., Burgos G. Twenty two autosomal microsatellite data from Ecuador (Powerplex Fusion). Forensic Sci. Int. Genet. Suppl. Ser. 2013; 4: e330-e333
• Flores-Espinoza R., Paz-Cruz E., Ruiz-Pozo V.A., Lopez-Carrera M., Gusmão L., Burgos G. Am. J. Phys. Anthropol. 2021; 176: 109-119
• Excoffier L.L., Schneider S. Arlequin: A Software For Population Genetics Data Analysis, Arlequin Ver. 3.0, 2005.
• Kling D., Tillmar A.O., Egeland T. Familias 3 - extensions and new functionality. Forensic Sci. Int. Genet. 2014; 13: 121-127
• Gill P., Foreman L., Buckleton J.S., Triggs C.M., Allen H. A comparison of adjustment methods to test the robustness of an STR DNA database comprised of 24 European populations. Forensic Sci. Int. 2003; 131: 184-196