Research Article | Volume 5, e16-e18, December 2015

Studies of East European populations with a 46-plex ancestry-informative indel set

Published: September 12, 2015 | DOI: https://doi.org/10.1016/j.fsigss.2015.09.007

      Abstract

      Numerous ancestry-informative markers (e.g., SNPs, InDels) are used to distinguish ancestral origins among continental regions of the world. This work presents data from a 46-plex ancestry-informative InDel set for three East European populations (Turkish, Turkish Cypriot and Azerbaijani).

      1. Introduction

      Insertion-deletion polymorphisms (InDels) have several characteristics (e.g., low mutation rates, short amplicon sizes and a straightforward amplification technique) that make them well suited for forensic DNA analysis. InDels are also practical loci for the forensic analysis of biogeographic ancestry, as they can show high allele frequency differentiation among population groups, while mixed DNA is less likely to go undetected and be misinterpreted as indicating admixed ancestry [Pereira et al., 2012; Santos et al., 2015]. In this study we investigated patterns of genetic variation at 46 ancestry-informative InDels in Turkish, Turkish Cypriot and Azerbaijani populations.

      2. Material and methods

      Forty unrelated individuals from each of the Turkish, Turkish Cypriot and Azerbaijani populations were genotyped. Genotypes were also collected for a total of 857 samples from the CEPH human genome diversity panel (HGDP-CEPH), corresponding to forty-five population groups: seven African (n = 105), four Middle Eastern (n = 163), eight European (n = 158), nine Central South Asian (n = 202) and seventeen East Asian (n = 229) populations. The HGDP-CEPH data were accessed using the SPSmart forInDel browser (http://spsmart.cesga.es/forindel.php) [Amigo et al., 2008]. PCR amplification and fragment analysis were performed following the protocol in [Pereira et al., 2012]. Allele frequencies, Hardy-Weinberg equilibrium (HWE) and pairwise FST values were calculated using Arlequin v. 3.5 [Excoffier and Lischer, 2010]. Biogeographic ancestry was analyzed using STRUCTURE v. 2.3.4 with a burn-in length of 10,000 followed by 10,000 MCMC repetitions [Pritchard et al., 2000]. CLUMPAK was used to create cluster plots [Kopelman et al., 2015]. Multiple-profile ancestry assignment likelihoods and PCA were computed using the USC Bayesian forensic SNP classifier Snipper (http://mathgene.usc.es/snipper/analysismultipleprofiles.html).
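      As an illustration of the per-locus statistics named above, the following minimal sketch estimates allele frequencies and tests HWE for a single biallelic InDel from genotype counts. It is not the Arlequin procedure: Arlequin applies an exact test, whereas this sketch uses the simpler chi-square goodness-of-fit test with one degree of freedom. The function names and the example counts are assumptions made for illustration and are not taken from the study data.

```python
from math import erfc, sqrt


def allele_freqs(n_ii, n_id, n_dd):
    """Return (insertion, deletion) allele frequencies from genotype counts.

    n_ii: insertion homozygotes, n_id: heterozygotes, n_dd: deletion homozygotes.
    """
    n = n_ii + n_id + n_dd
    p = (2 * n_ii + n_id) / (2 * n)
    return p, 1.0 - p


def hwe_chi_square(n_ii, n_id, n_dd):
    """Chi-square goodness-of-fit test (1 d.f.) against HWE expectations.

    Simplified stand-in for an exact test; returns (chi2, p_value).
    """
    n = n_ii + n_id + n_dd
    p, q = allele_freqs(n_ii, n_id, n_dd)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ii, n_id, n_dd)
    # Skip classes with zero expectation (monomorphic locus).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for one locus in a sample of 40 individuals.
chi2, p_value = hwe_chi_square(12, 18, 10)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")
```

      With 46 loci tested per population, the resulting p-values would normally be corrected for multiple testing before deviations from HWE are reported.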

      3. Results

      The calculated pairwise genetic distances are presented in Table 1. The differences between each pair of the five major population groups (Africa, Middle East, Central South Asia, Europe and East Asia) were all significant (p < 0.001 after Bonferroni correction). The Turkish, Turkish Cypriot and Azerbaijani populations show very low FST values among themselves (0.0032–0.0089), indicating strong genetic similarity.
      Table 1. Pairwise genetic distance matrix based on FST values. P-values are shown above and FST values below the diagonal. P-values greater than 1 × 10−3 indicate non-significant distances between the corresponding population pairs. CSA: Central South Asia.
      Population       Africa  Middle East  CSA     Europe  East Asia  Turkey  Turkish Cypriot  Azerbaijan
      Africa           *       0.0000       0.0000  0.0000  0.0000     0.0000  0.0000           0.0000
      Middle East      0.2896  *            0.0000  0.0000  0.0000     0.0000  0.0002           0.0003
      CSA              0.3654  0.0150       *       0.0000  0.0000     0.0001  0.0000           0.0000
      Europe           0.2893  0.0274       0.0353  *       0.0000     0.0000  0.0000           0.0000
      East Asia        0.3921  0.2514       0.2835  0.1808  *          0.0000  0.0000           0.0000
      Turkey           0.3396  0.0118       0.0087  0.0095  0.2480     *       0.0552           0.1592
      Turkish Cypriot  0.3361  0.0089       0.0106  0.0242  0.2625     0.0058  *                0.0061
      Azerbaijan       0.3300  0.0080       0.0140  0.0127  0.2465     0.0032  0.0089           *
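      To make the meaning of a pairwise FST value concrete, the sketch below computes a simple heterozygosity-based FST, (HT - HS) / HT, between two populations from per-locus insertion-allele frequencies, combined across loci as a ratio of sums. This is an assumption-laden simplification: it treats the two samples as equally sized and ignores sampling variance, so it will not reproduce the Arlequin estimates in Table 1, and the p-values there come from a separate significance test that is not sketched here. The frequencies in the example are hypothetical.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Heterozygosity-based FST between two populations.

    freqs_a, freqs_b: insertion-allele frequencies at the same loci, same order.
    Returns sum(HT - HS) / sum(HT) over loci, or 0.0 if no locus is variable.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_total = 2 * p_bar * (1 - p_bar)                        # pooled heterozygosity
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2   # mean within-population
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical frequencies at three loci for two populations.
pop_a = [0.60, 0.25, 0.90]
pop_b = [0.40, 0.30, 0.15]
print(f"FST = {pairwise_fst(pop_a, pop_b):.4f}")
```

      Identical frequency vectors give an FST of 0 and fixed differences at every locus give 1, which is the scale on which the values below the diagonal in Table 1 are read.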
      Cluster plots show the ancestral membership proportions of the eight populations for K = 2–4 based on the STRUCTURE results (Fig. 1). The K = 2 plot distinguishes Asian from all other population groups. The best differentiation was observed at K = 3, where the three main continental population groups (Africa, Europe and East Asia) formed separate clusters, while the Middle East and Central South Asia clustered with Europe. Turkish, Turkish Cypriot and Azerbaijani samples clustered with the European group, which includes Central South Asia and the Middle East. At K = 4, taken as the reasonable stopping point, the combined Middle Eastern, European and Central South Asian cluster reveals the admixed character of these populations.
      Fig. 1. STRUCTURE analysis of the grouped reference populations (leftmost) and the Turkish, Turkish Cypriot and Azerbaijani populations (rightmost) for K = 2–4. Each vertical bar represents one individual, and the colours represent the individual admixture proportions for the K assumed clusters.
      Reference HGDP-CEPH diversity panel genetic data from the five population groups (AFR, EUR, ME, CSA and EAS) were used to estimate the ancestry assignment of each individual from the studied populations. Consistent with the STRUCTURE results, the Bayesian analysis assigned the tested individuals as European, Middle Eastern or Central South Asian (data not shown).
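      The general idea behind a likelihood-based ancestry assignment can be sketched as follows: the probability of each genotype is computed from a reference group's allele frequencies under HWE, and the per-locus probabilities are multiplied across loci assumed to be independent. This is only an illustration of that type of calculation, not Snipper's actual implementation; the group labels, the frequencies and the frequency floor of 0.01 are hypothetical choices.

```python
from math import log


def genotype_prob(n_ins, p):
    """HWE probability of a genotype carrying n_ins insertion alleles (0, 1 or 2)."""
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)[n_ins]


def log_likelihoods(profile, ref_freqs, floor=0.01):
    """Log-likelihood of a multilocus profile under each reference group.

    profile: insertion-allele counts per locus (0, 1 or 2).
    ref_freqs: dict mapping group name to insertion-allele frequencies per locus.
    floor: assumed bound keeping frequencies away from 0 and 1, so that a single
           unobserved allele does not force a likelihood of zero.
    """
    result = {}
    for group, freqs in ref_freqs.items():
        total = 0.0
        for n_ins, p in zip(profile, freqs):
            p = min(max(p, floor), 1.0 - floor)
            total += log(genotype_prob(n_ins, p))
        result[group] = total
    return result


def assign(profile, ref_freqs):
    """Return the group with the highest log-likelihood and all log-likelihoods."""
    scores = log_likelihoods(profile, ref_freqs)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference frequencies at three loci for two groups.
reference = {"group_A": [0.9, 0.1, 0.8], "group_B": [0.2, 0.7, 0.3]}
best, scores = assign([2, 0, 2], reference)
print(best, scores)
```

      When the reference frequencies of two groups are similar, as the low FST values among the Middle Eastern, European and Central South Asian groups suggest, the log-likelihoods differ little and assignments between those groups carry correspondingly little weight.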

      4. Discussion

      Results indicated that the 46-InDel set, though highly efficient in inferring the ancestry of individuals from Africa, Europe and East Asia, did not reveal distinct genetic clusters for Middle Eastern and Central South Asian populations. Populations from Turkey, Cyprus and Azerbaijan all show European, Middle Eastern and Central South Asian ancestry components. Our results strongly corroborate previous studies [Pereira et al., 2012; Santos et al., 2015].

      5. Conclusion

      The 46 AIM-Indel markers were originally selected to show differences between the major groups: African, European, East Asian, Native American and Oceanian [Pereira et al., 2012; Santos et al., 2015]. Eurasian sub-population groups will be better differentiated by combining the InDels used here with extended sets of InDels focused on variation within Eurasia, as well as with panels of ancestry-informative SNPs chosen for the same purpose.
      This study provided new data for a forensic InDel database based on the SPSmart frequency browser framework: forInDel (Forensic Indel browser).

      Conflict of interest

      None.

      Acknowledgments

      This research was supported by Istanbul University Scientific Research Projects Unit (BAP) under International Research Projects (IRP) number 40991. CS is supported by funding awarded by the Portuguese Foundation for Science and Technology (FCT) and co-financed by the European Social Fund (Human Potential Thematic Operational Program SFRH/BD/75627/2010). We sincerely thank all the sample contributors. The authors would like to thank the Department of Criminalistics Investigations DNA Laboratory, Ministry of Internal Affairs of the Azerbaijan Republic, for providing DNA samples.

      References

        • Pereira R., Phillips C., Pinto N., et al. Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing. PLoS One. 2012; 7: e29684
        • Santos C., Phillips C., Oldoni F., et al. Completion of a worldwide reference panel of samples for an ancestry informative Indel assay. Forensic Sci. Int. Genet. 2015; 17: 75-80
        • Amigo J., Salas A., Phillips C., et al. SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access. BMC Bioinformatics. 2008; 9: 428
        • Excoffier L., Lischer H.E. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 2010; 10: 564-567
        • Pritchard J.K., Stephens M., Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000; 155: 945-959
        • Kopelman N.M., Mayzel J., Jakobsson M., et al. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Mol. Ecol. Resour. 2015;