
Sequence‐based Saudi population data for the SE33 locus

Published: September 20, 2019. DOI: https://doi.org/10.1016/j.fsigss.2019.09.004

      Abstract

      A set of 87 reference samples collected from the population of Saudi Arabia was sequenced using the ForenSeq™ DNA Signature Prep Kit on a MiSeq FGx™. The FASTQ files contain the sequences of the SE33 STR, but the locus is not reported by the ForenSeq™ Universal Analysis Software (UAS). The STRait Razor software was used to recover and report SE33 sequence‐based data for the Saudi population. Sixty-nine sequence-based alleles were recovered, most of which had previously reported motif patterns. Two unreported motif patterns, found in three alleles, and seven novel allele sequences were observed. A single discordance between the sequence-based data and the CE data was also observed; it was due to the presence of a common TTTT deletion. SE33 had 130% more sequence-based alleles than size-based alleles; the highest number of observed sequence variants was in alleles 27.2 and 30.2, each of which had 7 sequence variants. The statistical parameters emphasize the usefulness of the sequence-based data.

      1. Introduction

      Massively Parallel Sequencing (MPS) systems are now being adopted in many forensic laboratories, generating detailed sequence data for different types of markers simultaneously. The ForenSeq™ DNA Signature Prep Kit allows sequencing of >150 (Primer Mix A) or >230 markers (Primer Mix B), and users decide which primer mix to use.
      With the MiSeq FGx™ and the ForenSeq™ DNA Signature Prep Kit, the ForenSeq™ Universal Analysis Software (UAS) reports 27 autosomal STRs (aSTRs) along with other commonly used markers (Y-STRs, X-STRs, and SNPs). Although SE33 is included in the kit, it is not reported by the UAS.
      SE33 is the most polymorphic well-characterised STR [1], which makes it valuable for forensic applications. Previous studies have demonstrated that sequence-based data for SE33 reveal significantly more alleles than CE systems. A recent study classified the SE33 repeat motifs into 34 types (A0, A1, A2, ... to D3) based on the structure of the repeat region, eleven of which had >1% frequency in the tested populations [2].
      The aim of this study was to provide Saudi sequence-based data for the SE33 locus. This included a concordance study with the GlobalFiler™ PCR Amplification Kit.

      2. Materials and methods

      Eighty-seven reference samples from the Saudi population, which had already been profiled with the GlobalFiler® kit [3], were sequenced in this study. Libraries were prepared with the ForenSeq™ DNA Signature Prep Kit (Primer Mix A) following the manufacturer’s guidelines, except that the volume of the pooled normalised library (PNL) was increased from 7 μl to 12 μl. Sequencing was carried out on a MiSeq FGx™ following the manufacturer’s guidelines.
      STRait Razor v3.0 (SR) [4] was used to recover the SE33 sequences from the FASTQ files after modifying the configuration file by adding the 5′ and 3′ anchors and the motif sequence provided in [2]. All sequences with ≥ 10 reads (depth of coverage, DoC) and heterozygous sequences that showed an allele coverage ratio (ACR) of ≥ 20% were recovered automatically by the software. Sequences that showed less than 20% ACR were recovered manually.
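      The two recovery thresholds above can be sketched as a small check. This is only an illustration, not the logic of STRait Razor itself: the function names are hypothetical, and the ACR is assumed here to be the lower read count divided by the higher read count of a heterozygous pair, which the text does not define explicitly.

```python
MIN_DOC = 10    # minimum reads per sequence (threshold stated in the text)
MIN_ACR = 0.20  # minimum allele coverage ratio for automatic recovery (stated in the text)


def allele_coverage_ratio(reads_a: int, reads_b: int) -> float:
    """ACR of a heterozygous pair. ASSUMPTION: lower read count / higher read count."""
    low, high = sorted((reads_a, reads_b))
    return low / high


def recovery_mode(reads_a: int, reads_b: int) -> str:
    """Classify a heterozygous pair using the DoC and ACR thresholds (hypothetical helper)."""
    if min(reads_a, reads_b) < MIN_DOC:
        return "below DoC threshold"
    if allele_coverage_ratio(reads_a, reads_b) >= MIN_ACR:
        return "automatic"
    return "manual review"


# Illustrative read counts only, not data from the study.
print(recovery_mode(500, 400))   # balanced pair
print(recovery_mode(1000, 65))   # imbalanced pair, ACR = 6.5%
```

      A pair that passes the read-depth threshold but falls under 20% ACR is flagged for manual review, which mirrors the manual recovery step described above.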
      For the concordance study, the sequence-based data were compared to the CE data, and Sanger sequencing was used to confirm a discordant result as previously described in [5]. The novelty of a motif pattern or of an allele sequence was assessed against the motifs and sequences reported in [2] and in STRBase [6].
      Allele frequencies, matching probability (MP), and power of exclusion (PE) were calculated using the GenAlEx 6.5 software [7]. The typical paternity index (PI) was calculated using the equation PI = (h + H)/(2h), where h is the homozygosity and H is the heterozygosity, as described in [8]. Finally, the expected heterozygosity (He), observed heterozygosity (Ho) and the exact test for Hardy-Weinberg Equilibrium (HWE) were calculated using Arlequin v 3.5 [9].
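      As a worked illustration of the typical paternity index formula, the sketch below evaluates PI = (h + H)/(2h) from a count of heterozygous samples. The function name is hypothetical; the counts (79 and 80 heterozygous samples out of 87) are those reported in the Results section for the size-based and sequence-based data.

```python
def typical_paternity_index(n_heterozygous: int, n_samples: int) -> float:
    """Typical PI = (h + H) / (2h), with H the observed heterozygosity and h = 1 - H."""
    H = n_heterozygous / n_samples
    h = 1.0 - H
    if h == 0:
        raise ValueError("PI is undefined when no homozygous samples are observed")
    return (h + H) / (2.0 * h)


# Heterozygous counts taken from the Results section (87 samples in total).
print(f"size-based:     {typical_paternity_index(79, 87):.2f}")  # 5.44
print(f"sequence-based: {typical_paternity_index(80, 87):.2f}")  # 6.21
```

      Because h + H = 1, the expression reduces to 1/(2h), so a single additional heterozygous sample is enough to raise the index noticeably at a locus this polymorphic.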

      3. Results and discussion

      The SE33 sequences of all 87 samples were recovered. Of these, 83 were within the designated limits (≥10 reads and ≥20% ACR), and the remaining four samples were recovered manually due to lower ACR (<20%). The ACR of heterozygous sequences ranged from 6.5% to 99.4%, with an average of 58.6%. The four manually typed samples had ACRs of 6.5% for alleles (6.3, 31.2), 8.14% for alleles (14, 35.2), 12.17% for alleles (13.3, 31.2), and 12.8% for alleles (17, 34). Among the 87 samples, these four had the largest size differences between the long and short alleles, ranging from 68 bp to 99 bp, which demonstrates that the ACR correlates with the size difference of the heterozygous allele pair.
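      The size differences quoted above follow directly from the allele designations. The sketch below assumes the usual tetranucleotide convention, in which an allele written x.y is 4x + y bp long within the counted repeat region; the helper name is hypothetical.

```python
def allele_to_bp(allele: str, repeat_len: int = 4) -> int:
    """Length in bp implied by an allele name such as '31.2' (31 repeats plus 2 bp)."""
    whole, _, partial = allele.partition(".")
    return int(whole) * repeat_len + (int(partial) if partial else 0)


# The four manually typed allele pairs listed in the text.
pairs = [("6.3", "31.2"), ("14", "35.2"), ("13.3", "31.2"), ("17", "34")]
diffs = [abs(allele_to_bp(a) - allele_to_bp(b)) for a, b in pairs]
print(diffs)  # [99, 86, 71, 68]
```

      The pair with the largest difference (99 bp) is also the one with the lowest ACR (6.5%), and the ACR rises as the difference shrinks, in line with the correlation noted above.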
      The total coverage of the SE33 locus across all samples was 53,956 reads. The average DoC of recovered sequences was 742 reads, ranging from 32 reads (allele 31.2) to 2196 reads (allele 6.3).
      The number of observed sequence-based alleles (69) was 130% higher than the number of size-based alleles (30). Most sequence variants (iso-alleles) were observed in x.2 alleles; alleles 27.2 and 30.2 had the highest number of observed sequences (7 sequence variants per allele).
      The SE33 motif patterns of the 69 sequences showed that 66 alleles were within the classification of Borsuk et al. [2] and that most of these alleles (53), as expected, had an A0 or A1 motif. Two new motif patterns were observed in three alleles, which are shown in Table 1. Following on from the earlier study, we suggest two new motif IDs (D4 and D5). In addition, seven sequences that fall within the motif classification but have not been reported before were observed (Table 1).
      Table 1. Motif patterns of the SE33 locus observed in the samples from Saudi Arabia. A total of 66 allele sequences were within the motif patterns classified by Borsuk et al. [2], 53 of which, as expected, had the A0 or A1 motif pattern. Two unreported motif patterns were observed in three alleles and were given the motif IDs D4 and D5. Novel motifs observed in the Saudi population are labelled "Novel motif" in the Novelty column.
      Alleles | Motif | Obs. | ID | Novelty
      9–22 | CT [CTTT]3 C [CTTT]n CT [CTTT]3 CT [CTTT]2 | 13 | A0 | Novel sequence (Allele 9)
      20.2–33.2 | CT [CTTT]2 CCTT C [CTTT]n TT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 40 | A1 | Reported in [2]
      30.2 | CT [CTTT]2 CCTT C [CTTT]n CT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 1 | A3 | Reported in [2]
      34 | CT [CTTT]2 CCTT C [CTTT]n TT [CTTT]n TT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 1 | A7 | Novel sequence
      35.2 | CT [CTTT]2 CCTT C [CTTT]n TT [CTTT]n CT [CTTT]3 CT CTTT | 1 | A8 | Novel sequence
      6.3 & 7.3 | CT [CTTT]3 [CTTT]n CT [CTTT]3 CT [CTTT]2 | 2 | C2 | Novel sequence (Allele 7.3)
      13.3 | CT [CTTT]3 C [CTTT]n C [CTTT]n [CTTT]3 CT [CTTT]2 | 1 | B2 | Novel sequence
      18 | CT [CTTT]2 C [CTTT]n CT [CTTT]3 CT [CTTT]2 | 1 | B1 | Reported in [2]
      20.2 & 22.2 | CT [CTTT]3 C [CTTT]n CT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 2 | B3 | Novel sequence
      26.2 | CT [CTTT]2 [CCTT]3 C [CTTT]n TT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 1 | B9 | Reported in [2]
      28.2 | CT [CTTT]2 [CCTT]2 C [CTTT]n TT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 1 | A4 | Reported in [2]
      27.2 | CT [CTTT]2 CCTT C [CTTT]n CTGT [CTTT]n TT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 1 | C4 | Novel sequence
      27.2 | CT [CTTT]2 CCTT C [CTTT]n TT [CTTT]n CT TTTT [CTTT]2 CT [CTTT]2 | 1 | D4 (a) | Novel motif
      29.2 & 30.2 | CT [CTTT]2 CCTT C [CTTT]n CCTT [CTTT]n TT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 2 | D5 (a) | Novel motif
      30.2 | CT [CTTT]2 C [CTTT]n TT [CTTT]n CT [CTTT]3 CT [CTTT]2 | 1 | B5 | Reported in [2]
      a The D4 and D5 IDs were suggested to continue the work done by Borsuk et al. [2].
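      The counts quoted in the text can be cross-checked against the "Obs." column of Table 1. The sketch below simply transcribes the per-motif observation counts from the table and recomputes the totals; the variable names are illustrative, and the figure of 30 size-based alleles is taken from the Results text.

```python
# Number of observed allele sequences per motif ID, transcribed from Table 1.
observed = {
    "A0": 13, "A1": 40, "A3": 1, "A7": 1, "A8": 1,
    "C2": 2, "B2": 1, "B1": 1, "B3": 2, "B9": 1,
    "A4": 1, "C4": 1, "D4": 1, "D5": 2, "B5": 1,
}
NOVEL_MOTIF_IDS = {"D4", "D5"}  # motif IDs proposed in this study
SIZE_BASED_ALLELES = 30         # size-based (CE) alleles reported in the text

total = sum(observed.values())
novel = sum(n for motif, n in observed.items() if motif in NOVEL_MOTIF_IDS)
classified = total - novel
a0_a1 = observed["A0"] + observed["A1"]
increase_pct = 100 * (total - SIZE_BASED_ALLELES) / SIZE_BASED_ALLELES

print(total, classified, a0_a1, novel, increase_pct)  # 69 66 53 3 130.0
```

      The totals agree with the text: 69 sequence-based alleles, 66 within the existing classification, 53 with an A0 or A1 motif, three carrying the two new motifs, and a 130% increase over the 30 size-based alleles.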
      A single discordance was observed: the sample typed as 19,31.2 in the sequence-based data but as 18,31.2 in the size-based data. Allele 19 had the sequence CT [CTTT]3 C [CTTT]19 CT [CTTT]3 CT [CTTT]2 (counted part of the repeat region is in bold), suggesting a deletion of four bp within the flanking region. Examination of the FASTQ file of the sample revealed a [TTTT] deletion at 88277355_88277358 (GRCh38) when compared to the reference sequence of the locus. This was further investigated by Sanger sequencing, and the deletion was confirmed. The deletion was assigned rs369314007 and was found to be associated with the A0 motif [2], which is the motif pattern of allele 19.
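      The discordance can be reproduced with simple length arithmetic. The sketch below assumes that the CE call reflects total fragment length, so that an indel in the flanking region shifts the size-based designation, whereas the sequence-based designation counts only the repeat region. The helper names are hypothetical and a tetranucleotide repeat is assumed.

```python
def allele_to_bp(allele: str, repeat_len: int = 4) -> int:
    """Length in bp implied by an allele name such as '31.2' (31 repeats plus 2 bp)."""
    whole, _, partial = allele.partition(".")
    return int(whole) * repeat_len + (int(partial) if partial else 0)


def bp_to_allele(bp: int, repeat_len: int = 4) -> str:
    """Inverse of allele_to_bp: 126 bp -> '31.2', 72 bp -> '18'."""
    whole, partial = divmod(bp, repeat_len)
    return f"{whole}.{partial}" if partial else str(whole)


def expected_ce_allele(sequence_allele: str, flank_indel_bp: int) -> str:
    """Size-based call expected when the flank carries an indel (negative = deletion)."""
    return bp_to_allele(allele_to_bp(sequence_allele) + flank_indel_bp)


# Sequence-based allele 19 with the 4 bp TTTT flanking deletion described above.
print(expected_ce_allele("19", -4))    # 18
print(expected_ce_allele("31.2", 0))   # 31.2 (second allele, no flanking indel)
```

      Under these assumptions, the 4 bp deletion makes the fragment one repeat unit shorter, which accounts for the CE call of 18 for a sequence-based allele 19.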
      The observed heterozygosity increased from 90.8% (79 heterozygous samples) for the size-based data to 91.9% (80 heterozygous samples) for the sequence-based data, and both data sets were within the expectations of HWE (P value > 0.05).
      For the population of Saudi Arabia, the sequence data of the SE33 locus increased the power of discrimination from 99.3% to 99.7%, the power of exclusion from 89.4% to 93%, and the typical paternity index from 5.44 to 6.21. These figures emphasize the value of using SE33 in forensic applications, especially in mixture analysis and paternity testing.

      4. Conclusion

      This study provides sequence‐based Saudi population data for the SE33 locus for the first time. As expected, most sequences showed A0 and A1 motif patterns, while three sequence-based alleles were not within the classification. Two new motif patterns are reported, and the motif IDs D4 and D5 are suggested for them. In addition, seven sequences that fall within the classification but have not been reported before are reported. The discordance was resolved by Sanger sequencing, which showed the presence of the rs369314007 deletion.

      Declaration of Competing Interest

      None

      Acknowledgements

      We would like to thank Professor Mark A. Jobling and Dr. Jon Wetton (University of Leicester) for allowing us to carry out the lab work and analysis of the samples in the Alec Jeffreys Forensic Genomics Unit. The study was funded by the Royal Embassy of Saudi Arabia Cultural Bureau in London (UKSACB).

      References

        [1] Wiegand P., Budowle B., Rand S., Brinkmann B. Forensic validation of the STR systems SE 33 and TC 11. Int. J. Legal Med. 1993; 105: 315-320
        [2] Borsuk L.A., Gettings K.B., Steffen C.R., Kiesler K.M., Vallone P.M. Sequence-based US population data for the SE33 locus. Electrophoresis. 2018; 39: 2694-2701
        [3] Alsafiah H.M., Goodwin W.H., Hadi S., Alshaikhi M.A., Wepeba P. Population genetic data for 21 autosomal STR loci for the Saudi Arabian population using the GlobalFiler® PCR amplification kit. Forensic Sci. Int. Genet. 2017; 31: e59-e61
        [4] Woerner A.E., King J.L., Budowle B. Fast STR allele identification with STRait razor 3.0. Forensic Sci. Int. Genet. 2017; 30: 18-23
        [5] Alsafiah H.M., Iyengar A., Hadi S., Alshlash W.M., Goodwin W. Sequence data of six unusual alleles at SE33 and D1S1656 STR Loci. Electrophoresis. 2018; 39: 2471-2476
        [6] Ruitberg C., Reeder D., Butler J. STRBase: a short tandem repeat DNA database for the human identity testing community. Nucleic Acids Res. 2001; 29: 320-322
        [7] Peakall R., Smouse P.E. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research - an update. Bioinformatics. 2012; 28: 2537-2539
        [8] Brenner C., Morris J. Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification, Promega Corporation, Madison, WI, 1989
        [9] Excoffier L., Lischer H.E.L. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 2010; 10: 564-567