High throughput sequencing data analysis workflow: mtDNA variant detection and identification of STR/Y-STR alleles and iso-alleles

Published: October 18, 2019. DOI: https://doi.org/10.1016/j.fsigss.2019.10.121

      Abstract

      High-throughput sequencing (HTS) of mtDNA and STRs enables forensic laboratories to obtain the benefits of both analysis methods at the same time. HTS chemistries are more cost-effective than Sanger sequencing for the mitochondrial genome and produce data at a greater depth of coverage, allowing detection of low-level heteroplasmy [Holland et al., 2017; Holland et al., 2019; Riman et al., 2017]. Advantages of HTS STR chemistries over traditional capillary electrophoresis (CE) include smaller amplicons, more loci analyzed in each reaction, and the identification of sequence polymorphisms that could, once iso-allele frequencies are available, potentially be processed in mixture software such as MaSTR™ for deconvolution and likelihood ratio (LR) calculation. Rigorous, user-friendly software is needed to analyze the large data files, consisting of thousands to millions of reads, generated for each sample [Vohr et al.].
      GeneMarker®HTS is rapid, user-friendly software for the analysis of forensic mtDNA, autosomal STR, and Y-STR high-throughput sequencing data. The National Institute of Standards and Technology (NIST), in conjunction with Promega Corporation, generously supplied fastq sequence files and corresponding CE allele calls for 672 samples amplified with the PowerSeq® Auto/Y System and sequenced on an Illumina® MiSeq. The results obtained by analyzing these data in GeneMarkerHTS software were highly concordant with the CE allele calls. A summary of the STR allele call concordance and examples of alleles exhibiting sequence variation will be presented. Additionally, the mtDNA genome forensic alignment, heteroplasmy report, import of the major variant profile to EMPOP, sample comparison, and database options will be reviewed.

      1. HTS STR analysis

      High-throughput sequencing data for forensic applications can be analyzed by selecting a built-in panel or by loading a custom panel. Primer sequences are used to sort and trim the input reads. When paired-end data are used, the overlapping read pairs are then merged to help correct any alignment errors. The STR or other specific sequence (Amelogenin, SNPs) is identified using regular expressions (regex strings). This matching process is used to name the allele sequences that were found and to filter out low-frequency sequences caused by sequencing errors (Fig. 1).
      Fig. 1. An example of an iso-allele. High-throughput sequencing provides additional information that cannot be determined from the traditional CE data.
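The primer-trimming, regex-matching, and low-frequency filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the GeneMarkerHTS implementation: the primer, flank, and motif sequences, the `min_fraction` threshold, and the function names are all hypothetical, and real panels define marker-specific patterns that are considerably more complex.

```python
import re
from collections import Counter

# Hypothetical primer, flank, and repeat motif for one illustrative locus.
FORWARD_PRIMER = "ACGTAC"
REVERSE_FLANK = "GGATCC"
MOTIF = "TCTA"
REPEAT_REGION = re.compile(f"(?:{MOTIF})+")


def trim_read(read):
    """Return the sequence between the forward primer and the reverse flank, or None."""
    start = read.find(FORWARD_PRIMER)
    if start == -1:
        return None
    start += len(FORWARD_PRIMER)
    end = read.find(REVERSE_FLANK, start)
    if end == -1:
        return None
    return read[start:end]


def call_alleles(reads, min_fraction=0.1):
    """Name alleles by repeat count and drop sequences below min_fraction of matched reads."""
    counts = Counter()
    for read in reads:
        region = trim_read(read)
        # Only regions made up entirely of the repeat motif are counted.
        if region and REPEAT_REGION.fullmatch(region):
            counts[region] += 1
    total = sum(counts.values())
    return {
        len(seq) // len(MOTIF): n
        for seq, n in counts.items()
        if n / total >= min_fraction
    }


def make_read(repeats):
    """Build a toy read with the given number of repeats (illustration only)."""
    return FORWARD_PRIMER + MOTIF * repeats + REVERSE_FLANK


reads = [make_read(3)] * 10 + [make_read(4)] * 9 + [make_read(2)] + ["TTTTTT"]
print(call_alleles(reads))
```

In this toy input the single read with two repeats makes up 5% of the matched reads and is filtered out, and the read without a primer is never counted.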
      The National Institute of Standards and Technology (NIST), in conjunction with Promega Corporation, generously supplied the fastq sequence files and the corresponding CE allele calls for 672 samples amplified with the PowerSeq® Auto/Y System and sequenced on an Illumina® MiSeq. GeneMarkerHTS software results were 99.74% concordant with the CE allele calls across 20,000 sampled loci.
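A concordance figure of this kind is typically the fraction of compared loci at which the length-based HTS call agrees with the CE call. The sketch below shows one way to compute it; the data layout (dictionaries keyed by sample and locus) and the toy values are assumptions made for illustration and are not the study data.

```python
def concordance(ce_calls, hts_calls):
    """Return the fraction of shared (sample, locus) keys whose allele calls agree."""
    compared = [key for key in ce_calls if key in hts_calls]
    if not compared:
        raise ValueError("no loci in common to compare")
    matches = sum(
        1 for key in compared if sorted(ce_calls[key]) == sorted(hts_calls[key])
    )
    return matches / len(compared)


# Toy calls only; allele names are kept as strings so microvariants such as "9.3" fit.
ce = {
    ("S1", "D3S1358"): ("15", "16"),
    ("S1", "TH01"): ("6", "9.3"),
    ("S2", "D3S1358"): ("14", "14"),
    ("S2", "TH01"): ("7", "8"),
}
hts = {
    ("S1", "D3S1358"): ("16", "15"),
    ("S1", "TH01"): ("6", "9.3"),
    ("S2", "D3S1358"): ("14", "14"),
    ("S2", "TH01"): ("7", "9"),
}
print(f"{concordance(ce, hts):.2%}")
```

Sorting each genotype before comparison makes the check independent of the order in which the two alleles were reported.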
      High-throughput sequencing can reveal additional information that is not available from the traditional CE data [Butler and Reeder, STRBase]. Iso-alleles are alleles that are identical in length but differ in sequence, so a locus that appears homozygous by length can be heterozygous by sequence. High-throughput sequencing reports the percentage of sequences for a given allele and any sequence variants. This depth of information has applications in the identification of individuals in single-source samples and the potential for improved deconvolution of mixtures.
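To make the iso-allele idea concrete, the following sketch groups the observed sequences at one locus by length and reports any length at which more than one distinct sequence is supported, together with each sequence's percentage of reads. The function name, the `min_fraction` cutoff, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def find_iso_alleles(sequence_counts, min_fraction=0.1):
    """Return {length: {sequence: percent_of_reads}} for lengths with 2+ supported sequences."""
    total = sum(sequence_counts.values())
    if total == 0:
        return {}
    by_length = defaultdict(dict)
    for seq, n in sequence_counts.items():
        fraction = n / total
        if fraction >= min_fraction:
            by_length[len(seq)][seq] = round(100 * fraction, 1)
    # A length class with more than one distinct sequence indicates iso-alleles.
    return {length: seqs for length, seqs in by_length.items() if len(seqs) > 1}


# Toy read counts: two sequences of equal length that differ at one base, plus noise.
locus_counts = {
    "TCTATCTATCTATCTA": 48,
    "TCTATCTGTCTATCTA": 50,
    "TCTATCTATCTA": 2,
}
print(find_iso_alleles(locus_counts))
```

By length alone, this locus would be typed as a homozygote with four repeats; the sequence data show two distinct alleles at roughly equal proportions.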

      2. HTS mtDNA analysis

      Alignment traditionally focuses on minimizing the number of differences between the read and the reference. This is not optimal in forensic analysis because there is an established convention for naming common variants. The unique motif alignment recognizes and properly assigns SNPs and indels in a manner consistent with forensic considerations, and it automates the motif recommendations of the DNA Commission of the International Society for Forensic Genetics [Parson et al., 2014]. GeneMarker®HTS software is pre-loaded with the rCRS and will align across the origin or in the HV1/HV2 regions.
      GeneMarkerHTS outputs a variety of reports to a user-specified location. Two of these are the major and minor variant reports. The major variant report contains the alleles with the highest frequency at each position, which can be directly copied and pasted into the EMPOP database. The minor variant, or heteroplasmy, report contains any alleles with a frequency lower than that of the highest-frequency allele at a given position.
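The split between major and minor variants can be illustrated with per-position base counts. The sketch below is not the software's reporting logic: the input layout, the `min_minor_fraction` noise threshold, and the choice to list a major allele only when it differs from the reference base are assumptions made for the example.

```python
def variant_reports(base_counts, reference, min_minor_fraction=0.02):
    """Split per-position base counts into major and minor (heteroplasmy) variant lists.

    base_counts: {position: {base: read_count}}
    reference:   {position: reference_base}
    Each returned entry is a (position, base, fraction) tuple.
    """
    major, minor = [], []
    for pos in sorted(base_counts):
        counts = base_counts[pos]
        depth = sum(counts.values())
        if depth == 0:
            continue
        ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
        top_base, top_n = ranked[0]
        # Assumption: the major report lists only differences from the reference.
        if top_base != reference[pos]:
            major.append((pos, top_base, top_n / depth))
        # Assumption: minor alleles below the noise threshold are not reported.
        for base, n in ranked[1:]:
            if n / depth >= min_minor_fraction:
                minor.append((pos, base, n / depth))
    return major, minor


# Toy pileup at three positions with assumed reference bases.
reference = {73: "A", 263: "A", 16093: "T"}
pileup = {
    73: {"G": 1000},
    263: {"A": 990, "G": 10},
    16093: {"T": 900, "C": 100},
}
major, minor = variant_reports(pileup, reference)
print(major)
print(minor)
```

Here position 73 yields a major variant, position 16093 yields a 10% minor variant, and the 1% signal at position 263 falls below the assumed threshold. Ties between equally frequent bases are not handled specially in this sketch.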
      After the individual samples have been reviewed, the Comparison Viewer displays a direct comparison of all the samples in the project. The window is divided into two tables. The sample-to-sample comparison table shows all samples in the project, comparing major alleles, minor alleles, or both. The variant comparison table shows the allele frequency of every variant called in at least one sample. Each entry is colored according to whether the variant was a major allele or a minor allele, or whether the total coverage was below the set threshold.
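The three-way classification behind the variant comparison table can be sketched as below. The status labels, the `min_coverage` default, and the data layout are hypothetical, and an extra "absent" status is added for samples that do not carry the variant.

```python
def classify_variant(counts, base, min_coverage=100):
    """Classify one variant base in one sample from its per-base read counts."""
    depth = sum(counts.values())
    if depth < min_coverage:
        return "low_coverage"
    n = counts.get(base, 0)
    if n == 0:
        return "absent"
    return "major" if n == max(counts.values()) else "minor"


def comparison_table(samples, variants, min_coverage=100):
    """Return {(position, base): {sample_name: status}} for every listed variant."""
    return {
        (pos, base): {
            name: classify_variant(positions.get(pos, {}), base, min_coverage)
            for name, positions in samples.items()
        }
        for pos, base in variants
    }


# Toy per-sample counts at a single position.
samples = {
    "S1": {16093: {"T": 900, "C": 100}},
    "S2": {16093: {"C": 500}},
    "S3": {16093: {"C": 20}},
}
table = comparison_table(samples, [(16093, "C")])
print(table[(16093, "C")])
```

A display layer could then map each status to a color, which keeps the classification rule separate from its presentation.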
      GeneMarkerHTS has a database component that houses the User Management function and stores samples, projects, and reports. User Management allows a lab administrator to assign access rights to users. These profiles control access to settings and features in the software, as well as which projects can be opened from the database.

      3. Conclusion

      GeneMarkerHTS software results proved to be 99.74% concordant with the CE allele calls across 20,000 sampled loci. High-throughput sequencing can reveal additional sequence information that is not available from traditional CE data, which can be extremely beneficial in forensic applications such as mixture deconvolution and the identification of individuals.
      Chemistries for mtDNA and STR amplification for HTS platforms enable the laboratory to have the benefits of both mtDNA and STR analysis at the same time. GeneMarker®HTS software provides a streamlined workflow for forensic mitochondrial and STR DNA data analysis from all major High Throughput Sequencing (HTS) systems and chemistries.

      Acknowledgements

      We would like to sincerely thank Dr. Peter Vallone at the National Institute of Standards and Technology (NIST) for generously supplying data to complete the CE and HTS STR/Y-STR results concordance study, Promega Corporation (Madison, WI, USA) for providing autosomal and Y-STR data, and Drs. Mitchell Holland and Jennifer McElhoe at Penn State University for their comments and suggestions during the development of the mitochondrial analysis.

      References

        • Holland M.M., Pack E.D., McElhoe J.A. Evaluation of GeneMarker HTS for improved alignment of mtDNA MPS data, haplotype determination, and heteroplasmy assessment. Forensic Sci. Int. Genet. 2017; 28: 90-98
        • Holland M.M., Bonds R.M., Holland C.A., McElhoe J.A. Recovery of mtDNA from unfired metallic ammunition components with an assessment of sequence profile quality and DNA damage through MPS analysis. Forensic Sci. Int. Genet. 2019; 39: 86-96
        • Parson W., Gusmão L., Hares D.R., et al. DNA commission of the international society for forensic genetics: revised and extended guidelines for mitochondrial DNA typing. Forensic Sci. Int. Genet. 2014; 13: 134-142
        • Riman S., Kiesler K.M., Borsuk L.A., Vallone P.M. Characterization of NIST human mitochondrial DNA SRM-2392 and SRM-2392-I standard reference materials by next generation sequencing. Forensic Sci. Int. Genet. 2017; 29: 181-192
        • Vohr S.H., Gordon R., Eizenga J.M., Erlich H.A., Calloway C.D., Green R.E. A phylogenetic approach for haplotype analysis of sequence data from complex mitochondrial mixtures. Forensic Sci. Int. Genet. 30: 93–105
        • Butler J., Reeder D. Short Tandem Repeat DNA Internet DataBase (STRBase). http://strbase.nist.gov/