
The tao of MPS: Common novel variants

Published: October 13, 2017. DOI: https://doi.org/10.1016/j.fsigss.2017.09.222

      Abstract

      The introduction of massively parallel sequencing (MPS) to forensic genetics has improved multiple aspects of DNA analysis, but these advances also bring additional complexities. For STR analysis, the move from length-based to sequence-based allele assignment has highlighted the extent to which allelic variation was previously masked. In this work, 1000 samples from five population groups (Caucasian, West African, North East African, East Asian and South Asian) were genotyped for 27 forensically validated autosomal STRs. Results were compared with data from the National Institute of Standards and Technology (NIST); this collaborative project provides one of the most expansive data sets generated using MPS technology to date. The large number of sequence variants characterised at select markers calls into question current strategies for producing representative population data, yet also offers an opportunity to use this diversity in new ways. The results of this collaborative study demonstrate that the number of samples needed to capture the breadth of allelic variation depends strongly on the individual marker and the extent of its sequence variability.


      1. Introduction

      The introduction of massively parallel sequencing (MPS) has increased the power of discrimination compared with traditional CE-based techniques. This is largely due to the larger number of markers that can be multiplexed and the ability to use sequence variation to differentiate alleles that are the same size but differ in sequence [Borsting and Morling, 2015; Gettings et al., 2016; Churchill et al., 2016]. For these “novel” sequence variants to be of use in forensic casework, new databases must be generated. Historically, 200 samples from each population group were used to capture the breadth of variation at any given autosomal STR locus, but the increased number of alleles observed using MPS calls into question whether this number is still adequate.
      In this work, sequence-specific population databases were created for five UK population groups. Genotypes were obtained using the Illumina ForenSeq™ DNA Signature Prep Kit (Illumina, San Diego, CA). This report compares results at two contrasting loci, CSF1PO and D12S391, with data provided by the National Institute of Standards and Technology (NIST).

      2. Materials and methods

      2.1 Library preparation and sequencing

      The Illumina ForenSeq™ DNA Signature Prep Kit [Illumina, 2015] was used to prepare samples (buccal swab extracts from 1000 unrelated individuals) for sequencing on the MiSeq® FGx instrument. Primer Mix A, which contains primers for identity markers including the 27 autosomal STRs, was used for the first PCR. The only protocol modification was to increase the volume of pooled libraries used for sequencing from 7 μl to 12 μl, as this has been shown to yield better results.

      2.2 Data analysis and comparison

      Individual STR allelic sequence variants were characterised using a modified version of STRait Razor 2.0 and in-house Excel-based workbooks [Warshauer et al., 2015]. For concordance purposes, results were compared with genotypes previously obtained by CE and with those generated by the ForenSeq™ Universal Analysis Software (UAS).
      Sequence variants observed at markers CSF1PO and D12S391 were compared with those described by NIST, and graphs were generated plotting allelic diversity against the number of alleles sequenced. Caucasian data from NIST were merged with White British data from King’s College to produce the White European graph.

      3. Results and discussion

      The number of alleles observed at certain markers increased substantially when sequencing was used. The highest level of sequence variation was observed at D12S391: in the White British population group, for example, the number of alleles increases from 18 (length-based) to 53 (sequence-based). As shown in Fig. 1, some of the sequence variation observed was population-specific, with 10 alleles at D12S391 seen only in the White British population group.
      Fig. 1. Increase in the number of alleles seen using sequencing at D12S391 and CSF1PO. Length-based (LB) alleles are shown in blue, whilst additional sequence-based (SB) alleles are shown in orange. Variants seen in only one population group are further highlighted in green (Unique Variants). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
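      The distinction between length-based and sequence-based allele counts, and the idea of a variant unique to one population group, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (which used a modified STRait Razor 2.0 and Excel workbooks): the `(length designation, repeat sequence)` representation, the function names `count_alleles` and `unique_variants`, the population labels and the toy repeat sequences are all hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: one observed allele = (length-based designation,
# repeat-region sequence). Toy values only; not data from this study.
Allele = tuple[str, str]


def count_alleles(observed: list[Allele]) -> tuple[int, int]:
    """Return (distinct length-based alleles, distinct sequence-based alleles)."""
    length_based = {length for length, _ in observed}
    # Alleles of the same length but different sequence are counted separately here.
    sequence_based = set(observed)
    return len(length_based), len(sequence_based)


def unique_variants(by_population: dict[str, list[Allele]]) -> dict[str, set[Allele]]:
    """Map each population to the sequence-based alleles seen in no other population."""
    populations_with = defaultdict(set)
    for population, observed in by_population.items():
        for allele in observed:
            populations_with[allele].add(population)
    unique = {population: set() for population in by_population}
    for allele, populations in populations_with.items():
        if len(populations) == 1:
            (only,) = populations  # the single population carrying this allele
            unique[only].add(allele)
    return unique


# Toy example: two hypothetical populations typed at one tetranucleotide STR.
toy_data = {
    "PopA": [("11", "AGAT" * 11), ("11", "AGAT" * 10 + "AGAC"), ("12", "AGAT" * 12)],
    "PopB": [("11", "AGAT" * 11), ("12", "AGAT" * 12), ("12", "AGAT" * 11 + "AGAC")],
}

print(count_alleles(toy_data["PopA"]))  # (2, 3): length-based allele 11 splits into two sequences
print(unique_variants(toy_data))
```

      In this toy data, each population shows two length-based but three sequence-based alleles, and each carries one sequence variant absent from the other; the same bookkeeping, applied per locus and per population group, underlies counts such as those summarised in Fig. 1.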
      At CSF1PO, all of the sequence variation observed was population-specific within this dataset. To investigate whether these alleles are genuinely population-specific, they were compared with alleles genotyped in the Caucasian, African American and East Asian population groups by NIST and the University of North Texas (UNT) [Novroski et al., 2016]. Table 1 shows that two sequence-based alleles seen in only one of our population groups were also observed in the corresponding population groups typed by NIST or UNT. This suggests that these variants may be specific to these populations, and illustrates the value of larger-scale databases for capturing variation.
      Table 1. Alleles observed at CSF1PO. Boxes coloured blue show the population groups in which each allele was observed. For sequence variants unique to one population group, the box is coloured orange and the number of times the allele was observed in that population within our data is given. One allele was also seen in data generated for the African American population at NIST (*), and another in data published by the University of North Texas (UNT) for the East Asian population (**) [Novroski et al., 2016].
      To estimate how many alleles must be typed to identify most sequence variants at a given STR locus, the order of samples was randomised and the cumulative number of distinct sequence variants observed was plotted against the number of alleles typed. Fig. 2 shows the resulting graph for D12S391: new variants are still being observed after 200 samples (400 alleles) have been typed. Adding the alleles observed in the comparable NIST population groups shows that novel alleles are still being discovered even beyond 1000 alleles (500 samples).
      Fig. 2. Number of distinct sequence-based alleles observed at D12S391 plotted against the number of alleles sequenced. The yellow box on the White European, West African and East Asian graphs highlights allele 400, after which all additional alleles sequenced are from the NIST data set. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
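      The randomised allele-discovery curve described for Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on assumptions: each individual is represented as a pair of sequence-based allele strings, in-house samples are shuffled with a fixed seed, and any external genotypes (standing in for the NIST data) are appended afterwards, as the Fig. 2 caption describes. The function names `discovery_curve` and `new_alleles_after`, and the toy allele labels, are hypothetical.

```python
import random
from collections.abc import Sequence

# Hypothetical representation: one individual = a pair of sequence-based allele strings.
Genotype = tuple[str, str]


def discovery_curve(in_house: Sequence[Genotype],
                    external: Sequence[Genotype] = (),
                    seed: int = 0) -> list[int]:
    """Cumulative count of distinct sequence-based alleles after each allele typed.

    In-house samples are shuffled (randomised sample order); external genotypes,
    if any, are appended afterwards in their given order.
    """
    rng = random.Random(seed)
    samples = list(in_house)
    rng.shuffle(samples)
    samples.extend(external)

    seen: set[str] = set()
    curve: list[int] = []
    for genotype in samples:
        for allele in genotype:  # two alleles per individual
            seen.add(allele)
            curve.append(len(seen))
    return curve


def new_alleles_after(curve: list[int], n_alleles: int) -> int:
    """Number of distinct alleles first observed after the first n_alleles typed."""
    if not curve:
        return 0
    if n_alleles <= 0:
        return curve[-1]
    if n_alleles >= len(curve):
        return 0
    return curve[-1] - curve[n_alleles - 1]


# Toy example with hypothetical allele labels.
in_house = [("a", "b"), ("a", "c"), ("b", "b")]
external = [("d", "a")]
curve = discovery_curve(in_house, external, seed=1)
print(curve)
print(new_alleles_after(curve, 2 * len(in_house)))  # 1: allele "d" appears only in the external data
```

      A curve that is still rising at a given point, as `new_alleles_after` quantifies, corresponds to the observation that novel variants continue to appear after 400 alleles. Repeating the shuffle with several seeds and averaging would reduce dependence on a single random order; whether the original analysis did this is not stated.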

      4. Conclusion

      A sample size of 200 per population group is insufficient to capture the breadth of variation at all autosomal STR markers typed by massively parallel sequencing. Larger-scale studies are needed to identify all “common sequence variants”, especially at markers such as D12S391 that show a high level of variation.

      References

        • Borsting C., Morling N. Next generation sequencing and its applications in forensic genetics. Forensic Sci. Int. Genet. 2015; 18: 78-89.
        • Gettings K.B., Kiesler K.M., Faith S.A., Montano E., Baker C.H., Young B.A., Guerrieri R.A., Vallone P.M. Sequence variation of 22 autosomal STR loci detected by next generation sequencing. Forensic Sci. Int. Genet. 2016; 21: 15-21.
        • Churchill J.D., Schmedes S.E., King J.L., Budowle B. Evaluation of the Illumina® beta version ForenSeq DNA signature prep kit for use in genetic profiling. Forensic Sci. Int. Genet. 2016; 20: 20-29.
        • Illumina. ForenSeq™ DNA Signature Prep Reference Guide. Document #15049528 v01. 2015.
        • Warshauer D.H., King J.L., Budowle B. STRait Razor v2.0: the improved STR allele identification tool – razor. Forensic Sci. Int. Genet. 2015; 14: 182-186.
        • Novroski N.M., King J.L., Churchill J.D., Seah L.H., Budowle B. Characterization of genetic sequence variation of 58 STR loci in four major population groups. Forensic Sci. Int. Genet. 2016; 25: 214-226.