Introducing eNoC – A simple, excel-based tool for improved assignment of the number of contributors (NoC) to a mixture

Published: September 27, 2022. DOI: https://doi.org/10.1016/j.fsigss.2022.09.016

      Abstract

      Assigning NoC in a mixed STR profile is an important preliminary step in computing a likelihood ratio (LR). A common metric is maximum allele count (MAC), whereby the locus exhibiting the largest number of alleles is used to set the NoC. This metric can be supplemented by considering total allele count (TAC) and locus allele count (LAC). TAC is the total number of alleles across all loci and is compared with probability distributions generated in silico. LAC works similarly, save that the probability distributions are generated at the locus level. Herein, we present a comparative analysis of these three metrics using a dataset of 10,000 simulated ground truth mixtures for each of 2–7 contributors. These datasets were used to generate parameter distributions for each NoC. This analysis showed LAC to be the most accurate single metric in all circumstances tested. We have developmentally validated an Excel-based tool to automate calculations for use by operational caseworkers.

      1. Materials and method

      1.1 Definitions

      The effectiveness of LAC, MAC and TAC in assigning NoC was examined. LAC is the number of practitioner-designated alleles at an autosomal locus in the mixture. MAC is the highest LAC across loci. TAC is the sum of all available LACs. The minimum TAC is n (where n is the number of autosomal loci in the kit of interest; 16 in this study) and the maximum TAC is 2nm (where m is the NoC). Probability distributions of TAC were generated by simulation (Fig. 1). For any unknown mixture, the observed TAC (based on the alleles designated by the practitioner) is compared with the simulated probability distributions to estimate NoC. Simulated probability distributions of LAC can be produced in the same way. These distributions differ between autosomal loci according to the number of alleles at the locus and their corresponding allele frequencies. For each candidate m, the LAC probabilities at each locus are multiplied across loci to give an overall likelihood, which can then be compared between m values. In the ‘combined approach’, MAC is used to set a lower bound for the NoC estimated using TAC.
      Fig. 1. Probability distributions for different numbers of contributor scenarios for the total number of alleles (TAC) observed within a profile.
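The three counts defined in this section can be sketched in a few lines of Python. This is a minimal illustration and not the eNoC implementation: the alleles in the example mixture are made up, the helper names (`allele_counts`, `min_noc_from_mac`, `tac_bounds`) are hypothetical, and the rule that MAC implies at least ceil(MAC/2) contributors is the usual convention (each contributor carries at most two alleles per locus) rather than something stated in the text.

```python
from math import ceil


def allele_counts(mixture):
    """Return (LAC per locus, MAC, TAC) for a mixture.

    `mixture` maps a locus name to the set of designated alleles at that locus.
    """
    lac = {locus: len(alleles) for locus, alleles in mixture.items()}
    mac = max(lac.values())  # MAC is the highest LAC
    tac = sum(lac.values())  # TAC is the sum of all LACs
    return lac, mac, tac


def min_noc_from_mac(mac):
    # Assumption: each contributor shows at most two alleles per locus,
    # so MAC alleles need at least ceil(MAC / 2) contributors.
    return ceil(mac / 2)


def tac_bounds(n_loci, noc):
    # Bounds quoted in the text: minimum n, maximum 2nm.
    return n_loci, 2 * n_loci * noc


# Hypothetical three-locus mixture (illustrative alleles only).
mixture = {
    "D3S1358": {"14", "15", "16", "18"},
    "vWA": {"16", "17", "19"},
    "FGA": {"20", "21", "22", "24", "25"},
}
lac, mac, tac = allele_counts(mixture)
print(lac, mac, tac, min_noc_from_mac(mac), tac_bounds(16, 3))
```

In the combined approach described in this section, a value such as `min_noc_from_mac(mac)` would act as the floor below which a TAC-based estimate is not allowed to fall.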

      1.2 Simulation experiments to determine probability distributions

      Using published Caucasian allele frequencies [1,2], 15,000 random full ESI-17 profiles were generated in silico. From this pool, individual profiles were selected randomly and combined to create simulated mixed profiles. A total of 10,000 ground truth mixtures was generated in silico for each of 2–7 contributors (2–7p). Any mixed profile in which the same individual profile had been included more than once was replaced. Each mixture was rationalised by recording each observed allele only once. For each simulated mixture, TAC/LAC values were calculated and the distributions displayed as probability mass functions for each NoC (m = 1–7).
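The simulation procedure can be sketched as follows. This is a toy version under stated assumptions: the two loci and their allele frequencies in `FREQS` are invented for illustration (they are not the published frequencies), the pool and mixture counts are far smaller than the 15,000 profiles and 10,000 mixtures used in the study, and the function names are hypothetical. Sampling contributors without replacement stands in for the step of replacing mixtures that contain the same individual profile twice.

```python
import random
from collections import Counter

# Hypothetical allele frequencies for two toy loci (not the published data).
FREQS = {
    "LocusA": {"10": 0.2, "11": 0.3, "12": 0.5},
    "LocusB": {"7": 0.1, "8": 0.4, "9": 0.25, "10": 0.25},
}


def random_profile(rng):
    """Draw two alleles per locus, independently, according to FREQS."""
    return {
        locus: tuple(rng.choices(list(f), weights=list(f.values()), k=2))
        for locus, f in FREQS.items()
    }


def simulate_tac_pmf(noc, pool, n_mixtures, rng):
    """Estimate the probability mass function of TAC for `noc` contributors."""
    counts = Counter()
    for _ in range(n_mixtures):
        # Sampling without replacement: no pool profile appears twice.
        contributors = rng.sample(pool, noc)
        # Rationalise the mixture: each observed allele is counted once per locus.
        tac = sum(
            len({allele for p in contributors for allele in p[locus]})
            for locus in FREQS
        )
        counts[tac] += 1
    return {tac: c / n_mixtures for tac, c in sorted(counts.items())}


rng = random.Random(1)  # fixed seed so the sketch is repeatable
pool = [random_profile(rng) for _ in range(500)]
pmfs = {m: simulate_tac_pmf(m, pool, 2000, rng) for m in range(1, 4)}
for m, pmf in pmfs.items():
    print(m, pmf)
```

Per-locus LAC distributions could be tallied in the same loop by counting the distinct alleles at each locus separately instead of summing them.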

      2. Results

      2.1 Comparison of alternative metrics/combinations of metrics to estimate NoC

      The performance of four different configurations of the MAC, LAC and TAC metrics was assessed against ground truth (Fig. 2). With increasing NoC, the data reveal deterioration of the MAC metric used alone and superior performance of LAC. Best performance was obtained when LAC was used (yellow bars).
      Fig. 2. Sensitivity, specificity, accuracy and precision analysis of four NoC methods analysed as binary data sets (i.e. outcomes are True [correct NoC assigned] or False [incorrect NoC assigned]). Red: MAC; Orange: TAC; Yellow: LAC; Green: Rounded Average (mean value of MAC/TAC2/LAC outcomes rounded to nearest integer).
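The text does not spell out how the four statistics in Fig. 2 were derived from the True/False outcomes. One plausible reading, sketched below purely as an assumption, is a one-vs-rest comparison for each NoC: a mixture counts as positive when its ground truth equals the NoC of interest, and as predicted positive when the method assigns that NoC. The function name `binary_metrics` and the example data are hypothetical.

```python
def binary_metrics(truth, assigned, target):
    """One-vs-rest sensitivity, specificity, accuracy and precision for `target`."""
    tp = fp = fn = tn = 0
    for t, a in zip(truth, assigned):
        if a == target and t == target:
            tp += 1  # correctly assigned the target NoC
        elif a == target:
            fp += 1  # assigned the target NoC to a different mixture type
        elif t == target:
            fn += 1  # missed a true target-NoC mixture
        else:
            tn += 1  # neither truth nor assignment is the target

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
    }


# Made-up ground truth and assignments for six mixtures.
truth = [2, 2, 3, 3, 3, 4]
assigned = [2, 3, 3, 3, 2, 4]
metrics = binary_metrics(truth, assigned, target=3)
print(metrics)
```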

      2.2 Limitations

      With ‘real’ mixtures, allele counts at a locus may be artificially increased if stutters (or other artefacts) are misinterpreted as alleles and/or when drop-in events occur. Similarly, allele counts at a locus can be artificially decreased by stochastic variation (resulting in allelic drop-out) and/or misinterpretation of minor alleles as stutter. None of these effects are captured by the tool. However, it is possible to take some account of complete locus drop-out and to assess the impact of this on the accuracy of NoC assignment. In our tool, “0” can be entered for any locus without peaks (locus drop-out) or where allelic drop-out is considered likely. Loci with allele counts of 0 are assigned probability 1 and become neutral in calculations. The impact on assigning NoC in simulated mixtures with varying degrees of locus drop-out was tested. This revealed that LAC was the most robust metric when locus drop-out was present (data not shown).
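The neutral treatment of zero-count loci can be illustrated with a short sketch of the LAC likelihood. Everything here is an assumption made for illustration: the probability table `LAC_PMF` is invented (two toy loci, NoC of 1 or 2), the function names are hypothetical, and the small `floor` used to avoid taking the logarithm of zero is a choice of this sketch, not a documented feature of the tool. Working in logarithms turns the product of per-locus probabilities into a sum, so a locus assigned probability 1 contributes exactly 0.

```python
from math import log

# Hypothetical table: LAC_PMF[locus][m][k] = P(LAC = k | NoC = m).
LAC_PMF = {
    "LocusA": {
        1: {1: 0.30, 2: 0.70},
        2: {1: 0.05, 2: 0.35, 3: 0.45, 4: 0.15},
    },
    "LocusB": {
        1: {1: 0.20, 2: 0.80},
        2: {1: 0.02, 2: 0.28, 3: 0.50, 4: 0.20},
    },
}


def lac_log_likelihood(observed, m, pmf, floor=1e-12):
    """Sum of log P(LAC | m) over loci; loci entered as 0 are skipped."""
    total = 0.0
    for locus, k in observed.items():
        if k == 0:
            continue  # probability 1, i.e. log(1) = 0: the locus is neutral
        # Assumption: counts never seen for this m get a tiny floor probability.
        total += log(max(pmf[locus][m].get(k, 0.0), floor))
    return total


def estimate_noc(observed, pmf, candidates):
    """Return the candidate NoC with the highest overall LAC likelihood."""
    return max(candidates, key=lambda m: lac_log_likelihood(observed, m, pmf))


with_dropout = {"LocusA": 3, "LocusB": 0}  # LocusB entered as 0 (locus drop-out)
no_dropout = {"LocusA": 2, "LocusB": 2}
print(estimate_noc(with_dropout, LAC_PMF, [1, 2]))
print(estimate_noc(no_dropout, LAC_PMF, [1, 2]))
```

A consequence of this design is that every zero-count locus removes information: the more loci entered as 0, the fewer terms separate the candidate NoC values.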

      2.3 Introducing eNoC

      The printable front GUI of the eNoC tool is shown (Fig. 3). The tool is Excel-based and features a field for manual entry of LAC values, a TAC plot and a dynamic, graphical display of NoC estimates. The tool was developmentally validated and accredited to ISO/IEC 17025.

      References

      1. NDNAD, Data to support the implementation of National DNA Database – GOV.UK, 2019. [Online] 03 29, 2019. 〈https://www.gov.uk/government/statistics/dna-population-data-to-support-the647〉.

      2. Steele D., Syndercombe-Court D., Balding D.J. Worldwide FST estimates relative to five continental-scale populations. Ann. Hum. Genet. 2014; 78: 468-477.