Research articles | Volume 1, Issue 1, P640-642, August 2008

Probabilistic modelling for DNA mixture analysis

      Abstract

      Taking peak area information into account when analysing STR DNA mixtures is acknowledged to be a difficult task. There have been a number of non-probabilistic approaches proposed in the literature, and some have been incorporated into computer systems, but comparatively little has been published from a probabilistic perspective. Here we briefly review our previous work on using Bayesian networks to analyse two-person mixtures within a probabilistic framework, and present preliminary results obtained for analysing two-person and three-person mixtures that combine peak area information from multiple independent samples.

      1. Introduction

      In a recent series of papers [Cowell et al., UAI 2006; Forensic Sci. Int. 2007; Bayesian Anal. 2007] we have presented a probabilistic methodology for analysing peak area information from DNA mixtures based on Bayesian networks. A representative fragment of these networks is shown in Fig. 1 for a two-person mixture. This represents peak area information on three alleles, denoted by a, b and c, of some marker system. At the top we have two nodes representing the genotypes of the contributors $p_1$ and $p_2$. On the next layer we have nodes such as $n_{1a}$ that count the number of alleles of type a that person $p_1$ has. These nodes take values in the set {0, 1, 2}. They depend on the genotypes of the persons; this dependence is represented by the directed arrows from the genotype nodes to the $n_{ia}$ nodes. The θ node to the left represents the relative proportions of DNA in the mixture from each contributor prior to PCR amplification, so that the proportion from person $p_i$ is $\theta_i$, with $\theta_1 + \theta_2 = 1$. From the θ proportions and the allele count nodes we calculate the mean $\mu_a = (\theta_1 n_{1a} + \theta_2 n_{2a})/2$, with similar formulae for the mean nodes $\mu_b$ and $\mu_c$. These are the fractions of alleles of type a, b and c for the marker in the mixture prior to PCR amplification.
      The bottom layer of nodes represents the peak areas of the individual alleles as measured by the PCR apparatus after amplification of the mixture sample. We model the stochastic variations in these areas by Gamma distributions, where the Gamma distribution of the area for allele a depends on the mean $\mu_a$ and has expectation proportional to $\mu_a$; similarly for alleles b and c. For further details of the Gamma model and Bayesian networks, and how the probability calculations are performed, see [Cowell et al., UAI 2006; Forensic Sci. Int. 2007; Bayesian Anal. 2007].
      Fig. 1. Bayesian network fragment for modelling peak areas in a mixture.
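      The generative reading of this network fragment can be illustrated with a minimal sketch. It computes the pre-amplification allele fractions from the mixture proportions and allele counts, then draws peak areas from Gamma distributions whose expectations are proportional to those fractions. The parameterisation (shape rho * mu, scale eta), the values of rho and eta, and the example genotypes are assumptions chosen for illustration only; they are not the fitted model of the cited papers.

```python
import random

def allele_means(theta, counts):
    """Return mu_a = (theta_1 * n_1a + theta_2 * n_2a) / 2 for each allele a.

    theta  : mixture proportions (theta_1, theta_2), summing to 1
    counts : one dict per contributor mapping allele -> count in {0, 1, 2}
    """
    alleles = counts[0].keys()
    return {a: sum(t * c[a] for t, c in zip(theta, counts)) / 2 for a in alleles}

def simulate_peak_areas(mu, rho=100.0, eta=10.0, rng=None):
    """Draw one peak area per allele from a Gamma distribution.

    Assumed parameterisation (hypothetical): shape = rho * mu_a, scale = eta,
    so the expected area is rho * eta * mu_a, i.e. proportional to mu_a.
    An allele that is absent from the mixture (mu_a = 0) gets area 0.
    """
    rng = rng or random.Random(0)
    return {a: (rng.gammavariate(rho * m, eta) if m > 0 else 0.0)
            for a, m in mu.items()}

# Hypothetical two-person mixture: p1 has genotype (a, b), p2 has (b, c).
theta = (0.7, 0.3)
counts = (
    {"a": 1, "b": 1, "c": 0},  # allele counts n_1a, n_1b, n_1c for p1
    {"a": 0, "b": 1, "c": 1},  # allele counts n_2a, n_2b, n_2c for p2
)

mu = allele_means(theta, counts)
areas = simulate_peak_areas(mu)
print(mu)
print(areas)
```

      With these inputs the fractions are 0.35, 0.5 and 0.15 for alleles a, b and c, which sum to one as expected for fractions of the alleles present at the marker.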

      2. Results

      In our previous papers [Cowell et al., UAI 2006; Forensic Sci. Int. 2007; Bayesian Anal. 2007] we have analysed peak-area data on two-person mixtures taken from a variety of publications. Here we illustrate the power of our methodology for combining peak area information from two independent samples that each have the same contributors.
      In our first example there are two individuals. Two mixtures were prepared in a laboratory, with each mixture having approximately the same amount of DNA from each person. We separated each mixture individually, and also separated the pair of mixtures together.
      With these proportions it should not be possible to separate the mixtures. That we are able to do so indicates that the effective fraction from each contributor was not exactly one half.
      Our results are shown in Table 1. Using only the first mixture, the genotypes of both contributors are correctly identified on all markers. Using only the second mixture, the profiles on two markers were not identified correctly (as indicated by italics). When the two traces were combined, both profiles were correctly identified on all markers, with probabilities increased for all but one marker profile. Note especially the increase in probabilities in the profiles for markers D3 and D19, which were incorrectly identified when analysing the second mixture by itself.
      Table 1. Profile separation of a pair of two-person mixtures

      Marker       First trace only        Second trace only               Both traces combined
                   (correct all markers)   (correct 9 out of 11 markers)   (correct all markers)
      Amelogenin   0.6668                  0.6392                          0.7772
      D2           0.4582                  0.3838                          0.6956
      D3           0.8152                  0.4854                          0.8531
      D8           0.6471                  0.4831                          0.7357
      D16          0.6078                  0.7534                          0.7877
      D18          0.4095                  0.3574                          0.6872
      D19          0.4994                  0.2928                          0.6605
      D21          0.7480                  0.7485                          0.8592
      FGA          0.6727                  0.6058                          0.7701
      TH01         1                       1                               1
      VWA          0.3529                  0.7656                          0.7457

      Each mixture was prepared in a 1:1 ratio. They were analysed both individually, and also together assuming common contributors. Posterior probabilities shown are for the correct profile, with incorrect identifications italicized.
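      Why pooling independent samples with common contributors can sharpen the separation may be seen in a toy sketch. For a single marker with three observed alleles, it scores every unordered pair of contributor genotypes by a Gamma likelihood, averages over the unknown mixture proportion separately for each sample, and multiplies the per-sample likelihoods before normalising. Everything specific here is an assumption made for illustration: the uniform prior over genotype pairs (allele frequencies are ignored), the uniform grid over θ, the parameters rho and eta, and the peak areas themselves. It is not the Bayesian network computation of the cited papers.

```python
import math
from itertools import combinations_with_replacement

ALLELES = ("a", "b", "c")
GENOTYPES = list(combinations_with_replacement(ALLELES, 2))       # 6 genotypes
HYPOTHESES = list(combinations_with_replacement(GENOTYPES, 2))    # 21 unordered pairs

def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def sample_likelihood(areas, g1, g2, rho=100.0, eta=10.0, grid=19):
    """Likelihood of one sample's peak areas given both genotypes.

    The mixture proportion theta is unknown and may differ between samples,
    so it is averaged out over a uniform grid (an assumed, hypothetical prior).
    All three alleles are taken to have positive observed area, so any
    hypothesis that lacks an observed allele gets likelihood zero.
    """
    total = 0.0
    for k in range(1, grid + 1):
        theta = k / (grid + 1)
        loglik = 0.0
        for a in ALLELES:
            mu = (theta * g1.count(a) + (1 - theta) * g2.count(a)) / 2
            if mu == 0:
                loglik = -math.inf
                break
            loglik += gamma_logpdf(areas[a], rho * mu, eta)
        total += math.exp(loglik)
    return total / grid

def posterior(samples):
    """Posterior over genotype pairs under a uniform prior (assumption).

    Independent samples with common contributors contribute a product
    of per-sample likelihoods.
    """
    weights = {}
    for g1, g2 in HYPOTHESES:
        w = 1.0
        for areas in samples:
            w *= sample_likelihood(areas, g1, g2)
        weights[(g1, g2)] = w
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical peak areas for contributors (a, b) and (b, c),
# mixed at roughly 4:1 in sample 1 and 3:7 in sample 2.
sample_1 = {"a": 400.0, "b": 500.0, "c": 100.0}
sample_2 = {"a": 150.0, "b": 500.0, "c": 350.0}
truth = (("a", "b"), ("b", "c"))

post_1 = posterior([sample_1])
post_2 = posterior([sample_2])
post_both = posterior([sample_1, sample_2])
print(post_1[truth], post_2[truth], post_both[truth])
```

      In this toy setting each sample on its own leaves some weight on a competing genotype pair, but the competitor differs between the two samples, so the product of likelihoods concentrates on the pair that explains both.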
      In our second example, we consider three-person mixtures. We analyse two laboratory-prepared mixtures of differing proportions, using the known profile of one of the contributors. Our results are shown in Table 2. Incorrect classifications are shown in italics. Using only the first mixture, only 3 of the 14 markers were correctly identified, whilst using the second mixture by itself only 3 marker profiles were incorrectly identified, these having low probabilities. However, when using both mixtures together, all marker profiles are correctly identified. Note in particular the increase in probabilities for the profiles on markers D5, D16 and TH01, none of which were correctly identified with a single mixture analysis.
      Table 2. Profile separation of two three-person mixtures, each mixture taken separately and then together assuming common contributors, using the profile of one contributor in all three separations

      Marker   First trace only 1:1:1           Second trace only 1:5:2           Both traces combined
               (correct 3 out of 14 markers)    (correct 11 out of 14 markers)    (correct all markers)
      CSF      0.145                            1.000                             1.000
      D2       0.178                            1.000                             1.000
      D3       0.285                            0.768                             0.987
      D5       0.432                            0.190                             0.883
      D7       0.179                            0.930                             0.975
      D8       0.270                            0.739                             0.776
      D16      0.171                            0.299                             0.967
      D18      0.126                            0.999                             0.999
      D19      0.360                            0.927                             1.000
      D21      0.154                            0.997                             0.997
      FGA      0.400                            0.892                             1.000
      TH01     0.009                            0.212                             0.529
      TPOX     0.496                            0.525                             0.985
      VWA      0.179                            0.985                             0.982

      Posterior probabilities shown are for the correct profile, with incorrect identifications italicized.

      3. Summary

      We have presented preliminary results from applying a simple probabilistic model of mixture peak areas to what we believe is a novel problem: combining peak area information from independent mixture samples that contain DNA from the same set of contributors, in order to enhance the profile separation. Our results show the power and flexibility of the Bayesian network approach. We intend to expand on our findings elsewhere. In addition, the same approach can deal with stutter peaks, and also with possible kinship relationships between contributors to mixtures; again, we intend to publish more details on these additional possibilities elsewhere. In our previous publications we have also shown how the same methodology may be used to find likelihoods and likelihood ratios of hypotheses concerning the contributors to a mixture.
      In the future we intend to fine-tune the parameters in our model for better performance, and to analyse data with stutter peaks. We also intend to develop methods to take into account the possibility of dropout.

      Conflict of interest

      None.

      Acknowledgements

      We would like to thank the UK Forensic Science Service for providing the data on two-person mixtures analysed in Table 1. We would also like to thank G. Lago of the Raggruppamento Carabinieri Investigazioni Scientifiche, Rome, Italy, for providing the data on three-person mixtures analysed in Table 2.

      References

        • Cowell R.G., Lauritzen S.L., Mortera J. MAIES: a tool for DNA mixture analysis. In: Dechter R., Richardson T. (Eds.), Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI 2006). 2006: 90-97
        • Cowell R.G., Lauritzen S.L., Mortera J. Identification and separation of DNA mixtures using peak area information. Forensic Sci. Int. 2007; 166: 28-34
        • Cowell R.G., Lauritzen S.L., Mortera J. A Gamma model for DNA mixture analyses. Bayesian Anal. 2007; 2: 333-348