Research article| Volume 1, ISSUE 1, P652-653, August 2008

Informativeness of genetic markers for forensic inference: An information theoretic approach

      Abstract

      Forensic inference from genetic markers uses highly polymorphic multi-locus genotypes. Measures of informativeness can aid in selecting efficient genetic markers. Existing measures do not account for multiple sources of genetic variation (e.g. mutation, silent alleles) and are not directly applicable to complex identification problems. Using information theoretic principles within a probabilistic expert system (PES), we define a general measure of informativeness, Iq, of a marker for answering a forensic query. Iq ranks most genetic markers slightly differently from comparable measures. Accounting for sources of variation such as mutation, silent alleles and null alleles reduces Iq and may further affect the ranking. The criterion has a solid theoretical basis and can account for multiple sources of genetic variation and other anomalies. It can be directly applied to a variety of planning issues concerning the type, quantity and specific choice of markers for use in paternity testing and more general forensic problems.

      1. Introduction and background

      Forensic inference from genetic markers is needed in a variety of identification problems, including paternity testing, natural disasters, criminal investigations and immigration. The polymorphic multi-locus genotypes and population allele frequencies used in the inference process are often complicated by population genetic factors such as mutation and co-ancestry.
      Highly informative genetic markers can reduce the amount of genotyping required. Measures of informativeness can aid in selecting efficient genetic markers for forensic inference; hence, it is desirable to measure the extent to which specific markers contribute to the forensic inference of interest. Existing measures, such as heterozygosity (h) [Nei and Roychoudhury, 1974], polymorphism information content (PIC) [Botstein et al., 1980], power of discrimination (PD) [Jones, 1972], and power of exclusion (PE) [Garber and Morris, 1983], are primarily based on polymorphism. Despite their various features, they do not account for multiple sources of genetic variation (e.g. mutation, silent alleles), they are not designed specifically for measuring information content, and they are not directly applicable to more complex identification problems.
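      To make the polymorphism-based measures concrete, the following minimal sketch computes h, PIC and PD from a vector of allele frequencies. It assumes Hardy-Weinberg proportions and uses the usual textbook population formulas (not sample-size-corrected estimators); the allele frequencies are hypothetical and are not taken from any marker discussed here. PE is omitted because its form depends on the case type considered.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity: h = 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross


def power_of_discrimination(freqs):
    """PD = 1 - sum over genotypes of (genotype frequency)^2,
    with genotype frequencies taken from Hardy-Weinberg proportions."""
    homozygotes = sum((p * p) ** 2 for p in freqs)
    heterozygotes = sum((2.0 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygotes - heterozygotes


# Hypothetical allele frequencies, for illustration only.
example_freqs = [0.5, 0.3, 0.2]
print("h   =", heterozygosity(example_freqs))
print("PIC =", pic(example_freqs))
print("PD  =", power_of_discrimination(example_freqs))
```

      All three quantities depend only on the allele frequencies, which is why none of them can reflect mutation or silent alleles without further modelling.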
      Using information theoretic concepts and a decision-theoretic framework within a probabilistic expert system (PES), we define a general measure of informativeness, Iq, of a marker for a forensic query which can be used universally in a variety of forensic problems.

      2. Methods

      Consider a PES formulation for a paternity identification problem [Dawid et al., 2002]. The query of interest is whether the true father is the putative father or a man drawn randomly from the population.
      Fig. 1 shows an object-oriented PES [Dawid et al., 2007] for the paternity identification problem based on a single marker, with the query Q represented by the node "tf = pf?". Using the PES to calculate the likelihood ratio for each marker and multiplying these to form a joint likelihood ratio resolves the paternity identification problem.
      Fig. 1. The overall PES representation of a paternity identification problem.
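      The multiplication of per-marker likelihood ratios described above can be sketched in a few lines. This is only an illustration: the likelihood ratios are hypothetical numbers rather than PES output, the product presumes that the markers can be treated as independent, and the conversion to a posterior probability with a prior of 0.5 is an added assumption that is not part of the text.

```python
import math


def joint_likelihood_ratio(marker_lrs):
    """Multiply per-marker likelihood ratios (markers assumed independent)."""
    return math.prod(marker_lrs)


def posterior_probability(joint_lr, prior=0.5):
    """Posterior probability of paternity from the joint likelihood ratio.

    The prior is an assumption of this sketch, not a value from the paper.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * joint_lr
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical per-marker likelihood ratios, for illustration only.
hypothetical_lrs = [2.5, 4.0, 0.8, 10.0]
joint_lr = joint_likelihood_ratio(hypothetical_lrs)
print("joint LR:", joint_lr)
print("posterior probability:", posterior_probability(joint_lr))
```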
      In determining which genetic markers contribute to the inference of paternity, we define the informativeness Iq for this scenario as
      $$I_q = H(Q) - H(Q \mid \mathrm{PFGT}, \mathrm{CGT}, \mathrm{MGT}) = I(Q; \mathrm{PFGT}, \mathrm{CGT}, \mathrm{MGT}),$$

      where H(X) denotes the entropy of the distribution of X,

      $$H(X) = -\sum_x p_x \log p_x,$$

      a measure of the total uncertainty of the distribution. The quantity Iq measures the reduction in uncertainty regarding Q due to observation of the genotypes of the putative father (PFGT), the child (CGT) and the mother (MGT).
      The quantity I(Q; PFGT, CGT, MGT) is also known as the mutual information between Q and (PFGT, CGT, MGT). For further details, the reader is referred to chapter 2 of [Cover and Thomas, 1991].
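      As a rough illustration of the definition, the mutual information between Q and the three genotypes can be computed by brute-force enumeration for a single marker. The sketch below is not the PES computation: it assumes Hardy-Weinberg proportions, unrelated individuals, Mendelian transmission without mutation or silent alleles, a prior of 0.5 on Q and natural logarithms, none of which is specified in the text. All function names are hypothetical.

```python
import math
from collections import defaultdict


def genotype_distribution(freqs):
    """Hardy-Weinberg distribution over unordered genotypes (i, j), i <= j."""
    dist = {}
    for i, p in enumerate(freqs):
        for j in range(i, len(freqs)):
            dist[(i, j)] = p * p if i == j else 2.0 * p * freqs[j]
    return dist


def child_distribution(mother, father):
    """Mendelian distribution of the child genotype (no mutation)."""
    dist = defaultdict(float)
    for a in mother:
        for b in father:
            dist[tuple(sorted((a, b)))] += 0.25
    return dist


def informativeness(freqs, prior=0.5):
    """I(Q; PFGT, CGT, MGT) in nats for one marker, by full enumeration."""
    geno = genotype_distribution(freqs)
    joint = defaultdict(float)  # keys: (q, pf genotype, child genotype, mother genotype)
    for m, p_m in geno.items():
        for pf, p_pf in geno.items():
            # Q true: the putative father is the true father.
            for c, p_c in child_distribution(m, pf).items():
                joint[(True, pf, c, m)] += prior * p_m * p_pf * p_c
            # Q false: the true father is a random man, summed out here.
            for tf, p_tf in geno.items():
                for c, p_c in child_distribution(m, tf).items():
                    joint[(False, pf, c, m)] += (1.0 - prior) * p_m * p_pf * p_tf * p_c
    p_q = {True: prior, False: 1.0 - prior}
    p_e = defaultdict(float)  # marginal distribution of the evidence
    for (q, pf, c, m), p in joint.items():
        p_e[(pf, c, m)] += p
    return sum(
        p * math.log(p / (p_q[q] * p_e[(pf, c, m)]))
        for (q, pf, c, m), p in joint.items()
        if p > 0.0
    )


# Hypothetical equifrequent markers with two and four alleles.
print(informativeness([0.5, 0.5]))
print(informativeness([0.25, 0.25, 0.25, 0.25]))
```

      Because Q is binary, the result is bounded above by H(Q), which equals log 2 under the assumed prior; the numerical scale of the output therefore depends on the prior and on the base of the logarithm.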
      The concept of mutual information is well established and has a solid general foundation. It can be applied to any forensic query Q and any collection of evidence E1, …, Ek, and can therefore be used for planning purposes in a multitude of scenarios. In particular, it remains valid when mutation is incorporated into the PES.
      Once a PES has been established for the forensic problem in question, the mutual information can be calculated by standard PES methods [Lauritzen and Spiegelhalter, 1988] using software such as the HUGIN API (http://www.hugin.com).
      The example below is based on gene frequencies and mutation rates as given in [Dawid et al., 2002]. For simplicity, mutation is incorporated through a proportional mutation model.
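      The text does not spell out the proportional mutation model. One common parameterisation, assumed here, lets an allele mutate to a different allele j with probability proportional to the population frequency of j, scaled so that the overall mutation rate equals a given value mu. The sketch below builds the corresponding transition matrix; the frequencies and the rate are hypothetical, and the marker must have at least two alleles.

```python
def proportional_mutation_matrix(freqs, mu):
    """Transition matrix M[i][j] = P(transmitted allele j | parental allele i).

    Assumed form: M[i][j] = k * p_j for i != j, with k chosen so that the
    population-average mutation rate sum_i p_i * (1 - M[i][i]) equals mu.
    """
    homozygosity = sum(p * p for p in freqs)
    k = mu / (1.0 - homozygosity)  # requires more than one allele
    n = len(freqs)
    return [
        [1.0 - k * (1.0 - freqs[i]) if i == j else k * freqs[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical allele frequencies and mutation rate, for illustration only.
freqs = [0.5, 0.3, 0.2]
mu = 0.002
matrix = proportional_mutation_matrix(freqs, mu)
for row in matrix:
    print(row)
```

      Under this form the allele frequencies are stationary, so allowing for mutation does not change the population distribution of transmitted alleles.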

      3. Results

      For illustration, we consider the prior planning problem for paternity identification, i.e. the scenario where no genetic information is yet available for a triplet consisting of mother, child, and putative father, and the informativeness of markers must be compared. The last column of Table 1 gives the informativeness Iq* when mutation is incorporated.
      The results in Table 1 are rank ordered according to Iq. Traditional measures of informativeness identify TH01 as the most informative marker and FES as the least informative, whereas Iq ranks D1S80 highest, although the differences are small for all measures. Traditional measures give essentially identical rankings of the markers, with the exception of PD, which switches the order of D1S80 and D21S11. Taking mutation into account slightly reduces informativeness.
      Table 1. Informativeness of genetic markers for paternity
      Marker   h       PIC     PD      PE      Iq      Iq*
      D1S80    0.7852  0.7638  0.9325  0.5718  0.3261  0.3101
      APO-B    0.7842  0.7605  0.9297  0.5700  0.3150  0.3005
      vWA      0.8048  0.7777  0.9348  0.6081  0.3047  0.2965
      D21S11   0.7961  0.7674  0.9298  0.5918  0.2986  0.2886
      TH01     0.8075  0.7790  0.9345  0.6130  0.2983  0.2906
      COL2A1   0.7805  0.7477  0.9190  0.5633  0.2857  0.2744
      F13      0.7792  0.7452  0.9172  0.5611  0.2819  0.2683
      MBP      0.7212  0.6742  0.8753  0.4618  0.2179  0.2115
      FES      0.6987  0.6426  0.8531  0.4263  0.2060  0.1990
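
      The rankings implied by each column of Table 1 can be checked mechanically. The short sketch below copies the table values and sorts the markers by each measure in decreasing order; only the data structure and function name are introduced here.

```python
# Values copied from Table 1: (h, PIC, PD, PE, Iq, Iq*).
TABLE_1 = {
    "D1S80":  (0.7852, 0.7638, 0.9325, 0.5718, 0.3261, 0.3101),
    "APO-B":  (0.7842, 0.7605, 0.9297, 0.5700, 0.3150, 0.3005),
    "vWA":    (0.8048, 0.7777, 0.9348, 0.6081, 0.3047, 0.2965),
    "D21S11": (0.7961, 0.7674, 0.9298, 0.5918, 0.2986, 0.2886),
    "TH01":   (0.8075, 0.7790, 0.9345, 0.6130, 0.2983, 0.2906),
    "COL2A1": (0.7805, 0.7477, 0.9190, 0.5633, 0.2857, 0.2744),
    "F13":    (0.7792, 0.7452, 0.9172, 0.5611, 0.2819, 0.2683),
    "MBP":    (0.7212, 0.6742, 0.8753, 0.4618, 0.2179, 0.2115),
    "FES":    (0.6987, 0.6426, 0.8531, 0.4263, 0.2060, 0.1990),
}
MEASURES = ("h", "PIC", "PD", "PE", "Iq", "Iq*")


def ranking(measure):
    """Markers ordered from most to least informative under one measure."""
    column = MEASURES.index(measure)
    return sorted(TABLE_1, key=lambda marker: TABLE_1[marker][column], reverse=True)


for measure in MEASURES:
    print(f"{measure:>4}: {' > '.join(ranking(measure))}")
```

      Printing the orderings side by side makes it easy to see which measures agree and where Iq and Iq* depart from them.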

      4. Conclusion

      The suggested measure, Iq, has a solid theoretical basis and gives rankings of forensic genetic markers similar to those of existing measures. It is applicable to any number of alleles and can account for multiple sources of genetic variation and other anomalies. It can be directly applied to a variety of planning issues concerning the type, quantity and choice of markers for use in paternity testing and more general forensic problems.

      Conflict of interest

      None.

      References

        • Nei M., Roychoudhury A.K. Sampling variances of heterozygosity and genetic distance. Genetics. 1974; 76: 379-390
        • Botstein D., White R.L., Skolnick M., Davis R.L. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 1980; 32: 314-331
        • Jones D.A. Blood samples: Probability of discrimination. J. Forensic Sci. Soc. 1972; 12: 355-359
        • Garber R.A., Morris J.W. General equations for the average power of exclusion for genetic systems of n codominant alleles in one-parent cases of disputed parentage. in: Walker R.H. (Ed.), Inclusion Probabilities in Parentage Testing. American Association of Blood Banks, Arlington, VA, 1983: 277-280
        • Dawid A.P., Mortera J., Pascali V.L., van Boxel D.W. Probabilistic expert systems for forensic inference from genetic markers. Scand. J. Stat. 2002; 29: 577-595
        • Dawid A.P., Mortera J., Vicard P. Object-oriented Bayesian networks for complex forensic DNA profiling problems. Forensic Sci. Int. 2007; 169: 195-205
        • Cover T.M., Thomas J.A. Elements of Information Theory. John Wiley and Sons, New York, 1991
        • Lauritzen S.L., Spiegelhalter D.J. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J. R. Stat. Soc., Ser. B. 1988; 50: 157-224