relMix: An open source software for DNA mixtures with related contributors

Open Access. Published: October 17, 2019. DOI: https://doi.org/10.1016/j.fsigss.2019.09.085

      Abstract

      In both criminal cases and relationship inference there is an increasing demand for analysis of DNA mixtures where relatives are involved. The goal might be to identify the contributors to a mixture where the donors may or may not be related, or to determine the relationship between individuals based on a mixture. relMix is an open source software for analysing DNA mixtures involving relatives, available as a graphical user interface in R. We explain the model behind relMix and give an overview of the new features (including improved checking of input) in the latest version.

      Introduction

      In both criminal cases and relationship inference there is an increasing demand for analysis of DNA mixtures where relatives are involved. One example is prenatal paternity testing based on a mother-fetus mixture, with reference samples from the mother and the alleged father but, necessarily, none from the child. In crime cases one may encounter stains where two or more contributors are related. relMix is an open source software for analysing DNA mixtures involving relatives, available from https://CRAN.R-project.org/package=relMix as a graphical user interface in R. Compared to commonly used mixture software, relMix can account for arbitrary kinship between more than two contributors in addition to mutations and silent alleles.

      Motivating example

      Investigators want to determine the father of an unborn child where the candidates are brothers. Available evidence consists of DNA reference samples from the mother, brother 1 and brother 2. In addition, a sample from the mother contains a mixture between her DNA and the DNA of her unborn child. Based on this we formulated
      H1: Brother 1 is the father
      H2: Brother 2 is the father


      as shown in Figure 1. For the discussion we also included
      H3: An unrelated man is the father.


      Fig. 1. Pedigrees and excerpt from the result table of the motivating example.
      For this case we consider an equal mutation model with mutation probabilities 0.001 and 0.003 for females and males, respectively. The dropout probabilities were 0.05 for the child and 0 for the mother.
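      As a minimal sketch of what an equal mutation model implies, the snippet below builds a transition matrix in which an allele stays unchanged with probability 1 minus the mutation probability and otherwise mutates to each of the other alleles with the same probability. This assumes "equal" has the meaning it has in the Familias model of that name; the function name `equal_mutation_matrix` and the allele labels are hypothetical and not part of relMix.

```python
def equal_mutation_matrix(alleles, rate):
    """Transition probabilities under an assumed 'equal' mutation model.

    An allele is transmitted unchanged with probability 1 - rate and
    mutates to each of the other n - 1 alleles with probability rate / (n - 1).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two alleles are needed for mutations")
    off_diagonal = rate / (n - 1)
    return {
        parent: {
            child: (1.0 - rate) if child == parent else off_diagonal
            for child in alleles
        }
        for parent in alleles
    }


# Illustrative allele labels; 0.001 and 0.003 are the female and male
# mutation probabilities quoted in the text.
alleles = ["12", "13", "14", "15"]
female_matrix = equal_mutation_matrix(alleles, 0.001)
male_matrix = equal_mutation_matrix(alleles, 0.003)
```

      Each row of the resulting matrix sums to 1, so it can be read directly as the distribution of the transmitted allele given the parental allele.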
      The evidence will be summarised by the likelihood ratios
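      $$\mathrm{LR}_1 = \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_2)}, \qquad \mathrm{LR}_2 = \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_3)}$$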


      Consider the table in Figure 1. The first row is consistent with all hypotheses. For D19S433, a mutation or a dropout is needed under H1 but not under H2. D21S11 is consistent with H1, but a mutation is needed under H2. The final line shows clear, though by most standards not conclusive, evidence in favor of H1. It is correct to report LR1 = 1380, since we were asked to compare brother 1 to brother 2. Had we inappropriately compared brother 1 to an unrelated man (LR2), the resulting LR would overstate the evidence.

      Program input

      relMix works with tab-separated files to import DNA and allele frequency data. These can be exported from DNA profiling or spreadsheet software. Pedigrees for paternity cases are included with the program, while other arbitrarily complex pedigrees can be loaded using the Familias (https://www.familias.name/openfamilias.html) format. Finally, parameters describing mutation, drop-in, drop-out, silent alleles, and population substructure (θ) are entered manually through a user-friendly interface as shown in Figure 2.
      Fig. 2. Examples from the relMix user interface, including the new input validation system.

      New in relMix version 1.3

      relMix now checks for common mistakes such as marker name inconsistencies, duplicate markers, invalid file formats and more. In particular, manually typing reference data or manipulating frequency databases can lead to subtle errors (e.g., TPOX vs TP0X) that previously resulted in wrong calculations or programme termination. In addition, reusing frequency databases from other programmes can cause problems if marker naming is not consistent. We introduce specific checks for these kinds of errors, largely based on computing the Levenshtein distance [1] between identifiers to find those that are suspiciously similar. The Levenshtein distance counts the minimum number of edits (substitutions, insertions or deletions) required to go from one string of text to another. Detected errors are presented to the user with an explanation and automatically fixed if possible. We found that a threshold of 2 for automatic correction of inconsistencies was beneficial because it also covers transpositions, a common typing error (e.g., Plasma vs. Palsma), each of which counts as two edits.
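      The check described above can be sketched as follows. This is an illustrative Python version, not the relMix implementation (which is written in R); the helper `suggest_marker` and its rule of correcting only when exactly one known marker lies within the threshold are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions or deletions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # deleting the first i characters of a
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def suggest_marker(name, known_markers, threshold=2):
    """Hypothetical helper: return the single known marker within
    `threshold` edits of `name`, or None if there is no unambiguous match."""
    if name in known_markers:
        return name
    candidates = [m for m in known_markers if levenshtein(name, m) <= threshold]
    return candidates[0] if len(candidates) == 1 else None


print(levenshtein("TPOX", "TP0X"))      # one substitution
print(levenshtein("Plasma", "Palsma"))  # a transposition costs two edits
print(suggest_marker("TP0X", ["TPOX", "D21S11", "D19S433"]))
```

      Returning None when several markers are equally close keeps the automatic correction conservative; such cases would be better reported to the user than silently fixed.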
      Development of this latest version was done on GitHub, a platform that enables efficient collaboration between the authors of a project. In addition, the codebase and all changes made to it are public, allowing for greater transparency and encouraging collaboration with external developers. For end users, GitHub provides a mechanism for reporting bugs and contacting the authors in which the questions and answers posted remain public and searchable for the benefit of the community. The adoption of this new workflow and development methodology is an important step for open/free software.

      Discussion

      The case presented demonstrates that relMix can deal with complex cases of practical significance, and it clearly shows the importance of modelling relationships and mutations. LRmixStudio (https://lrmixstudio.org) is based on a model similar to the one we use. It includes important functionality not available in relMix, but handles only simple pairwise relationships. Alternative software such as EuroForMix (http://www.euroformix.com) is based on continuous models and therefore accounts for peak height information, which may or may not be important, as discussed by Slooten [2]. Alternative models and implementations based on Bayesian networks are exemplified by Green and Mortera [3].

      The model

      We adopt the mixture model described by Dørum et al. [4] and Haned et al. [5]. The model accounts for dropout and drop-in, but not peak heights. For a given locus, the probability that allele a will not appear or will appear in the mixture M, respectively, is found as
      $$P(a \notin M \mid g, d, c) = (1 - c\,p_a) \prod_i d_i^{\,n_{i,a}}, \qquad P(a \in M \mid g, d, c) = 1 - (1 - c\,p_a) \prod_i d_i^{\,n_{i,a}},$$


      where
      $g$ = genotypes of all contributors
      $d$ = dropout probabilities for all contributors
      $d_i$ = dropout probability for contributor $i$
      $c\,p_a$ = probability that $a$ will drop in
      $n_{i,a}$ = number of times $a$ is observed in contributor $i$


      The probability of observing a set M of mixture alleles is thus
      $$P(\mathcal{M} = M \mid g, d, c) = \prod_{a \in M} P(a \in M \mid g, d, c) \cdot \prod_{a \notin M} P(a \notin M \mid g, d, c).$$
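      A minimal Python sketch of the two formulas above, for a single locus, might look as follows. It is not the relMix code: the function names are hypothetical, genotypes are represented as tuples of allele labels, and the set of alleles that could appear is assumed to be the keys of the frequency table. The genotypes and frequencies are made up; only the dropout probabilities (0 for the mother, 0.05 for the child) come from the motivating example.

```python
from math import prod


def prob_allele_absent(allele, genotypes, dropout, dropin, freqs):
    """P(a not in M | g, d, c) = (1 - c * p_a) * prod_i d_i ** n_{i,a}."""
    # Every copy of the allele carried by a contributor must drop out ...
    all_copies_drop_out = prod(
        d ** genotype.count(allele) for genotype, d in zip(genotypes, dropout)
    )
    # ... and the allele must not drop in.
    return (1.0 - dropin * freqs[allele]) * all_copies_drop_out


def prob_mixture(mixture, genotypes, dropout, dropin, freqs):
    """Probability of observing exactly the allele set `mixture` at one locus."""
    p = 1.0
    for allele in freqs:  # assumption: the frequency table lists all alleles
        absent = prob_allele_absent(allele, genotypes, dropout, dropin, freqs)
        p *= (1.0 - absent) if allele in mixture else absent
    return p


# Illustrative values (not from the paper's data).
freqs = {"A": 0.2, "B": 0.3, "C": 0.5}
genotypes = [("A", "B"), ("A", "C")]  # mother, child
dropout = [0.0, 0.05]                 # dropout probabilities from the example
print(prob_mixture({"A", "B", "C"}, genotypes, dropout, 0.0, freqs))
```

      With drop-in switched off, the full set {A, B, C} is observed exactly when the child's C allele does not drop out, so the printed probability corresponds to 1 - 0.05. Because each allele is handled independently, the probabilities of all possible allele sets sum to 1.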


      Finally, the probability of the evidence E conditioned on hypothesis Hj is found by combining the probability of the mixture with the probability of the kinship as
      $$P(E \mid H_j) = \sum_{u \in U} P(\mathcal{M} = M \mid g_K, g_U = u, d, c) \cdot P(g_A, g_K, g_U = u \mid H_j),$$


      where
      $g_K$ = genotypes of known contributors
      $g_U$ = genotypes of unknown contributors
      $g_A$ = genotypes of additional genotyped individuals
      $U$ = set of possible genotypes for the unknown contributor(s)
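      The sum over unknown genotypes can be sketched for one locus and a single unknown contributor as below. Both factors are passed in as functions so that the structure of the sum is visible; in relMix the kinship term comes from pedigree calculations, which are not reproduced here. Everything in the usage part is a toy assumption: the unknown is treated as unrelated to everyone (so only its Hardy-Weinberg genotype probability varies with u), and the mixture term is an indicator with no dropout or drop-in.

```python
from itertools import combinations_with_replacement


def evidence_probability(alleles, mixture_prob, kinship_prob):
    """Sum over unordered genotypes u of mixture_prob(u) * kinship_prob(u).

    mixture_prob(u) stands in for P(M | g_K, g_U = u, d, c) and
    kinship_prob(u) for P(g_A, g_K, g_U = u | H_j); both are supplied
    by the caller (hypothetical interface, one unknown contributor).
    """
    return sum(
        mixture_prob(u) * kinship_prob(u)
        for u in combinations_with_replacement(sorted(alleles), 2)
    )


# Toy usage with made-up allele frequencies.
freqs = {"A": 0.2, "B": 0.3, "C": 0.5}
known = ("A", "B")         # genotype of the known contributor
mixture = {"A", "B", "C"}  # observed allele set


def hardy_weinberg(u):
    """Genotype probability of an unrelated unknown (toy kinship term)."""
    a, b = u
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def exact_match(u):
    """Toy mixture term: 1 if the contributors' alleles give exactly the mixture."""
    return 1.0 if set(known) | set(u) == mixture else 0.0


print(evidence_probability(freqs, exact_match, hardy_weinberg))
```

      In this toy setting only the genotypes (A, C), (B, C) and (C, C) explain the mixture, so the sum reduces to 2(0.2)(0.5) + 2(0.3)(0.5) + 0.5², which is 0.75. Replacing `hardy_weinberg` with a pedigree-based probability under each hypothesis and taking the ratio of the two results would give a likelihood ratio of the kind reported in the motivating example.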


      Calculations are based on the R version of Familias.

      References

      1. Levenshtein distance. https://en.wikipedia.org/wiki/Levenshtein_distance.
      2. Slooten K. The information gain from peak height data in DNA mixtures. Forensic Sci. Int. Genet. 2018; 36: 119-123
      3. Green P.J., Mortera J. Paternity testing and other inference about relationships from DNA mixtures. Forensic Sci. Int. Genet. 2017; 28: 128-137
      4. Dørum G., Kaur N., Gysi M. Pedigree-based relationship inference from complex DNA mixtures. Int. J. Legal Med. 2017; 131: 629-641
      5. Haned H., Slooten K., Gill P. Exploratory data analysis for the interpretation of low template DNA mixtures. Forensic Sci. Int. Genet. 2012; 6: 762-774