## Abstract

In both criminal cases and relationship inference there is an increasing demand for analysis of DNA mixtures where relatives are involved. The goal might be to identify the contributors to a mixture where the donors may or may not be related, or to determine the relationship between individuals based on a mixture. relMix is open-source software for analysing DNA mixtures involving relatives, available as a graphical user interface in R. We explain the model behind relMix and give an overview of the new features (including improved checking of input) in the latest version.

## Introduction

In both criminal cases and relationship inference there is an increasing demand for analysis of DNA mixtures where relatives are involved. One example is prenatal paternity cases based on a mother-fetus mixture and reference samples from the mother and the alleged father but, necessarily, not from the child. In crime cases one may encounter stains where two or more contributors are related. relMix is open-source software for analysing DNA mixtures involving relatives, available from https://CRAN.R-project.org/package=relMix as a graphical user interface in R. Compared to commonly used mixture software, relMix can account for arbitrary kinship between more than two contributors, in addition to mutations and silent alleles.

## Motivating example

Investigators want to determine the father of an unborn child where the candidates are brothers. Available evidence consists of DNA reference samples from the mother, brother 1 and brother 2. In addition, a sample from the mother contains a mixture of her DNA and the DNA of her unborn child. Based on this we formulated

$\begin{array}{ll} H_{1}: & \text{Brother 1 is the father} \\ H_{2}: & \text{Brother 2 is the father} \end{array}$

as shown in Figure 1. For the discussion we also included

$\begin{array}{ll} H_{3}: & \text{An unrelated man is the father.} \end{array}$

For this case we considered an equal mutation model with mutation probabilities 0.001 and 0.003 for females and males, respectively. The dropout probabilities were 0.05 for the child and 0 for the mother.

The evidence will be summarised by the likelihood ratios

$\mathrm{LR}_{1} = \frac{P(\text{data} \mid H_{1})}{P(\text{data} \mid H_{2})}, \qquad \mathrm{LR}_{2} = \frac{P(\text{data} \mid H_{1})}{P(\text{data} \mid H_{3})}.$

Consider the table in Figure 1. The first row is consistent with all hypotheses. For D19S433 a mutation or a dropout is needed for $H_{1}$ but not for $H_{2}$. D21S11 is consistent with $H_{1}$, but a mutation is needed for $H_{2}$. The final line shows clear, though by most standards not conclusive, evidence in favor of $H_{1}$. It is correct to report $\mathrm{LR}_{1} = 1380$ since we were asked to compare brother 1 to brother 2. If we inappropriately compared brother 1 to an unrelated man, we would get an LR that overestimates the evidence.

## Program input

relMix imports DNA profiles and allele frequency data from tab-separated files, which can be exported from DNA profiling or spreadsheet software. Pedigrees for paternity cases are included with the program, while other arbitrarily complex pedigrees can be loaded using the Familias (https://www.familias.name/openfamilias.html) format. Finally, parameters describing mutation, drop-in, drop-out, silent alleles, and population substructure ($\theta$) are entered manually through a user-friendly interface as shown in Figure 2.

## New in relMix version 1.3

relMix now checks for common mistakes such as marker name inconsistencies, duplicate markers, invalid file formats and more. In particular, manually typing reference data or manipulating frequency databases can lead to subtle errors that previously resulted in wrong calculations or programme termination (e.g., TPOX vs TP0X). In addition, reuse of frequency databases coming from other programmes can lead to problems if marker naming is not consistent. We introduce specific checks for these kinds of errors, which are largely based on computing the Levenshtein distance [1] between identifiers to find those that are suspiciously similar. The Levenshtein distance counts the minimum number of edits (substitutions, insertions or deletions) required to go from one string of text to another. Detected errors are presented to the user with an explanation and automatically fixed if possible. We found that setting a threshold of 2 for automatic correction of inconsistencies was beneficial because, in addition to single edits, it also covers transpositions of adjacent characters, a common typing error that counts as two edits (e.g., Plasma vs. Palsma).

Development of this latest version was done on GitHub, a platform that enables efficient collaboration between different authors in a project. In addition, all changes made to the codebase and the codebase itself are public, allowing for greater transparency and encouraging collaboration with external developers. For end users, GitHub provides a mechanism for bug reporting and contacting the authors in which the questions and answers posted remain public and searchable for the benefit of the community. The adoption of this new workflow and development methodology is an important step for open/free software.
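The Levenshtein-based check on marker names described in this section can be sketched as follows. This is a minimal illustration in Python, not the R code used in relMix: the function names `levenshtein` and `suggest_marker`, and the rule of picking the single closest known marker, are assumptions made for the example. Only the distance definition and the threshold of 2 come from the text.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of substitutions, insertions or deletions turning s into t."""
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # substitute cs by ct (or keep a match)
            ))
        previous = current
    return previous[-1]


def suggest_marker(name, known_markers, threshold=2):
    """Return the closest known marker within `threshold` edits, else None.

    Hypothetical helper: an exact match is returned unchanged, and ties
    between equally close markers are resolved by list order.
    """
    if name in known_markers:
        return name
    best = min(known_markers, key=lambda m: levenshtein(name, m), default=None)
    if best is not None and levenshtein(name, best) <= threshold:
        return best
    return None


database_markers = ["TPOX", "D21S11", "D19S433"]
print(suggest_marker("TP0X", database_markers))  # zero typed instead of the letter O
print(suggest_marker("FGA", database_markers))   # no sufficiently similar marker
```

Under plain Levenshtein distance a swap of two adjacent characters costs two edits, which is why a threshold of 1 would miss transpositions such as Plasma vs. Palsma.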

## Discussion

The case presented demonstrates that relMix can deal with complex cases of practical significance. The importance of modelling relationships and mutations is clearly demonstrated. LRmixStudio (https://lrmixstudio.org) is based on a model similar to the one we use. This software includes important functionality not available in relMix, but handles only simple pairwise relationships. Alternative software like EuroForMix (http://www.euroformix.com) is based on continuous models. Peak height information, which may or may not be important as discussed in [2], is therefore accounted for. Alternative models and implementations based on Bayesian networks are exemplified in [3].

## The model

We adopt the mixture model described in [4] and [5]. The model accounts for dropout and drop-in, but not peak heights. For a given locus, the probability that allele $a$ will not appear or will appear in the mixture $\mathcal{M}$, respectively, is found as

$\begin{array}{rl} P(a \notin \mathcal{M} \mid \text{g}, \text{d}, c) & = (1 - \mathrm{cp}_{a}) \prod_{i} d_{i}^{n_{i,a}}, \\ P(a \in \mathcal{M} \mid \text{g}, \text{d}, c) & = 1 - (1 - \mathrm{cp}_{a}) \prod_{i} d_{i}^{n_{i,a}}, \end{array}$

where

$\begin{array}{rl} \text{g} & = \text{genotypes of all contributors} \\ \text{d} & = \text{dropout probabilities for all contributors} \\ d_{i} & = \text{dropout probability for contributor } i \\ \mathrm{cp}_{a} & = \text{probability that } a \text{ will drop in} \\ n_{i,a} & = \text{number of times } a \text{ is observed in contributor } i \end{array}$

The probability of observing a set $M$ of mixture alleles is thus

$P(\mathcal{M} = M \mid \text{g}, \text{d}, c) = \prod_{a \notin M} P(a \notin \mathcal{M} \mid \text{g}, \text{d}, c) \cdot \prod_{a \in M} P(a \in \mathcal{M} \mid \text{g}, \text{d}, c).$

Finally, the probability of the evidence $E$ conditioned on hypothesis $H_{j}$ is found by combining the probability of the mixture with the probability of the kinship as

$P(E \mid H_{j}) = \sum_{u \in U} P(\mathcal{M} = M \mid \text{g}_{\text{K}}, \text{g}_{\text{U}} = u, \text{d}, c) \cdot P(\text{g}_{\text{A}}, \text{g}_{\text{K}}, \text{g}_{\text{U}} = u \mid H_{j}),$

where

$\begin{array}{rl} \text{g}_{\text{K}} & = \text{genotypes of known contributors} \\ \text{g}_{\text{U}} & = \text{genotypes of unknown contributors} \\ \text{g}_{\text{A}} & = \text{genotypes of additional genotyped individuals} \\ U & = \text{set of possible genotypes for the unknown contributor(s)} \end{array}$

Calculations are based on the R version of Familias.
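To make the formulas in this section concrete, the following Python sketch evaluates them for a single locus with one unknown contributor. It is an illustration under stated assumptions, not the relMix implementation (which is written in R on top of Familias). The function names are hypothetical, drop-in is passed as a mapping from allele to $\mathrm{cp}_{a}$, and the kinship term $P(\text{g}_{\text{A}}, \text{g}_{\text{K}}, \text{g}_{\text{U}} = u \mid H_{j})$ is supplied by the caller as a dictionary of weights rather than computed from a pedigree.

```python
from math import prod


def p_allele_absent(allele, genotypes, dropouts, drop_in):
    """P(a not in mixture | g, d, c) = (1 - cp_a) * prod_i d_i ** n_{i,a}."""
    return (1 - drop_in.get(allele, 0.0)) * prod(
        d ** g.count(allele) for g, d in zip(genotypes, dropouts)
    )


def p_mixture(mixture, alleles, genotypes, dropouts, drop_in):
    """P(M = mixture | g, d, c): product over every allele at the locus."""
    result = 1.0
    for a in alleles:
        absent = p_allele_absent(a, genotypes, dropouts, drop_in)
        result *= (1 - absent) if a in mixture else absent
    return result


def p_evidence(mixture, alleles, known, known_dropouts,
               unknown_weights, unknown_dropout, drop_in):
    """Sum over candidate genotypes u of the unknown contributor.

    `unknown_weights` maps each genotype u to the kinship term for u under
    the hypothesis (an input assumed to be given in this sketch).
    """
    return sum(
        weight * p_mixture(mixture, alleles, known + [u],
                           known_dropouts + [unknown_dropout], drop_in)
        for u, weight in unknown_weights.items()
    )


# Hypothetical single-locus prenatal example: alleles, genotypes and the
# drop-in value are invented; dropout 0 (mother) and 0.05 (child) follow the text.
alleles = [12, 13, 14]
mixture = {12, 13, 14}
mother = (12, 13)
drop_in = {a: 0.01 for a in alleles}

# Child genotypes given the mother and an alleged father, ignoring mutation.
child_if_father_14_14 = {(12, 14): 0.5, (13, 14): 0.5}
child_if_father_12_13 = {(12, 12): 0.25, (12, 13): 0.5, (13, 13): 0.25}

p1 = p_evidence(mixture, alleles, [mother], [0.0], child_if_father_14_14, 0.05, drop_in)
p2 = p_evidence(mixture, alleles, [mother], [0.0], child_if_father_12_13, 0.05, drop_in)
print(p1, p2, p1 / p2)
```

In this toy setting, allele 14 in the mixture can only be explained by drop-in when the alleged father is 12/13, so the ratio `p1 / p2` strongly favours the 14/14 father. The weights here condition on the reference genotypes and ignore mutation, which simplifies the kinship term used in the model above.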

## References

1. Levenshtein distance. https://en.wikipedia.org/wiki/Levenshtein_distance
2. The information gain from peak height data in DNA mixtures. *Forensic Sci. Int. Genet.* 2018; 36: 119-123
3. Paternity testing and other inference about relationships from DNA mixtures. *Forensic Sci. Int. Genet.* 2017; 28: 128-137
4. Pedigree-based relationship inference from complex DNA mixtures. *Int. J. Legal Med.* 2017; 131: 629-641
5. Exploratory data analysis for the interpretation of low template DNA mixtures. *Forensic Sci. Int. Genet.* 2012; 6: 762-774

## Article info

### Publication history

Published online: October 17, 2019

Accepted: September 25, 2019

Received: September 12, 2019

### Copyright

© 2019 The Authors. Published by Elsevier B.V.

### User license

Creative Commons Attribution (CC BY 4.0)