Orogen advances the science of relationship prediction by using Ped-sim, which improves upon previous models by incorporating crossover interference and sex-specific genetic maps. The Orogen tool provides accurate relationship predictions for a wide range of relationship types. It properly differentiates between close relatives at 23andMe, which is a newly available functionality for standalone tools. It provides new granularity of close relationships by showing the differences between paternal and maternal sides and in-group relationship types.
Relationship prediction is important for determining kinship with DNA matches by showing all possible relationships and ranking them in the most probable order. Direct-to-consumer DNA testing services have used simulations to provide customers with relationship estimations for several years [
], show that these features have substantial effects on the shared DNA between relatives.
Population weights are also important. A person has many more distant cousins than close cousins, so when a person looks at their DNA matches they are seeing a sample that is biased towards distant cousins. Population weights overcome that bias, resulting in much higher probabilities for more distant cousins. Henn et al. [
] showed that the number of cousins that a person has will grow exponentially and indefinitely with increasing degree of cousinship. This model ignores pedigree collapse and thus can only be used to a certain degree of cousinship. Population models have also so far ignored cousins once removed, who along with half cousins will make up a large portion of a person’s DNA matches.
Here I introduce Orogen relationship predictions, which incorporate the advanced modeling of Ped-sim and population weights that approximate the average number of cousins a person has in order to overcome the bias towards distant DNA matches.
There are five main steps necessary for producing accurate relationship predictions.
2.1 Simulating shared DNA between relatives
Simulations were run in Ped-sim with 500,000 trials for each relative type, with two exceptions: parent/child relationships were not simulated, and maternal-paternal relationships were used in place of paternal-maternal relationships, since the shared DNA has the same properties. I used sex-specific genetic maps and crossover interference for all simulations. Half relationships more distant than Group 3 and cousins two or more times removed were not included.
I created a conversion factor based on the total genetic map length (3346 cMs) of Bhérer et al. [
], used by Ped-sim, and the different genetic lengths at 23andMe (3538 cMs) and AncestryDNA (3489 cMs) (data collected by the author). For simplification, I then applied this conversion to each segment in Ped-sim output files as well as low cM cutoffs to each segment in accordance with the thresholds used at 23andMe and AncestryDNA [
] map is 176.3 cMs and is ∼182 cMs at 23andMe for one chromosome copy (data collected by the author). I included options for two female testers or for two male testers, which require different simulation parameters.
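The segment rescaling described above can be sketched as follows. This is a minimal illustration, not the code used for Orogen: the map lengths come from the text, but the function names, the `min_segment_cm` parameter, and the choice to apply the cutoff after rescaling are assumptions made for the example. The actual company thresholds are not reproduced here.

```python
# Genetic map lengths in cM, as stated in the text.
PED_SIM_MAP_CM = 3346.0  # Bhérer et al. map used by Ped-sim
COMPANY_MAP_CM = {"23andMe": 3538.0, "AncestryDNA": 3489.0}


def conversion_factor(company):
    """Ratio of a company's genetic map length to the Ped-sim map length."""
    return COMPANY_MAP_CM[company] / PED_SIM_MAP_CM


def total_shared_cm(segments_cm, company, min_segment_cm):
    """Rescale each simulated segment, drop short segments, and sum the rest.

    min_segment_cm is a hypothetical placeholder for the company-specific
    cutoff; applying it after rescaling is an assumption of this sketch.
    """
    factor = conversion_factor(company)
    scaled = [seg * factor for seg in segments_cm]
    return sum(seg for seg in scaled if seg >= min_segment_cm)
```

For example, `conversion_factor("23andMe")` is 3538/3346, or roughly 1.057, so every simulated segment becomes about 5.7% longer before the cutoff is applied.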
For parent/child relationships, I generated a normal distribution of shared cMs that reflects the effect of genotyping errors. I did this by approximately matching the distribution to Fig. 5.3 of the 2020 AncestryDNA Matching White Paper [
]. I then kept only the lower half of this distribution, which has the genetic map length of the respective companies as both a maximum and a mode.
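One way to draw from the lower half of a normal distribution whose mode and maximum both equal the map length is shown below. It is a sketch under stated assumptions: `sigma_cm` is a hypothetical spread parameter (the text fits the distribution to a figure by eye and gives no value), and the function name is illustrative.

```python
import random


def simulate_parent_child_cm(map_length_cm, sigma_cm, n, seed=0):
    """Draw n shared-cM values from the lower half of a normal distribution.

    The normal is centered on map_length_cm, and draws above the center are
    rejected, so the map length is both the maximum and the mode.
    sigma_cm is an assumed spread, not a value taken from the text.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(map_length_cm, sigma_cm)
        if value <= map_length_cm:
            samples.append(value)
    return samples
```

About half of the draws are rejected, which is negligible at this scale. Reflecting draws around the center would give the same distribution without rejection.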
2.2 Finding counts for each relationship type in bins
I established 1 cM bins that are open on the left, centered on integers, and closed on the right, placing the total counts for each relationship type into each bin. In order to represent each group equally, I divided the counts for each relationship type by the number of types in that group. For example, Group 2 comprises six different types, including paternal half-siblings, maternal half-siblings, paternal grandparent/grandchild, etc. Each of these six types represents 500,000 pairs of individuals. I divided the counts for every relationship type in Group 2 by six so that the group would have equal representation compared to Groups 5–16, which each comprise 500,000 cousin pairs.
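The binning rule can be made concrete with a short sketch. A bin centered on integer n that is open on the left and closed on the right covers (n − 0.5, n + 0.5], which corresponds to taking the ceiling of the value minus 0.5. The function names are illustrative, not taken from the Orogen code.

```python
import math
from collections import Counter


def bin_index(total_cm):
    """Return the integer n whose bin (n - 0.5, n + 0.5] contains total_cm."""
    return math.ceil(total_cm - 0.5)


def binned_counts(totals_cm, types_in_group):
    """Count simulated totals per 1 cM bin, scaled by the group's type count.

    Dividing by types_in_group gives each group the same overall weight
    regardless of how many relationship types it contains.
    """
    counts = Counter(bin_index(x) for x in totals_cm)
    return {b: c / types_in_group for b, c in counts.items()}
```

With this rule, a total of exactly 10.5 cMs falls in bin 10, while any value just above 10.5 falls in bin 11.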
2.3 Smoothing the counts across bins for each relationship type
Even with 500,000 trials, plots of the counts for a given relationship type across bins will not be smooth. To avoid probability curves that are non-monotonic, I smoothed the counts for each relationship type across bins by applying moving window averages iteratively, with the window size usually decreasing on each pass.
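Iterative moving-window smoothing might look like the sketch below. The window sizes in `windows` and the handling of the edges (the window is truncated at the ends of the list) are assumptions for illustration; the text does not specify either.

```python
def moving_average(values, window):
    """Centered moving average; the window is truncated at the list edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def smooth_iteratively(counts, windows=(9, 7, 5, 3)):
    """Apply moving averages in sequence with decreasing window sizes.

    The default window sizes are hypothetical, chosen only for this example.
    """
    for window in windows:
        counts = moving_average(counts, window)
    return counts
```

Starting with a wide window removes the largest fluctuations first, and the narrower passes then tidy up what remains without flattening the curve much further.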
2.4 Applying population weights
I used a simple population model and assumed, like Henn et al. [
], a constant survival rate (SR) of 2.5 children per family. For population weights in this study I ignored half relationships, cousins twice removed or more, and descendants. The formula for the number of nth cousins a person has, on average, under these assumptions is as follows: , where g is the number of generations back to the common ancestor and . The formula for cousins once removed is as follows: , where the first addend is for the younger generation and the second addend is for the older generation. In order to have smoother growth across groups, I used only the average of the two types of cousins once removed. For every bin for a given relationship type, I then multiplied the count by the group population weights described above.
2.5 Calculating probabilities for each relationship type
Finally, I calculated the relative probabilities for each relationship type by dividing its count by the total count of all relationship types in each bin. For the percentage input boxes, I divided the cM amounts by the total genetic map, including X-DNA, to obtain a percentage.
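The per-bin normalization and the percentage conversion reduce to a few lines. In this sketch, `counts_by_type` is an assumed data layout (a mapping from relationship type to a mapping from bin index to smoothed, weighted count), and `total_map_cm` stands in for whichever total map length, including X-DNA, applies to the company in question.

```python
def relative_probabilities(counts_by_type, bin_idx):
    """Divide each type's count in a bin by the bin's total over all types."""
    in_bin = {
        rel: counts.get(bin_idx, 0.0) for rel, counts in counts_by_type.items()
    }
    total = sum(in_bin.values())
    if total == 0:
        # No simulated pair of any type fell in this bin.
        return {rel: 0.0 for rel in in_bin}
    return {rel: count / total for rel, count in in_bin.items()}


def cm_to_percent(cm, total_map_cm):
    """Express a shared cM amount as a percentage of the total genetic map."""
    return 100.0 * cm / total_map_cm
```

Because each bin is normalized on its own, the probabilities for a given cM input always sum to one across relationship types wherever any type has a nonzero count.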
3. Results and discussion
Un-smoothed probability curves (Fig. 1a) would give varying predictions for small changes in cM inputs. Fig. 1b shows smoothed but unweighted probability curves, which appear to be monotonic over the appropriate intervals. The population-weighted curves in Fig. 1c are only subtly different from the unweighted probability curves. More distant relationships are shifted to the right and slightly lower, but close relationships appear unchanged.
The resulting tool can be found at https://dna-sci.com/tools/orogen-wtd/, where users can obtain probabilities for relationships by entering values of 8 cMs or higher as inputs. Rather than displaying the probabilities by groups of relationship types with the same average shared DNA, Orogen shows them at finer resolution, separating relationships whose curves are significantly different in Fig. 1.
Orogen provides correct relationship predictions for average values, but excels at close relationships with values farther from the mean. A few examples will illustrate this. Two paternal half-siblings share 2055 cMs at AncestryDNA (data collected by the author). In another tool found at DNA Painter [
], whose probabilities come from the 2016 AncestryDNA White Paper, all of the Group 2 probabilities are combined. The values are similar between the two tools: DNA Painter shows a 92% probability for Group 2, while the Orogen probability is 95.7%. The benefit of Orogen in this case is that it shows that paternal relationships are more likely and that maternal and avuncular relationships are less likely.
As another example, a woman and her known paternal grandmother share 38.2% DNA at 23andMe (data collected by the author) with no pedigree collapse, since her father has also been tested. The relationship predictor at DNA Painter converts this percentage to 2842 cMs and gives a 100% chance of full-siblings, leaving no possibility for paternal grandmother. Orogen predicts the correct relationship with a ∼20% probability. Although Orogen gives a ∼80% probability for full-siblings, that relationship can be ruled out because no completely identical regions were shared.
Orogen also accurately predicts full-sibling relationships at 23andMe. A typical value for full-siblings at 23andMe is 3500 cMs, just under the theoretical average. This is given a ∼15% probability at Orogen and a 0% probability at DNA Painter.
Future work will concentrate on a population model that includes pedigree collapse.
Conflict of interest statement
The author has no conflict of interest to declare.