Late in 2021, the Y Chromosome Haplotype Reference Database (YHRD) added the capability to perform discrete Laplace statistical calculations on searches performed against its SWGDAM-compliant U.S. subpopulations. Because discrete Laplace is not a commonly used or reported statistic in the United States, the SWGDAM Lineage Marker Committee, responsible for maintaining the SWGDAM Interpretation Guidelines for Y-Chromosome STR Testing, evaluated the feature to assess its ease of use and applicability to U.S. casework. Discrete Laplace calculates profile probabilities based on their genetic distance from sets of ancestral alleles and can yield much more informative probability estimates than the commonly used Clopper-Pearson 95% upper confidence interval (UCI). This is especially true for rare profiles with no database observations because, unlike the 95% UCI, the discrete Laplace calculation is not based upon how many times a profile is observed in the database. However, the statistic as applied by YHRD also has some limitations: the query profile must be complete for the ‘minimal’ kit, and expanded loci beyond those included in the Y17 kit cannot be included in the calculation. Here, we explain how discrete Laplace works and demonstrate how the results compare to those generated using the 95% UCI.
Since 2021 it has been possible to perform discrete Laplace (DL) statistical calculations on searches performed against the SWGDAM-compliant U.S. subpopulations in YHRD. While commonly used in Europe for Y-STR reporting, discrete Laplace remains an obscure statistical calculation in North American forensic laboratories, as revealed by survey results gathered by the Scientific Working Group on DNA Analysis Methods (SWGDAM) Lineage Marker Committee (LMC) in June 2021 (unpublished).
Discrete Laplace uses the genetic distance of an evidentiary haplotype from the ancestral haplotype of a given population to calculate an estimated population frequency for the evidentiary haplotype. In short, for each locus in a profile, a DL probability distribution is centered over the ancestral allele. The width of the distribution is set to encompass a reasonable range of shorter and longer alleles that may have evolved over time from the ancestral allele in the population of interest. The probability of observing an allele decreases as repeat units are added or lost. Because loci are assumed to mutate independently, the individual evidentiary allele probabilities can be multiplied to give a profile probability. A lower profile probability translates to a rarer haplotype.
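The per-locus calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not YHRD's implementation: it assumes a single ancestral haplotype, whereas the published method fits a mixture over several ancestral centers. The function names, the three-locus haplotypes, and the dispersion values are hypothetical and chosen only to show the arithmetic.

```python
from math import prod


def discrete_laplace_pmf(allele, center, p):
    """Probability of `allele` under a discrete Laplace distribution.

    P(X = x) = (1 - p) / (1 + p) * p ** |x - center|, with 0 < p < 1.
    A smaller p gives a narrower distribution around the ancestral allele.
    """
    if not 0 < p < 1:
        raise ValueError("dispersion p must lie strictly between 0 and 1")
    return (1 - p) / (1 + p) * p ** abs(allele - center)


def haplotype_probability(haplotype, centers, dispersions):
    """Multiply the per-locus probabilities, assuming independent loci."""
    return prod(
        discrete_laplace_pmf(a, c, p)
        for a, c, p in zip(haplotype, centers, dispersions)
    )


# Hypothetical three-locus example (illustrative values, not fitted ones).
ancestral = [14, 13, 29]
dispersions = [0.20, 0.15, 0.25]

# One repeat unit away from the ancestral haplotype in total.
near = haplotype_probability([14, 13, 30], ancestral, dispersions)
# Seven repeat units away from the ancestral haplotype in total.
far = haplotype_probability([16, 11, 32], ancestral, dispersions)

print(f"near: 1 in {1 / near:,.0f}")
print(f"far:  1 in {1 / far:,.0f}")
```

The haplotype that sits further from the ancestral alleles receives the smaller probability, which is the behavior the text describes. Neither value depends on how often the haplotype appears in a database.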
Here, we aim to introduce the North American forensic community to the discrete Laplace statistic as applied in YHRD and demonstrate how statistics generated using discrete Laplace compare to the results obtained from the commonly used Clopper-Pearson 95% upper confidence interval (UCI) developed from masked and transient searches.
For this evaluation, 16 complete Y23 profiles and 2 complete Yfiler profiles (one very common and one with zero observations) taken from laboratory staff databases were searched against YHRD release 67 (R67) using the new masked and transient search capabilities. The 95% UCI statistics were developed using masked Y23 searches and masked and transient Y17 searches. DL (minimal and Y17) statistics were developed from the Y23 and Yfiler profiles.
3. Results
DL (Y17) typically yielded lower frequency estimates than the 95% UCI for unobserved profiles (Fig. 1). Often, estimates were rarer than 1 in 1 million. The rarest 95% UCI estimate in our data for Y17 (masked) among the searched populations was 1 in 2,834 for all unobserved profiles in the Admixed U.S. Caucasian American subpopulation, while the rarest DL (Y17) estimate was 1 in 369,616,080. The latter is the population-specific lower bound for the Admixed U.S. Asian American subpopulation and was returned for five of the unobserved profiles.
Fig. 1. Frequency estimate data developed using YHRD R67. All profiles were complete. Profiles 1–16 were developed using Y23 and profiles 17 and 18 were developed using Yfiler. ‘Y23 masked’ was searched using Y23 kit and Y23 dataset. ‘Y23/Y17 transient’ was searched using Y23 kit and Y17 dataset. ‘Y17 masked’ was searched using Y17 kit and Y17 dataset. DL (Y17) is automatically calculated for any profile complete for the Y17 loci. DL (minimal) was calculated using transient searching to the minimal dataset.
For common profiles which are well represented in the database, DL and the 95% UCI returned comparable rarity estimates. For example, Profile 17, which is a Yfiler profile, was observed 17 times in the Admixed U.S. Caucasian Y17 dataset, yielding a 95% UCI estimate of 1 in 333. The DL (Y17) estimate for this haplotype was 1 in 412.
DL (minimal) often yielded frequency estimates more conservative (less rare) than the 95% UCI, even for unobserved haplotypes, but at times yielded estimates as rare as 1 in several hundred thousand. Interestingly, when Profiles 8 and 15 were searched against the U.S. Hispanic American subpopulation, DL (Y17) yielded more conservative estimates than DL (minimal). These were the only observed instances where the minimal DL estimate was rarer than the Y17 DL estimate.
4. Discussion
As implemented in YHRD, the LMC found the application of the discrete Laplace profile probability estimate to be straightforward and easily interpreted. YHRD presents the DL statistic as a “1 in” number that is easily understood and comparable to the “1 in” numbers provided for the commonly used 95% UCI.
Our results show that the DL (Y17) statistic is far more informative for rare profiles with no database observations than the 95% UCI. This is because, unlike the Clopper-Pearson estimate, in which the shape of the probability distribution is based on the size of the database and the number of observations in the database (the counting method), the DL statistic does not take the database size or number of observations into account, relying instead on phylogenetic relationships and genetic distance.
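To make the dependence on database size concrete, the counting-method bound can be sketched as follows. This is an illustration under stated assumptions rather than YHRD's code: it treats the 95% UCI as a one-sided Clopper-Pearson upper limit, solves for it by bisection using only the Python standard library, and uses hypothetical database sizes. The function names are ours.

```python
from math import exp, lgamma, log


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space; needs 0 < p < 1."""
    total = 0.0
    for k in range(x + 1):
        log_term = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p)
        )
        total += exp(log_term)
    return total


def clopper_pearson_upper(x, n, confidence=0.95):
    """One-sided upper limit: the p at which P(X <= x) equals 1 - confidence."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    if x == n:
        return 1.0
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(200):  # bisection; the CDF is decreasing in p
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical database sizes: an unobserved profile (x = 0) can never be
# reported as rarer than roughly 1 in n / 3, whatever its alleles are.
for n in (1_000, 5_000, 20_000):
    upper = clopper_pearson_upper(0, n)
    print(f"n = {n:>6}: 1 in {1 / upper:,.0f}")
```

For x = 0 the bound reduces to 1 - 0.05^(1/n), so the reported rarity of an unobserved profile is set entirely by the number of profiles in the database.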
It was noted during evaluation that YHRD implements population-specific lower bounds on DL frequency estimates. This is to prevent unrealistically rare frequency estimates for profiles that do not “belong” ancestrally to the searched population. These lower bounds are reported in the search results as metapopulation-specific cutoff values.
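The effect of such a cutoff can be sketched as a simple floor on the estimate. The function names below are hypothetical, and the exact way YHRD applies its cutoffs is an assumption here. The cutoff value is the Admixed U.S. Asian American lower bound reported in the Results, and the raw probabilities are illustrative.

```python
def apply_lower_bound(dl_probability, cutoff):
    """Return the DL estimate, but never rarer than the population cutoff."""
    return max(dl_probability, cutoff)


def as_one_in(probability):
    """Format a probability as a '1 in N' statement."""
    return f"1 in {round(1 / probability):,}"


# Lower bound reported in this study for one subpopulation.
cutoff = 1 / 369_616_080

# Hypothetical raw estimates: one far rarer than the cutoff, one common.
print(as_one_in(apply_lower_bound(1e-12, cutoff)))
print(as_one_in(apply_lower_bound(1 / 412, cutoff)))
```

Under this reading, several different unobserved profiles can return the same "1 in" value, as was seen for five profiles in the Results.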
There are limitations to the use of discrete Laplace that must be considered. For example, microvariants and duplicated loci such as DYS385 are not considered in the calculation. With the exception of DYS385, YHRD requires that, at a minimum, the query sample has a complete ‘minimal’ kit profile. Expanded loci beyond Y17 are not considered because the statistic becomes increasingly anticonservative as the number of loci increases.
Discrete Laplace (Y17) yielded more informative frequency estimates than the Clopper-Pearson 95% UCI, which tends to overestimate the frequency of rare haplotypes because the Clopper-Pearson calculation is dependent on database size. In the instances where query profiles do not meet the requirements for discrete Laplace, the user can fall back on the 95% UCI, which yields results similar to DL (minimal) in many cases.
Conflict of interest statement
We have no conflicts of interest to disclose.
Acknowledgments
We wish to thank the Scientific Working Group on DNA Analysis Methods (SWGDAM) and our fellow members of the Lineage Marker Committee for their support.
References
Andersen M.M., Eriksen P.S., Morling N. The discrete Laplace exponential family and estimation of Y-STR haplotype frequencies.