Massively parallel sequencing (MPS, also known as NGS) is revolutionizing the field of forensics. Existing forensic short tandem repeat polymorphisms (STRPs) are more informative when typed by MPS. MPS also allows STRPs and forensic SNP panels to be multiplexed, adding information on ancestry and phenotype to the identification information from STRPs. MPS also makes possible microhaplotypes: small segments of DNA (<300 bp) with two or more single nucleotide polymorphisms (SNPs) unambiguously defining three or more haplotypes. Because a single sequence read can cover the expanse of the microhaplotype, these loci become phase-known codominant systems. The multiple alleles (haplotypes) provide much more information than a single SNP for the same effort. Data now available on 129 loci characterized on 55 populations from around the globe demonstrate that the majority of these microhaplotypes appear to be useful in forensics for individual identification, ancestry inference, estimating relationships, and especially deconvoluting mixtures.
SNPs that are molecularly very close will have extremely low recombination rates, much lower than the mutation rates of the average forensic STRP. These SNPs can still define multiple haplotypes, creating a multi-allelic locus, with the number of alleles and their frequencies depending on the history of the accumulation of the variants at the different sites, the occurrence historically of rare crossovers, the vagaries of random genetic drift, and/or selection. Those DNA sequencing platforms that provide continuous runs of a hundred base pairs or more on a single DNA molecule will directly determine for a single individual the phase of the multiple SNPs within the small DNA segment. We designate such loci as microhaplotypes [
The multiple alleles of these microhaplotypes can be more informative than simple two-allele SNPs for many types of forensic analyses: identifying biological relatives, individual identification, and inferring the ethnicity of an individual’s ancestors. Thus, on a per locus basis, sequencing of haplotypes of close SNPs can yield more information than sequencing a single SNP. Because they are co-dominant systems, microhaplotypes with multiple alleles and high enough heterozygosity can be especially useful in identifying mixtures of DNA from more than one person and deconvoluting such mixtures [
]. The question has been whether a sufficient number of appropriately informative microhaplotype loci can be identified. With the number of loci we have now identified, it is clear that a sufficient number of microhaplotypes can be, and for some purposes has been, identified and characterized.
2. Material and methods
Over the past decades we have accumulated, in the Yale lab, dense SNP genotype data at multiple genomic regions for over 2500 individuals from 50+ globally diverse populations.
Searches of these data identified many candidate microhaplotype (microhap) loci. We searched for those whose SNPs were not in complete linkage disequilibrium and were therefore likely to define multiple alleles. Preliminary analyses of a few of the most promising of the >150 loci identified, with individual SNPs typed by TaqMan and statistically phased into haplotypes, confirmed to us the potential of microhaplotypes [
]. Subsequently we screened many existing databases (HGDP, HapMap, 1000 Genomes, etc.) to identify potential microhaplotypes. We evaluated them on our sample of populations by typing all individuals for the individual SNPs using TaqMan assays; we then statistically phased the genotypes into haplotypes. Of the >150 loci identified, many were not worth evaluating beyond initial results because they were insufficiently heterozygous.
The statistical phasing assigned each individual a genotype consisting of two of the haplotypes that were estimated to exist globally. As a measure of the ability to deconvolute a mixture, the resulting codominant genotypes were evaluated for the effective number of alleles (Ae) in each population as described [
] on the allele frequencies in the 55 populations.
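As a hedged illustration of the Ae calculation, the sketch below assumes the standard definition of the effective number of alleles, Ae = 1 / Σ p_i², where p_i are the haplotype frequencies in one population. The function names, the unweighted averaging over populations, and the example frequencies are assumptions made for illustration and are not taken from our data.

```python
def effective_number_of_alleles(freqs):
    """Ae = 1 / sum(p_i^2) for one population's haplotype frequencies."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    return 1.0 / sum(p * p for p in freqs)


def global_average_ae(freqs_by_population):
    """Unweighted mean of Ae over populations (an assumed averaging scheme)."""
    values = [effective_number_of_alleles(f) for f in freqs_by_population.values()]
    return sum(values) / len(values)


# Hypothetical frequencies for one microhaplotype in two populations.
example = {
    "pop_A": [0.5, 0.5],                # two equally frequent haplotypes, Ae = 2
    "pop_B": [0.25, 0.25, 0.25, 0.25],  # four equally frequent haplotypes, Ae = 4
}
print(global_average_ae(example))  # prints 3.0
```

Under this definition a locus with n equally frequent haplotypes has Ae = n, and any departure from equal frequencies lowers Ae below the observed number of haplotypes.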
3. Results
We have now identified many microhaplotypes and evaluated 129 in 55 populations. We have used the global average Ae to rank loci for the ability to deconvolute mixtures. We have also used the informativeness (In) of a microhap across the 55 populations studied to rank the loci for ancestry inference.
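The ancestry ranking can be sketched in a similar way. The example below assumes that In refers to the Rosenberg-style informativeness for assignment, computed per locus from the haplotype frequencies in K populations with natural logarithms; the function name and the input layout are hypothetical.

```python
import math


def informativeness_in(freqs_by_population):
    """Rosenberg-style In for one locus (assumed definition).

    freqs_by_population: list of K lists, each holding the frequencies of the
    same N haplotypes, in the same order, in one population.
    """
    k = len(freqs_by_population)
    n = len(freqs_by_population[0])
    total = 0.0
    for j in range(n):
        column = [pop[j] for pop in freqs_by_population]
        mean_p = sum(column) / k
        if mean_p > 0:
            total -= mean_p * math.log(mean_p)
        # 0 * log(0) is taken as 0, so absent haplotypes contribute nothing.
        total += sum(p * math.log(p) for p in column if p > 0) / k
    return total


# Identical frequencies in every population carry no ancestry information.
print(informativeness_in([[0.5, 0.5], [0.5, 0.5]]))
# Populations fixed for different haplotypes give the maximum for K = 2, ln(2).
print(informativeness_in([[1.0, 0.0], [0.0, 1.0]]))
```

With this definition In is zero when all populations share the same frequencies and grows as the frequencies diverge among populations.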
Of the 129 microhaps, 86 have a global average Ae > 2 (Fig. 1). These are, on average, better for mixture deconvolution than a di-allelic SNP with equal frequencies for the two alleles. Of these, 23 loci have an Ae > 3 and 6 of those have an Ae > 4. As shown in [
], some of the loci are known to have moderately common variation that we have not yet incorporated. Any low frequency variants not used in our studies will define new haplotypes when sequencing is used and will increase the Ae value for any population in which they occur. Table 1 shows that the 23 loci that have an Ae > 3, and especially the subset of 6 with an Ae > 4, already provide, on average, virtual certainty of identifying a mixture based on seeing loci with three or more alleles when the loci are tested in a multiplex.
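The reasoning behind detecting a mixture from loci with three or more alleles can be sketched as follows. This is a simplified model and not the calculation behind Table 1: it assumes unrelated contributors, Hardy-Weinberg proportions, independent loci, and no allele dropout, and all function names and frequencies are hypothetical.

```python
from itertools import product


def prob_three_or_more_alleles(freqs, contributors=2):
    """P(a mixture shows >= 3 distinct alleles at one locus).

    Each contributor carries two alleles drawn independently from freqs.
    """
    n_draws = 2 * contributors
    prob = 0.0
    for alleles in product(range(len(freqs)), repeat=n_draws):
        if len(set(alleles)) >= 3:
            p = 1.0
            for a in alleles:
                p *= freqs[a]
            prob += p
    return prob


def prob_mixture_detected(loci_freqs, contributors=2):
    """P(at least one of several independent loci shows >= 3 alleles)."""
    miss = 1.0
    for freqs in loci_freqs:
        miss *= 1.0 - prob_three_or_more_alleles(freqs, contributors)
    return 1.0 - miss


# A di-allelic SNP can never show three alleles, so it cannot flag a mixture this way.
print(prob_three_or_more_alleles([0.5, 0.5]))
# Hypothetical panel: 23 loci, each with three equally frequent haplotypes (Ae = 3).
print(prob_mixture_detected([[1 / 3, 1 / 3, 1 / 3]] * 23))
```

The per-locus probability rises with Ae, and because the miss probabilities multiply across independent loci, even a modest multiplex quickly approaches certainty under these assumptions.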
Fig. 1. The 86 microhaplotypes with global average Ae > 2 classified by Ae value into groups. The intervals of 0.25 are indicated by the lower bound.
A microhap locus can only have a high global average Ae if most of the populations have a high Ae. Thus, one expects relatively little variation among the populations, the opposite of what one wants for ancestry inference. Conversely, since one wants large amounts of allele frequency variation among populations for ancestry inference, loci selected for high In are expected to have less value for mixture deconvolution in at least some populations. In practice, however, the correlation coefficient between In across the 55 populations and global average Ae is 0.543 across the 129 microhaps. Indeed, there is considerable overlap between the top 30 by Ae and the top 30 by In: together they comprise 45 distinct microhap loci, so 15 loci appear in both lists.
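The comparison of the two rankings reduces to simple set arithmetic, sketched below. The locus names and scores are placeholders invented for the example and do not correspond to any of the 129 microhaps; the helper names are likewise hypothetical.

```python
def top_k(scores, k):
    """Names of the k loci with the highest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def union_and_overlap(ae_scores, in_scores, k):
    """Loci in either top-k list, and loci in both."""
    top_ae = top_k(ae_scores, k)
    top_in = top_k(in_scores, k)
    return top_ae | top_in, top_ae & top_in


# Placeholder scores for four made-up loci.
ae_scores = {"locus_1": 4.1, "locus_2": 3.2, "locus_3": 2.4, "locus_4": 2.1}
in_scores = {"locus_1": 0.30, "locus_2": 0.10, "locus_3": 0.45, "locus_4": 0.05}

combined, shared = union_and_overlap(ae_scores, in_scores, k=2)
print(sorted(combined))  # loci worth carrying forward for either purpose
print(sorted(shared))    # loci ranked highly for both purposes
```

The size of the union always equals twice k minus the number of shared loci, which is how the size of a combined panel follows from the overlap of the two lists.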
4. Conclusion
The speed, accuracy, and read lengths currently available require that forensics consider MPS methodology. SNPs for any of the purposes noted above can be genotyped by sequencing and all types can be pooled to give a collection of SNPs addressing all major forensic DNA questions in one laboratory analysis. For many reasons we believe that focusing on microhaplotypes is the best approach to maximizing the information obtained by sequencing. The potential value of microhaplotypes [
] and the new results presented here document our progress in finding, selecting, and validating microhaplotype loci for forensic work.
Conflict of interest
Other than the obvious interest of Thermo Fisher in possible future commercial development, the authors have no conflicts of interest. The data are being made available.
Acknowledgements
These studies have been supported in part by Thermo Fisher as part of collaborative research and in part by grant 2013-DN-BX-K023 to KKK awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. Points of view in this presentation are those of the authors and do not necessarily represent the official position or policies of the U.S. Department of Justice.
References
Kidd K.K., Pakstis A.J., Speed W.C., Lagace R., Chang J., Wootton S., Ihuegbu N. Microhaplotype loci are a powerful new type of forensic marker.