Correspondence to: Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Frederik V's Vej 11, DK-2100 Copenhagen, Denmark.
MHinNGS is a Python application developed for analysis of microhaplotypes (MHs) in single-end sequencing data. MHinNGS analyses reads in standard formats and stores each sequence in a bin, with one bin for each MH as defined by its two flanking sequences. MHinNGS requires a reference genome and a configuration file with information about each locus. Four mandatory and 15 optional criteria defined in the configuration file allow detailed locus-specific analyses of the MH loci. The program 1) removes noise, 2) identifies and names alleles, 3) tests the genotypes, and 4) tests unique sequences not identified as noise or alleles. MHinNGS produces a result file, where every unique sequence that passed the noise filter is presented with MH allele, read depth, warning flags based on the genotyping criteria, sequence, heterozygote balance, and MH name. Furthermore, variation in other parts of the fragment that is not defined as SNPs in the MH, linked variants, or rare SNPs is listed in a separate column of the result file.
Microhaplotypes (MHs) consist of two or more polymorphic loci (typically SNPs or small indels) within a short stretch of DNA (typically 2–300 nucleotides) [
]. The relatively short distances between the variants allow for efficient PCR amplification and sequencing of the entire amplicon, which makes PCR-NGS assays targeting MH loci highly sensitive and potentially interesting for forensic genetic applications [
MHs have three important advantages compared to the standard STR loci used in forensic genetics: 1) Amplification of MHs does not generate stutter artefacts, which complicate data analysis of mixture samples [
], which is particularly important for relationship testing. 3) The amplicon lengths of the different MH alleles are the same. This prevents NGS read count variation due to differently sized alleles, which is observed for most STRs [
] and may be a problem in the analysis of highly degraded samples.
2. Materials and methods
MHinNGS is a freely available Python script (https://hub.docker.com/r/bioinformatician/mhinngs) developed for analysis of MHs in single-end sequencing data. MHinNGS is built upon the program STRinNGS v2.0 [
], which is used for analysis of STR sequences, and the two programs share many features. MHinNGS needs three input files: 1) one file or folder containing the reads (FASTQ, FASTA, BAM, SAM, or CRAM format), 2) a reference genome in FASTA format, and 3) a configuration file containing information about each locus. The configuration file has five mandatory elements and 15 optional criteria (Table 1).
The MHinNGS output consists of three files: 1) a log file containing information about the run, such as program version, input files, and parameter settings, 2) a result file in CSV format with filtered data and all comments, and 3) a file named raw_results in CSV format that contains all data, including noise sequences, but without allele name, comments, and heterozygote balance.
3. Results and conclusions
In short, MHinNGS collects and stores sequences in bins, one bin for each MH, according to the two flanking sequences (‘flank_up_length’ and ‘flank_down_length’ in Table 1). Next, the program removes noise, identifies and names alleles, tests the genotype, and tests unique sequences that were not identified as either noise or alleles (Fig. 1). In addition to the criteria defined in STRinNGS [
], four criteria have been added to the MHinNGS configuration file: ‘mh_info’, ‘slide’, ‘linked_allele’ and ‘rare_snp’ (Table 1). Each variant (SNP or indel) of the MH is defined in the configuration file (‘mh_info’ in Table 1) with rs number (if known), genome position, surrounding nucleotides, and possible alleles. The variant is identified by searching for the nucleotides surrounding the variant position. The surrounding nucleotides must be an exact match. If a match is not found, the program will slide one nucleotide to the left or right and try again, until the surrounding nucleotides match or the slide maximum (‘slide’ in Table 1) is reached. MHinNGS also searches for additional variants between the start and stop position (Table 1). If a variant is identified, the position and base call are indicated in the result file (Supplementary Tables 1 and 2), but the variant is not included in the MH name.
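The sliding search described above can be illustrated with a minimal Python sketch. It is not the MHinNGS source code: the function name `call_variant`, its parameters, the 0-based read coordinates, the restriction to single-nucleotide variants, and the order in which left and right offsets are tried are all assumptions made for illustration.

```python
def call_variant(read, expected_index, left_context, right_context, max_slide):
    """Return (base, offset) for a SNP, or None if no exact context match is found.

    Hypothetical helper: expected_index is the 0-based position in the read
    where the SNP is expected; max_slide plays the role of 'slide' in Table 1.
    """
    # Try the expected position first, then slide one nucleotide at a time,
    # alternating left and right (the order is an assumption).
    offsets = [0]
    for step in range(1, max_slide + 1):
        offsets.extend([-step, step])

    for offset in offsets:
        pos = expected_index + offset
        start = pos - len(left_context)
        end = pos + 1 + len(right_context)
        if start < 0 or end > len(read):
            continue  # the context window would fall outside the read
        # The surrounding nucleotides must match exactly on both sides.
        if read[start:pos] == left_context and read[pos + 1:end] == right_context:
            return read[pos], offset
    return None
```

For example, `call_variant("AACGTTGCAAGGCT", 6, "TTG", "AAG", 2)` finds the variant one nucleotide to the right of the expected position, which is the situation that arises when an upstream insertion shifts the read coordinates.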
Fig. 1. Genotype calling with MHinNGS. There are three groups of reads for a locus as indicated on the left. ‘Total reads’ are all reads identified via the upstream and downstream flank. The ‘Reads for genotype call’ are all the reads that are left after noise reads have been removed. The ‘Genotype reads’ are the reads that make up the genotype. Thresholds and possible flags (Table 1) for each group of reads are indicated on the right.
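As an illustration of how the ‘Total reads’ of a locus may be collected via the two flanks, the following Python sketch sorts reads into one bin per MH and counts each unique sequence found between the flanks. It is a simplified sketch, not the MHinNGS implementation: the function name `bin_reads`, the `loci` dictionary, and the use of exact flank matching are assumptions.

```python
from collections import Counter, defaultdict


def bin_reads(reads, loci):
    """Sort reads into one bin per locus and count unique sequences.

    Hypothetical inputs: reads is an iterable of read strings, and loci maps
    a locus name to a tuple (upstream_flank, downstream_flank).
    """
    bins = defaultdict(Counter)
    for read in reads:
        for name, (upstream, downstream) in loci.items():
            up_at = read.find(upstream)
            if up_at == -1:
                continue  # upstream flank not present in this read
            start = up_at + len(upstream)
            down_at = read.find(downstream, start)
            if down_at == -1:
                continue  # downstream flank not present after the upstream flank
            # Store the sequence between the flanks; the counter gives read depth.
            bins[name][read[start:down_at]] += 1
            break  # a read is assigned to at most one locus
    return bins
```

The depth of each unique sequence in a bin is the starting point for the noise filter and the genotype call summarised in Fig. 1.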
In the configuration file, it is possible to ignore specific positions (‘ignore_pos’ in Table 1) with frequent errors that generate multiple unique sequences (example in Supplementary Table 1). Furthermore, it is possible to define rare variants (‘rare_snp’ in Table 1) that are not part of the MH, with rs number, genome position, and alternative allele. If the alternative allele is detected, a warning flag (“Rare SNP”) is raised in the comment column of the result file. However, the SNP is not included in the MH name.
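The effect of these two criteria can be sketched in Python as follows. The sketch assumes that ignored positions are handled by masking them before unique sequences are counted, and it uses 0-based positions within the sequence rather than genome positions; both are assumptions, as are the function names and the placeholder rs identifiers in the usage note.

```python
from collections import Counter


def collapse_ignored(seq_counts, ignore_positions, mask="N"):
    """Merge sequences that differ only at ignored positions (assumed approach)."""
    collapsed = Counter()
    for seq, depth in seq_counts.items():
        masked = "".join(
            mask if index in ignore_positions else base
            for index, base in enumerate(seq)
        )
        collapsed[masked] += depth
    return collapsed


def detect_rare_snps(seq, rare_snps):
    """Return the rs numbers of rare SNPs whose alternative allele is observed.

    Hypothetical input: rare_snps is a list of (rs_number, index, alt_allele).
    """
    return [
        rs_number
        for rs_number, index, alt_allele in rare_snps
        if index < len(seq) and seq[index] == alt_allele
    ]
```

With `collapse_ignored(Counter({"ACGT": 10, "ACTT": 2}), {2})`, the two sequences are merged into a single entry with a depth of 12. A non-empty result from `detect_rare_snps` would correspond to the “Rare SNP” flag in the comment column.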
Linked alleles may be defined in the configuration file (‘linked_allele’ in Table 1) with genome position, the MH allele that the allele is linked to, and the variant allele. If the allele is detected and the MH allele is identical to the expected, linked MH allele, the position and base call of the SNP allele are not shown in the result file. If another haplotype is detected, the flag “Linked allele not linked” is shown in the comment column of the result file.
In conclusion, MHinNGS is a freely available MH analysis software that provides the user with maximum flexibility and complete control of the analysis process.
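The linked-allele test can be summarised in a short Python sketch. It reflects one reading of the description above, namely that the flag is raised when the variant allele is observed together with an MH allele other than the expected one, and that nothing is reported when the variant allele is absent. The function name, the tuple layout, and the 0-based sequence index are assumptions.

```python
def check_linked_allele(seq, mh_allele, linked):
    """Return a comment for the result file, or an empty string.

    Hypothetical input: linked is a tuple (index, linked_mh_allele, variant_allele),
    where index is the 0-based position of the linked variant in seq.
    """
    index, linked_mh_allele, variant_allele = linked
    if index >= len(seq) or seq[index] != variant_allele:
        return ""  # variant allele not observed: nothing to report (assumption)
    if mh_allele == linked_mh_allele:
        return ""  # expected haplotype: the variant is not shown
    return "Linked allele not linked"
```

For example, with `linked = (2, "AG", "T")`, a sequence carrying T at index 2 gives no comment when its MH allele is "AG", but gives “Linked allele not linked” when its MH allele is "AC".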