Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data.

Hong C, Clement NL, Clement S, Hammoud SS, Carrell DT, Cairns BR, Snell Q, Clement MJ, Johnson WE - BMC Bioinformatics (2013)

Bottom Line: However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases. We tested GNUMAP-bs and other commonly used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods. The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments.


Affiliation: Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA. wej@bu.edu.

ABSTRACT

Background: DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite-treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases. It is therefore often difficult to align reads to the correct locations in the reference genome. Bisulfite sequencing experiments also carry the additional complexity of estimating the DNA methylation levels within the sample.

Results: Here, we present GNUMAP-bs, a highly accurate probabilistic algorithm that extends the Genomic Next-generation Universal MAPper to accommodate bisulfite sequencing data and addresses the computational problems associated with aligning such data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help resolve the difference between poor quality bases and the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods.

Conclusions: The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at: http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed.

Figure 2: Relative complement mapping consistency of GNUMAP-bs with HEP methylation profiles of human chromosome 22. Venn diagrams between GNUMAP-bs and (a) Novoalign, (b) BSMAP, and (c) Bismark showing both the number of covered/uncovered CG sites and the concordance (in parentheses) of these sites with the HEP methylation profiles. The estimated levels of methylation in the additional CG sites covered by the probabilistic aligners but not by the other aligners are highly concordant with the HEP results.

Mentions: The GNUMAP-bs alignment algorithm is a modification of the GNUMAP algorithm, which consists of three main steps, each of which had to be modified to align BSRs to a reference genome. A flow chart of the GNUMAP-bs algorithm is displayed in Figure 2. The first step is the construction of a hash table from all genomic subsequences, where the k-nucleotide (nt) long subsequences (k-mers) are the keys and the hash table values store the genomic locations of each k-mer. In addition, k-mers taken incrementally from the BSRs are looked up in the genomic hash table. In GNUMAP-bs, the genome and the reads are artificially 'BS-converted' by changing all Cs to Ts before the hashing step. This process ensures that all BSRs can be looked up in the hash table regardless of whether or not they contain methylated bases.
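The hashing step described above can be illustrated with a minimal Python sketch. It is not the GNUMAP-bs implementation: the function names (`bs_convert`, `build_kmer_index`, `candidate_locations`), the toy genome, and the choice of k = 4 are all hypothetical. The sketch also handles only the forward strand and stops at candidate lookup, leaving out the later alignment steps.

```python
from collections import defaultdict
from typing import Dict, List


def bs_convert(seq: str) -> str:
    """Artificially 'BS-convert' a sequence by changing every C to T."""
    return seq.upper().replace("C", "T")


def build_kmer_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Hash every k-mer of the converted genome to its start positions."""
    converted = bs_convert(genome)
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(converted) - k + 1):
        index[converted[pos:pos + k]].append(pos)
    return dict(index)


def candidate_locations(read: str, index: Dict[str, List[int]], k: int) -> Dict[int, int]:
    """Look up each k-mer of the converted read and count, for every implied
    read start position in the genome, how many k-mers support it."""
    converted = bs_convert(read)
    votes: Dict[int, int] = defaultdict(int)
    for offset in range(len(converted) - k + 1):
        for pos in index.get(converted[offset:offset + k], []):
            votes[pos - offset] += 1
    return dict(votes)


# Toy example (hypothetical data): the read comes from genome[3:11] = "TTCGAACG".
# Its first C was unmethylated (read as T); its second C was methylated (still C).
genome = "ACGTTCGAACGGTCA"
index = build_kmer_index(genome, k=4)
print(candidate_locations("TTTGAACG", index, k=4))
```

Because both the genome and the read are converted before hashing, the partially converted read and the unconverted sequence `"TTCGAACG"` yield the same candidate location. This is the property the passage describes: reads can be looked up whether or not they contain methylated bases.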

