Improving read mapping using additional prefix grams.

Kim J, Li C, Xie X - BMC Bioinformatics (2014)

Bottom Line: Hobbes2 efficiently identifies all mapping locations of reads using a novel technique that utilizes additional prefix q-grams to improve filtering. We extensively compare Hobbes2 with state-of-the-art read mappers, and show that Hobbes2 can be an order of magnitude faster than other read mappers while consuming less memory space and achieving similar accuracy. We propose Hobbes2 to improve the accuracy of read mapping, specialized in identifying all mapping locations of each read.


Affiliation: Department of Computer Science, University of California, Irvine, USA. xhx@ics.uci.edu.

ABSTRACT

Background: Next-generation sequencing (NGS) enables rapid production of billions of bases at a relatively low cost. Mapping reads from next-generation sequencers to a given reference genome is an important first step in many sequencing applications. Popular read mappers, such as Bowtie and BWA, are optimized to return only the top one or a few candidate locations for each read. However, identifying all mapping locations of each read, instead of just one or a few, is also important in some sequencing applications, such as ChIP-seq for discovering binding sites in repeat regions and RNA-seq for transcript abundance estimation.

Results: Here we present Hobbes2, a software package designed for fast and accurate alignment of NGS reads and specialized in identifying all mapping locations of each read. Hobbes2 efficiently identifies all mapping locations of reads using a novel technique that utilizes additional prefix q-grams to improve filtering. We extensively compare Hobbes2 with state-of-the-art read mappers, and show that Hobbes2 can be an order of magnitude faster than other read mappers while consuming less memory space and achieving similar accuracy.

Conclusions: We propose Hobbes2 to improve the accuracy of read mapping, specialized in identifying all mapping locations of each read. Hobbes2 is implemented in C++, and the source code is freely available for download at http://hobbes.ics.uci.edu.
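To make the filtering idea concrete, here is a minimal Python sketch of the standard pigeonhole (q-gram) filter that an additional prefix q-gram builds on, followed by verification of every surviving candidate so that all mapping locations, not just the best one, are reported. It assumes a mismatch-only (Hamming distance) error model and simply takes consecutive non-overlapping q-grams from the left of the read; how Hobbes2 actually selects its prefix q-grams, handles indels, and indexes the genome is not reproduced here. The function names (`build_qgram_index`, `pigeonhole_candidates`, `hamming_ok`, `map_all`) and the toy reference are hypothetical.

```python
from collections import defaultdict


def build_qgram_index(reference: str, q: int) -> dict[str, list[int]]:
    """Inverted index mapping every q-gram of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def pigeonhole_candidates(read: str, index: dict[str, list[int]], q: int, k: int) -> set[int]:
    """Union of the hits of k+1 non-overlapping q-grams, shifted to read-start coordinates.

    Assumption: mismatches only. At most k of the k+1 disjoint q-grams can be
    damaged, so at least one matches exactly at every true location.
    """
    if (k + 1) * q > len(read):
        raise ValueError("read too short for k+1 non-overlapping q-grams")
    candidates: set[int] = set()
    for i in range(k + 1):
        offset = i * q  # hypothetical policy: consecutive q-grams from the left
        gram = read[offset:offset + q]
        for pos in index.get(gram, []):
            if pos - offset >= 0:
                candidates.add(pos - offset)
    return candidates


def hamming_ok(reference: str, read: str, start: int, k: int) -> bool:
    """Verification: does the read fit at `start` with at most k mismatches?"""
    window = reference[start:start + len(read)]
    return len(window) == len(read) and sum(a != b for a, b in zip(window, read)) <= k


def map_all(reference: str, read: str, q: int, k: int) -> list[int]:
    """Report every verified mapping location of the read."""
    index = build_qgram_index(reference, q)
    return sorted(s for s in pigeonhole_candidates(read, index, q, k)
                  if hamming_ok(reference, read, s, k))


if __name__ == "__main__":
    ref = "ACGTACGTACGTTTGACGTACCA"
    print(sorted(pigeonhole_candidates("ACGTACGA", build_qgram_index(ref, 4), 4, 1)))  # [0, 4, 8, 15]
    print(map_all(ref, "ACGTACGA", q=4, k=1))  # [0, 4, 15]
```

Of the four candidate starts produced by the filter, verification rejects position 8, where the read has two mismatches; the other three are all reported.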

© Copyright Policy - open-access

Figure 2: Filtering effect of an additional prefix q-gram. Gray-scaled areas indicate candidates. (a) An additional prefix q-gram g3 plays an important role in filtering out a number of false positives in E and F. (b) If we use only k+1 = 2 q-grams, g1 and g2, many more candidates are generated.

Mentions: …which is illustrated in a diagram in Figure 2(a). If we compare it with the candidate set generated by the k+1 prefix q-grams g1 and g2, depicted in Figure 2(b), we can see that the additional prefix q-gram g3 plays a significant role in filtering out false positives.
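The comparison in Figure 2 can be illustrated with a small Python sketch. It assumes one reading of the figure: with k+2 non-overlapping q-grams and at most k mismatches, at least two q-grams must match exactly at a true location, so a candidate can be required to collect two q-gram votes instead of one. This is an illustrative count filter under a mismatch-only model with k = 1, not Hobbes2's implementation; the helper names and the toy reference are hypothetical.

```python
from collections import Counter, defaultdict


def build_qgram_index(reference: str, q: int) -> dict[str, list[int]]:
    """Inverted index mapping every q-gram of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def qgram_votes(read: str, index: dict[str, list[int]], q: int, n_grams: int) -> Counter:
    """For each implied read start, count how many of the first n_grams
    non-overlapping q-grams of the read hit the reference there."""
    if n_grams * q > len(read):
        raise ValueError("read too short for that many non-overlapping q-grams")
    votes: Counter = Counter()
    for i in range(n_grams):
        offset = i * q
        for pos in index.get(read[offset:offset + q], []):
            if pos - offset >= 0:
                votes[pos - offset] += 1
    return votes


def candidates_k_plus_1(read: str, index: dict[str, list[int]], q: int, k: int) -> set[int]:
    """Figure 2(b) style: any hit from one of k+1 q-grams makes a candidate."""
    return set(qgram_votes(read, index, q, k + 1))


def candidates_k_plus_2(read: str, index: dict[str, list[int]], q: int, k: int) -> set[int]:
    """Assumed reading of Figure 2(a): with k+2 disjoint q-grams and at most k
    mismatches, at least 2 q-grams match at a true location, so require 2 votes."""
    votes = qgram_votes(read, index, q, k + 2)
    return {start for start, n in votes.items() if n >= 2}


if __name__ == "__main__":
    ref = "AAATTTTTTCCCTTTAAACCTGGGTTTAAA"
    read = "AAACCCGGG"  # g1 = AAA, g2 = CCC, g3 = GGG with q = 3
    index = build_qgram_index(ref, 3)
    print(sorted(candidates_k_plus_1(read, index, 3, 1)))  # [0, 6, 15, 27]
    print(sorted(candidates_k_plus_2(read, index, 3, 1)))  # [15]
```

In this toy case the true location at 15 has a mismatch inside g2, yet g1 and g3 both vote for it; the decoys at 0, 6 and 27 each receive a single vote and are dropped before any verification is done. Every two-vote candidate also has a hit from g1 or g2, so the stricter set is always contained in the k+1 union.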