AKE - the Accelerated k-mer Exploration web-tool for rapid taxonomic classification and visualization.

Langenkämper D, Goesmann A, Nattkemper TW - BMC Bioinformatics (2014)

Bottom Line: With the advent of low-cost, fast sequencing technologies, metagenomic analyses have become possible. The large data volumes gathered by these techniques and the unpredictable diversity captured in them are still, however, a challenge for computational biology. We show that the execution time of this approach is orders of magnitude shorter than that of competing approaches and that accuracy is comparable.


Affiliation: Biodata Mining, Bielefeld University, Universitätsstraße 15, Bielefeld, Germany. dlangenk@cebitec.uni-bielefeld.de.

ABSTRACT

Background: With the advent of low-cost, fast sequencing technologies, metagenomic analyses have become possible. The large data volumes gathered by these techniques and the unpredictable diversity captured in them are still, however, a challenge for computational biology.

Results: In this paper we address the problem of rapid taxonomic assignment with small and adaptive data models (< 5 MB) and present the accelerated k-mer explorer (AKE). Acceleration in AKE's taxonomic assignments is achieved by a special machine learning architecture, which is well suited to modeling data collections that are intrinsically hierarchical. We report reasonably good classification accuracy for ranks down to order, observed in a study on real-world data (Acid Mine Drainage, Cow Rumen).

Conclusion: We show that the execution time of this approach is orders of magnitude shorter than that of competing approaches and that accuracy is comparable. The tool is presented to the public as a web application (url: https://ani.cebitec.uni-bielefeld.de/ake/ , username: bmc, password: bmcbioinfo).
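As background for the k-mer based assignment described above: composition-based classifiers commonly represent each read by the relative frequencies of its k-mers. The following is a minimal sketch of that general idea, not AKE's implementation. The function name `kmer_profile`, the default `k = 4`, and the choice to skip windows containing non-ACGT symbols are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every DNA k-mer in `sequence` (hypothetical helper)."""
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # assumption: skip windows with N etc.
    )
    total = sum(counts.values())
    # Fixed-length feature vector: one entry per possible k-mer (4**k entries).
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


# Toy read; the window spanning "N" is ignored.
profile = kmer_profile("ACGTACGTNACGT", k=2)
```

A fixed-length vector of this kind has the same size for every read regardless of read length, which is one reason such profiles are a convenient input for machine learning models.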


Figure 8: Comparison of web-based taxonomic classifiers for the AMD data set. The data are based on [12], the respective web sites, and our own measurements. Note that Speed (runtime in seconds) is depicted in reverse order. Predicted correctly is the number of predictions that are also present in the reference composition. Model size is the number of genomes included in the model-building process. Percent classified expresses the inverse of the percentage of rejected data. (Note that differences between Figure 3 (cited data) and this figure (data measured in July 2014) are most probably due to different application versions.)

A comparison of web-based taxonomic classifiers, based on the analysis of the AMD data set, is shown in Figure 8. AKE outperforms PhylopythiaS [11] (generic model) and NBC [12] in all measured categories, and its execution time is one (PhylopythiaS) or two (NBC) orders of magnitude shorter. A result with WebCarma [8], which is a homology-based classifier, was obtained within about a week. At the phylum level, WebCarma (678 correct assignments) outperforms all composition-based methods except our system AKE (902 correct assignments). The share of rejects for WebCarma at the phylum level (42%), i.e. assignments to an "other" (unknown) class, is comparable to that of PhylopythiaS but much higher than in NBC's or AKE's results. The detailed results are given in Table 2.
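The percent classified value from the Figure 8 caption can be illustrated with the one reject rate quoted in the text. This is a small sketch that assumes "inverse" means the complement (100% minus the percentage of rejected data); the function name `percent_classified` is hypothetical and not taken from the paper.

```python
def percent_classified(percent_rejected: float) -> float:
    """Complement of the reject rate, in percent (assumed reading of "inverse")."""
    if not 0.0 <= percent_rejected <= 100.0:
        raise ValueError("percent_rejected must lie in [0, 100]")
    return 100.0 - percent_rejected


# WebCarma's phylum-level reject rate as quoted in the text (42%).
webcarma_classified = percent_classified(42.0)
print(webcarma_classified)  # 58.0
```

Under this reading, a classifier that rejects fewer reads scores higher on percent classified, which matches the direction of the comparison made for NBC and AKE.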

