CoMeta: classification of metagenomes using k-mers.

Kawulok J, Deorowicz S - PLoS ONE (2015)

Bottom Line: In CoMeta, we used an exact method for read classification based on short subsequences (k-mers) and a fast program for indexing large sets of k-mers. In contrast to the most popular methods based on BLAST, where the query is compared with each reference sequence, we begin the classification from the top of the taxonomy tree to reduce the number of comparisons. The presented experimental study confirms that CoMeta outperforms other programs used in this context.


Affiliation: Institute of Informatics, Silesian University of Technology, Gliwice, Poland.

ABSTRACT
The study of environmental samples is developing rapidly. Characterizing the composition of an environment broadens our knowledge of the relationship between species composition and environmental conditions. An important step in determining the composition of a sample is to compare the extracted DNA fragments with sequences derived from known organisms. In this paper, we introduce an algorithm called CoMeta (Classification of metagenomes), which assigns a query read (a DNA fragment) to one of the groups previously prepared by the user. Typically, a group corresponds to a taxon at a given taxonomic rank (e.g., phylum, genus); however, the prepared groups may also contain sequences having various functions. In CoMeta, we used an exact method for read classification based on short subsequences (k-mers) and a fast program for indexing large sets of k-mers. In contrast to the most popular methods based on BLAST, where the query is compared with each reference sequence, we begin the classification from the top of the taxonomy tree to reduce the number of comparisons. The presented experimental study confirms that CoMeta outperforms other programs used in this context. CoMeta is available at https://github.com/jkawulok/cometa under a free GNU GPL 2 license.
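The two ideas in the abstract, exact k-mer matching and classification that starts at the top of the taxonomy tree, can be illustrated with a minimal Python sketch. This is not CoMeta's implementation: the function names, the in-memory `set` index, the toy taxonomy, and the score (percentage of a read's k-mers found in a group's index) are all assumptions made for illustration, and the cutoff `mc` is a hypothetical acceptance threshold in percent.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references: List[str], k: int) -> Set[str]:
    """Collect every k-mer of the reference sequences of one group."""
    index: Set[str] = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def match_score(read: str, index: Set[str], k: int) -> float:
    """Percentage of the read's k-mers found in index (illustrative score)."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    hits = sum(1 for km in read_kmers if km in index)
    return 100.0 * hits / len(read_kmers)


def classify_top_down(read: str,
                      tree: Dict[str, List[str]],
                      indexes: Dict[str, Set[str]],
                      k: int,
                      mc: float,
                      root: str = "root") -> List[str]:
    """Descend from the root, following the best-scoring child at each rank.

    Only the children of the current node are compared, so groups under
    rejected branches are never examined. The descent stops at a leaf or
    when the best child scores below the cutoff mc.
    """
    path: List[str] = []
    node = root
    while tree.get(node):
        scores = {c: match_score(read, indexes[c], k) for c in tree[node]}
        best = max(scores, key=scores.get)
        if scores[best] < mc:
            break
        path.append(best)
        node = best
    return path


# Toy taxonomy and reference sequences (made up for this example).
K = 4
genus_refs = {
    "genusA1": ["ACGTACGTGGA"],
    "genusA2": ["TTTTCCCCAAAA"],
}
tree = {"root": ["phylumA", "phylumB"], "phylumA": ["genusA1", "genusA2"]}
indexes = {g: build_index(refs, K) for g, refs in genus_refs.items()}
indexes["phylumA"] = indexes["genusA1"] | indexes["genusA2"]
indexes["phylumB"] = build_index(["GGGGGGGGGGGG"], K)

print(classify_top_down("ACGTACGT", tree, indexes, K, mc=50.0))
```

In this sketch a read that matches nothing returns an empty path, which corresponds to leaving the read unclassified.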



pone.0121453.g005: Classification accuracy for Experiment One for various values of the parameter k. Plot A shows the scores after classification using FACS-P, plot B using FACS-C, and plot C using pre-CoMeta. Each series shows the results for 11 different threshold values, in sequence starting from the left of each plot: MC = 30, 35, 40, …, 80 [%].

Mentions: The sensitivity and precision of FACS-P, FACS-C, and pre-CoMeta for various k are presented in Fig 5A–5C. Each series shows the results for 11 different threshold values, in sequence starting from the left of each plot: MC = 30, 35, 40, …, 80 [%]. Plot A shows that in FACS-P the sensitivity does not drop with increasing threshold values only for a small k; in the other cases, the sensitivity declines for a large MC. A detailed analysis of the impact of the parameters k, MC, and pf (for building the Bloom filters) on the accuracy of FACS-P was presented in our earlier study [57].
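A threshold sweep of this kind can be sketched in Python as follows. The excerpt does not define sensitivity or precision, so the sketch assumes that a read is assigned only when its best match score reaches MC, that sensitivity is the fraction of all reads assigned to the correct group, and that precision is the fraction of assigned reads that are correct. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def thresholds(start: int = 30, stop: int = 80, step: int = 5) -> List[int]:
    """The 11 MC values 30, 35, ..., 80 [%] used for each series."""
    return list(range(start, stop + 1, step))


def sensitivity_precision(results: List[Tuple[float, bool]],
                          mc: float) -> Tuple[float, float]:
    """Compute (sensitivity, precision) at cutoff mc.

    results holds one (best_score_percent, best_group_is_correct) pair per
    read. Reads scoring below mc are treated as unclassified (assumption).
    """
    assigned = [correct for score, correct in results if score >= mc]
    true_pos = sum(assigned)
    sensitivity = true_pos / len(results) if results else 0.0
    precision = true_pos / len(assigned) if assigned else 0.0
    return sensitivity, precision


# Made-up per-read results, for illustration only.
reads = [(90.0, True), (60.0, True), (45.0, False), (35.0, True)]
for mc in thresholds():
    sens, prec = sensitivity_precision(reads, mc)
    print(f"MC={mc:2d}%  sensitivity={sens:.2f}  precision={prec:.2f}")
```

Under these assumptions, raising MC can only remove assignments, so sensitivity never increases with the threshold, while precision may move in either direction.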

