Exploration and retrieval of whole-metagenome sequencing samples.

Seth S, Välimäki N, Kaski S, Honkela A - Bioinformatics (2014)

Bottom Line: Over the recent years, the field of whole-metagenome shotgun sequencing has witnessed significant growth owing to the high-throughput sequencing technologies that allow sequencing genomic samples cheaper, faster and with better coverage than before. We apply a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as human microbiome project metagenomic samples.


Affiliation: Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.


Fig. 2. Processing steps of our method. Given a collection of metagenomic samples, we use the collection as an input to the DSM method (4). For the method, we estimate the frequency of each k-mer (1, 2), evaluate if the k-mer is informative or not (3), and compute the needed dissimilarities (5). Finally, in this article we evaluate the performance considering the existing annotations as ground truth; annotations are not needed for the retrieval in general.

Mentions: To summarize, we introduce methods to (i) estimate the frequencies of a large number of k-mers over multiple samples, (ii) decide if a k-mer is informative or uninformative in the context of a retrieval task, (iii) compute a distance metric using the filtered k-mer frequencies, and (iv) execute these steps fast without explicitly storing the frequency values. Figure 2 summarizes the method.
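
Steps (i) to (iii) can be illustrated with a toy sketch in Python. This is not the distributed string mining (DSM) implementation and does not address step (iv): it stores all frequencies explicitly in memory. The function names, the spread-based filter with its `min_spread` threshold, and the use of Bray-Curtis dissimilarity are assumptions chosen for illustration, standing in for the article's own informativeness criterion and distance.

```python
from collections import Counter
from itertools import combinations


def kmer_frequencies(reads, k):
    """Step (i): relative frequency of each k-mer over all reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def informative_kmers(profiles, min_spread):
    """Step (ii), hypothetical criterion: keep k-mers whose frequency
    differs by at least min_spread between some pair of samples."""
    all_kmers = set().union(*profiles.values())
    keep = set()
    for kmer in all_kmers:
        freqs = [p.get(kmer, 0.0) for p in profiles.values()]
        if max(freqs) - min(freqs) >= min_spread:
            keep.add(kmer)
    return keep


def bray_curtis(p, q, kmers):
    """Step (iii), stand-in measure: Bray-Curtis dissimilarity
    restricted to the filtered k-mers."""
    num = sum(abs(p.get(m, 0.0) - q.get(m, 0.0)) for m in kmers)
    den = sum(p.get(m, 0.0) + q.get(m, 0.0) for m in kmers)
    return num / den if den else 0.0


# Made-up reads: samples A and B are identical, C shares no 3-mer with them.
samples = {
    "A": ["ACGTACGT", "ACGTTT"],
    "B": ["ACGTACGT", "ACGTTT"],
    "C": ["GGGGCCCC", "CCCCGGGG"],
}
profiles = {name: kmer_frequencies(reads, k=3) for name, reads in samples.items()}
kmers = informative_kmers(profiles, min_spread=0.01)
dissimilarity = {
    (a, b): bray_curtis(profiles[a], profiles[b], kmers)
    for a, b in combinations(sorted(profiles), 2)
}
```

In this toy input the identical samples A and B end up at dissimilarity 0, while A and C share no 3-mers and sit at the maximum of 1. Retrieval then amounts to ranking the collection by dissimilarity to a query sample; no annotations are involved at any point.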

