Exploration and retrieval of whole-metagenome sequencing samples.

Seth S, Välimäki N, Kaski S, Honkela A - Bioinformatics (2014)

Bottom Line: In recent years, the field of whole-metagenome shotgun sequencing has grown significantly, owing to high-throughput sequencing technologies that allow genomic samples to be sequenced more cheaply, faster and with better coverage than before. We apply a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as on Human Microbiome Project metagenomic samples.

Affiliation: Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

Figure 6 (btu340-F6): Comparison of the best retrieval performance achieved with the ‘optimized metric’ (middle), with the ‘average metric’ (right) and without entropy filtering (left), for the proposed approach ‘All’, for individual values of k, and for the FIGfam-based distance metric. The metrics are ‘optimized’/‘averaged’ over 101 equally spaced threshold values between 0 and 1. Each error bar shows the mean average precision (MAP) value along with its standard error. The grey horizontal line shows retrieval by chance: MAP computed over an all-zero dissimilarity metric. An arrow (if present) over a method indicates whether the performance of the corresponding method (top: ‘average metric’, bottom: ‘optimized metric’) is better (↑) or worse (↓) than when entropy filtering is employed. The stars denote the significance level: 0 <*** < 0.001 <** < 0.01 <* < 0.05. We observe that filtering has a positive impact on the retrieval performance.
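As a rough illustration of the MAP values plotted in the figure, the following Python sketch computes mean average precision from a pairwise dissimilarity matrix. It assumes that each sample is used in turn as a query, that the remaining samples are ranked by increasing dissimilarity, and that a retrieved sample counts as relevant when it shares the query's class label. The function names and the toy data are hypothetical and are not taken from the article.

```python
from typing import List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average of precision@rank taken at every rank that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    dissim: Sequence[Sequence[float]], labels: Sequence[str]
) -> float:
    """Mean of per-query average precision over all queries with a relevant item."""
    n = len(labels)
    ap_values: List[float] = []
    for q in range(n):
        # Rank every other sample by increasing dissimilarity to the query.
        # Python's sort is stable, so ties keep their index order.
        others = [j for j in range(n) if j != q]
        others.sort(key=lambda j: dissim[q][j])
        relevance = [labels[j] == labels[q] for j in others]
        if any(relevance):
            ap_values.append(average_precision(relevance))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Hypothetical toy data: two classes with two samples each.
toy_labels = ["case", "case", "control", "control"]
toy_dissim = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
toy_map = mean_average_precision(toy_dissim, toy_labels)
```

With an all-zero dissimilarity matrix every ranking is a tie, so the value obtained depends on how ties are broken; this sketch simply keeps index order, and the tie handling used for the chance baseline in the figure is not specified in this excerpt.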

Mentions: Effect of the entropy filtering: Next, we compared retrieval performance with entropy filtering of the informative k-mers against retrieval performance without the filtering operation. The results are presented in Figure 6 (and Supplementary Fig. S3). We observed that entropy filtering usually improved retrieval performance for all tested k-mer lengths when using the ‘optimized metric’, although the improvement might not always be statistically significant. Although the ‘average metric’ often provides significant performance, it might not always improve over performance without filtering. Also, the retrieval performance of FIGfam may or may not improve with entropy filtering (‘optimized metric’ and ‘average metric’ selected in the same way as for the other methods).
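The filtering step discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each k-mer comes with a vector of per-sample counts, that its entropy is computed over the distribution of those counts across samples and normalized to lie between 0 and 1, and that a k-mer is kept when its normalized entropy does not exceed the threshold. The direction of the comparison, the function names and the example counts are all assumptions made for illustration. The threshold grid mirrors the 101 equally spaced values between 0 and 1 mentioned in the figure caption.

```python
import math
from typing import Dict, List


def normalized_entropy(counts: List[int]) -> float:
    """Entropy of a k-mer's distribution over samples, divided by log(#samples)."""
    total = sum(counts)
    n_samples = len(counts)
    if total == 0 or n_samples < 2:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy / math.log(n_samples)


def filter_kmers(
    kmer_counts: Dict[str, List[int]], threshold: float
) -> Dict[str, List[int]]:
    """Keep k-mers whose normalized entropy is at most `threshold` (assumed rule)."""
    return {
        kmer: counts
        for kmer, counts in kmer_counts.items()
        if normalized_entropy(counts) <= threshold
    }


# 101 equally spaced thresholds between 0 and 1, as in the figure caption.
thresholds = [i / 100 for i in range(101)]

# Hypothetical per-sample counts for three k-mers across four samples.
example_counts = {
    "ACGT": [5, 0, 0, 0],  # present in a single sample: entropy 0
    "TTGA": [2, 2, 2, 2],  # spread evenly over all samples: entropy 1
    "GGCA": [3, 3, 0, 0],  # split between two samples: entropy 0.5
}
kept = filter_kmers(example_counts, threshold=0.6)
```

Under these assumptions, a threshold of 1 keeps every k-mer and therefore corresponds to no filtering, while smaller thresholds keep only k-mers concentrated in few samples.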

