Exploration and retrieval of whole-metagenome sequencing samples.

Seth S, Välimäki N, Kaski S, Honkela A - Bioinformatics (2014)

Bottom Line: In recent years, the field of whole-metagenome shotgun sequencing has witnessed significant growth owing to high-throughput sequencing technologies that allow genomic samples to be sequenced more cheaply, faster and with better coverage than before. We apply a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as human microbiome project metagenomic samples.


Affiliation: Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.


Fig. 5. Comparison of best performances for different k-mer lengths. The figures show the performance over queries by all positive samples as a violin plot. All methods use the ‘optimized metric’ chosen over 101 equally spaced threshold values between 0 and 1; the box denotes the MAP value. The horizontal lines show retrieval by chance: AveP computed over a zero dissimilarity metric. The straight line is the mean, and the dotted lines are the 5 and 95% quantiles, respectively, when the number of relevant samples differs for different queries. An arrow (if present) over a method indicates whether the corresponding method performs significantly better (↑) or worse (↓) than ‘All’. The stars denote the significance level: 0 < *** < 0.001 < ** < 0.01 < * < 0.05. We observe that considering all k-mers usually performs as well as considering a single k.
© Copyright Policy - creative-commons


Mentions: Effect of using specific or unspecific k-mer length: We next compared the proposed approach of using all k-mers to using a specific k. The retrieval performance using the ‘optimized metric’ is shown in Figure 5 (and Supplementary Fig. S2). The figures show the complete distribution of average precision values over different queries, whose mean is the MAP of Figure 4. The performance of the proposed method is usually better than with any individual k. Thus, the proposed method appears to be a relatively safe choice that does not suffer from catastrophically bad performance on any of the datasets.
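
To make the relationship between per-query average precision (AveP) and MAP concrete, the following is a minimal sketch and not the authors' implementation. It assumes a precomputed symmetric dissimilarity matrix and binary class labels; the function names (average_precision, mean_average_precision) and the toy data are hypothetical and chosen only for illustration. Each positive sample is used as a query, the remaining samples are ranked by increasing dissimilarity, and MAP is the mean of the resulting AveP values.

```python
import numpy as np


def average_precision(dissim_row, relevant, query_index):
    """AveP for one query: rank the other samples by increasing dissimilarity."""
    order = np.argsort(dissim_row, kind="stable")
    order = order[order != query_index]        # the query does not retrieve itself
    hits = relevant[order].astype(float)       # 1.0 where a retrieved sample is relevant
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values at the ranks where a relevant sample appears.
    return float((precision_at_k * hits).sum() / hits.sum())


def mean_average_precision(dissim, labels):
    """MAP over queries by all positive samples (label == 1), plus per-query AveP."""
    labels = np.asarray(labels)
    scores = []
    for q in np.flatnonzero(labels == 1):
        relevant = labels == labels[q]         # samples sharing the query's class
        scores.append(average_precision(dissim[q], relevant, q))
    return float(np.mean(scores)), scores


# Toy data (hypothetical): four samples, the first two are positive.
toy_dissim = np.array([
    [0.00, 0.10, 0.50, 0.90],
    [0.10, 0.00, 0.05, 0.80],
    [0.50, 0.05, 0.00, 0.30],
    [0.90, 0.80, 0.30, 0.00],
])
toy_labels = [1, 1, 0, 0]

map_value, per_query = mean_average_precision(toy_dissim, toy_labels)
print(map_value, per_query)  # 0.75 [1.0, 0.5] for this toy input
```

The list of per-query values corresponds to the kind of distribution a violin plot would display, while the single MAP value summarizes it by its mean.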

