Exploration and retrieval of whole-metagenome sequencing samples.

Seth S, Välimäki N, Kaski S, Honkela A - Bioinformatics (2014)

Bottom Line: In recent years, the field of whole-metagenome shotgun sequencing has grown significantly, owing to high-throughput sequencing technologies that allow genomic samples to be sequenced more cheaply, faster and with better coverage than before. We apply a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as on Human Microbiome Project metagenomic samples.
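The k-mer idea in the summary can be illustrated on a single machine. The sketch below is ours, not the paper's distributed string mining framework: it counts every overlapping k-mer in a sample's reads in memory and compares two samples by the Jaccard distance of their k-mer sets, which stands in here as a simple presence/absence dissimilarity. It does not implement the paper's selection of "informative" k-mers, and the function names `count_kmers` and `presence_dissimilarity` are hypothetical.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count all overlapping k-mers across a sample's reads (single machine, in memory)."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k contribute no k-mers (the range is empty).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def presence_dissimilarity(a, b):
    """Jaccard distance between the k-mer sets of two samples (0 = identical sets)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)

sample1 = count_kmers(["ACGTAC"], 3)  # ACG, CGT, GTA, TAC
sample2 = count_kmers(["ACGTTT"], 3)  # ACG, CGT, GTT, TTT
print(presence_dissimilarity(sample1, sample2))  # 2 shared of 6 distinct k-mers
```

A real metagenome pool is far too large for this in-memory approach, which is why the paper relies on a distributed framework; the sketch only shows what is being counted and compared.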

Affiliation: Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

Fig. 7. Comparison of the best retrieval performance for different distance metrics using all k-mers. The violin plots show the distribution of average precision (AveP) over queries by all positive samples in the datasets. The 'optimized metrics' have been selected over 101 equally spaced threshold values between 0 and 1; the box denotes the MAP value. The horizontal lines show retrieval by chance, i.e. AveP computed with a zero dissimilarity metric: the solid line is the mean, and the dotted lines are the 5% and 95% quantiles, respectively, when the number of relevant samples differs across queries. An arrow (if present) over a method indicates that the method performs significantly better (↑) or worse (↓) than the other methods (denoted by their colors). Stars denote the significance level: *** for 0 < p < 0.001, ** for 0.001 < p < 0.01 and * for 0.01 < p < 0.05. We observe that different distance metrics usually demonstrate similar performance.
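The evaluation quantities in the caption can be sketched as follows, assuming each retrieved sample carries a binary relevance label: AveP averages the precision at every rank where a relevant sample is retrieved, MAP averages AveP over queries, and the chance baseline is simulated by random orderings. Treating a zero dissimilarity (which ties every sample) as a uniformly random ranking is our assumption, as are the function names. The 101-threshold optimization is not reproduced because the caption does not state what the threshold is applied to.

```python
import random
import statistics

def average_precision(ranked_relevance):
    """AveP for one query.

    ranked_relevance[i] is True if the sample retrieved at rank i + 1 is relevant;
    precision is averaged over the ranks of the relevant samples.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_query_relevance):
    """MAP: the mean of AveP over all queries."""
    return statistics.mean(average_precision(r) for r in per_query_relevance)

def chance_baseline(n_relevant, n_total, trials=1000, seed=0):
    """Mean and 5%/95% quantiles of AveP under a zero dissimilarity metric.

    Assumption: all ties are broken uniformly at random, simulated by shuffling.
    """
    rng = random.Random(seed)
    labels = [True] * n_relevant + [False] * (n_total - n_relevant)
    scores = []
    for _ in range(trials):
        rng.shuffle(labels)
        scores.append(average_precision(labels))
    cuts = statistics.quantiles(scores, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return statistics.mean(scores), cuts[0], cuts[-1]

print(average_precision([True, False, True]))        # (1/1 + 2/3) / 2
print(mean_average_precision([[True], [False, True]]))  # mean of 1.0 and 0.5
```

Because the chance AveP depends on how many relevant samples a query has, the baseline has to be recomputed per query, which is why the caption reports a mean and quantiles rather than a single chance level.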

Mentions: Comparison across different metrics: Finally, we evaluated the retrieval performance of different dissimilarity metrics. Figure 7 (and Supplementary Fig. S4) presents the performance of the 'optimized metric' for each of them. We observed that the simple presence/absence-based metric Dcount performed at least as well as the abundance-sensitive log and sqrt metrics, except on the MetaHIT data, for which the other metrics performed better.
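To illustrate how a presence/absence metric differs from abundance-sensitive ones, the sketch below applies a per-k-mer transform to the counts (an indicator, log(1 + c), or sqrt(c)) and then a Bray-Curtis-style normalized L1 dissimilarity. These definitions are illustrative assumptions and not necessarily the exact Dcount, log and sqrt formulas used in the paper; the names `TRANSFORMS` and `dissimilarity` are hypothetical.

```python
import math

# Assumed per-k-mer transforms of a raw count c (illustrative, not the paper's definitions).
TRANSFORMS = {
    "count": lambda c: 1.0 if c > 0 else 0.0,  # presence/absence only
    "log": lambda c: math.log1p(c),            # dampened abundance
    "sqrt": lambda c: math.sqrt(c),            # dampened abundance
}

def dissimilarity(a, b, metric):
    """Normalized L1 (Bray-Curtis-style) dissimilarity between transformed k-mer counts.

    a and b map k-mer -> count; missing k-mers count as 0.
    """
    f = TRANSFORMS[metric]
    keys = set(a) | set(b)
    num = sum(abs(f(a.get(k, 0)) - f(b.get(k, 0))) for k in keys)
    den = sum(f(a.get(k, 0)) + f(b.get(k, 0)) for k in keys)
    return num / den if den else 0.0

a = {"ACG": 4, "CGT": 1}
b = {"ACG": 1, "GTT": 2}
for metric in ("count", "log", "sqrt"):
    print(metric, dissimilarity(a, b, metric))
```

In this toy example the count metric gives 0.5, since it sees only which k-mers are shared, while the sqrt metric gives about 0.63 because it also registers that ACG is four times as abundant in `a` as in `b`. Whether that extra sensitivity helps depends on the data, in line with the MetaHIT exception noted above.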