Metagenomic profiling of known and unknown microbes with MicrobeGPS.

Lindner MS, Renard BY - PLoS ONE (2015)

Bottom Line: Since only a small fraction of environmental genomes is represented in genomic databases, these approaches entail the risk of false identifications and often suggest a higher precision than justified by the data. Therefore, we developed MicrobeGPS, a novel metagenomic profiling approach that overcomes these limitations. Our results indicate that MicrobeGPS can enable reference-based taxonomic profiling of complex and less characterized microbial communities.

Affiliation: Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany.

ABSTRACT
Microbial community profiling identifies and quantifies organisms in metagenomic sequencing data using either reference-based or unsupervised approaches. However, current reference-based profiling methods only report the presence and abundance of single reference genomes that are available in databases. Since only a small fraction of environmental genomes is represented in genomic databases, these approaches entail the risk of false identifications and often suggest a higher precision than justified by the data. Therefore, we developed MicrobeGPS, a novel metagenomic profiling approach that overcomes these limitations. MicrobeGPS is the first method that identifies microbiota in the sample and estimates their genomic distances to known reference genomes. With this strategy, MicrobeGPS identifies organisms down to the strain level and highlights possibly inaccurate identifications when the correct reference genome is missing. We demonstrate on three metagenomic datasets of different origins that our approach successfully avoids misleading interpretation of results and additionally provides more accurate results than current profiling methods. Our results indicate that MicrobeGPS can enable reference-based taxonomic profiling of complex and less characterized microbial communities. MicrobeGPS is open source and available from https://sourceforge.net/projects/microbegps/ as source code and binary distribution for Windows and Linux operating systems.

pone.0117711.g001: MicrobeGPS workflow. As a first step, MicrobeGPS reads and analyzes the SAM files of the metagenomic reads mapped to a set of reference genomes. Early filtering of the reads helps reduce the amount of data by discarding reads that are not meaningful for MicrobeGPS. Then, MicrobeGPS estimates the Genome Dataset Validity score and the local sequencing depth of each reference genome and uses this information to extract core reads (CR), that is, reads that are presumably unique to a particular organism in the sample. Based on the CR and the shared reads, MicrobeGPS clusters the reference genomes into groups, where each group represents a single biological candidate organism in the sample.
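
The workflow in this caption can be illustrated with a deliberately simplified Python sketch. It is not the MicrobeGPS implementation: the function names (`parse_sam`, `unique_reads`, `group_references`) and the `min_shared` threshold are hypothetical, the Genome Dataset Validity score and local sequencing depth are not computed, and "reads that map to exactly one reference" is used only as a rough stand-in for core reads. References that share at least `min_shared` reads are merged into one candidate group with a small union-find.

```python
from collections import defaultdict
from itertools import combinations


def parse_sam(lines):
    """Map each read name to the set of reference names it aligns to."""
    hits = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # early filtering: drop unmapped reads
            continue
        hits[qname].add(rname)
    return dict(hits)


def unique_reads(hits):
    """Reads hitting exactly one reference (crude proxy for core reads)."""
    per_ref = defaultdict(set)
    for read, targets in hits.items():
        if len(targets) == 1:
            per_ref[next(iter(targets))].add(read)
    return dict(per_ref)


def group_references(hits, min_shared=1):
    """Merge references that share at least `min_shared` reads (assumed rule)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    shared = defaultdict(int)
    for targets in hits.values():
        for ref in targets:
            find(ref)  # register every reference, even isolated ones
        for a, b in combinations(sorted(targets), 2):
            shared[(a, b)] += 1

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for ref in list(parent):
        groups[find(ref)].add(ref)
    return sorted(sorted(group) for group in groups.values())
```

Given SAM records in which one read maps to both `refA` and `refB` and another maps only to `refC`, `group_references` places `refA` and `refB` in one group and leaves `refC` on its own. The real method additionally weighs validity and depth before deciding which references describe the same organism.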

Mentions: The MicrobeGPS method is outlined in Fig. 1 and is described in this section. The fundamental idea behind MicrobeGPS is that we use already characterized reference genomes to describe the identity of the (possibly unknown) organisms in a metagenomic dataset (see Fig. 2a). This approach contrasts with current methods, which typically seek to directly identify and quantify reference genomes (species/strains) or higher taxa in the dataset. By analogy with the Global Positioning System (GPS), MicrobeGPS estimates the taxonomic position of an organism in the sample by measuring the genomic distance between the organism and the most closely related reference genomes. In contrast to taxonomic binning approaches (such as Pathoscope or Megan [13]), MicrobeGPS does not assign each read to a single taxon, but operates on the whole dataset to identify the taxa present in the sample.
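
To make the idea of "distance to a reference" concrete, the following sketch computes a per-reference edit rate from SAM records. This is an assumption-laden illustration, not the distance estimate used by MicrobeGPS: it relies on the optional `NM:i:` tag (edit distance to the reference) being present, divides by the number of alignment columns derived from the CIGAR string, and pools all reads of a reference regardless of whether they are shared. The function names are hypothetical.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_columns(cigar):
    """Count aligned columns: matches/mismatches, insertions and deletions."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")


def edit_rate_per_reference(lines):
    """Pooled NM edit distance per aligned column, for each reference."""
    edits = defaultdict(int)
    columns = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, rname, cigar = int(fields[1]), fields[2], fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped or no alignment available
            continue
        nm = next((f for f in fields[11:] if f.startswith("NM:i:")), None)
        if nm is None:  # aligner did not report an edit distance
            continue
        edits[rname] += int(nm[len("NM:i:"):])
        columns[rname] += alignment_columns(cigar)
    return {ref: edits[ref] / columns[ref] for ref in columns if columns[ref] > 0}
```

A low rate suggests that the organism in the sample is close to that reference, while a high rate hints that the true genome may be missing from the database. A read-level proxy like this ignores coverage patterns, which is one reason the paper's approach works on the whole dataset instead.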

