Bacterial genomes: habitat specificity and uncharted organisms.

Dini-Andreote F, Andreote FD, Araújo WL, Trevors JT, van Elsas JD - Microb. Ecol. (2012)

Bottom Line: The capability and speed in generating genomic data have increased profoundly since the release of the draft human genome in 2000. Here, we propose that scientists should be concerned with attaining an improved equal representation of most of the bacterial tree of life organisms, at the genomic level. Not only will such efforts contribute to our overall understanding of the microbial diversity extant in ecosystems as well as the structuring of the extant genomes, but they will also facilitate the development of better methods for (meta)genome annotation.


Affiliation: Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil.

ABSTRACT
The capability and speed in generating genomic data have increased profoundly since the release of the draft human genome in 2000. Additionally, sequencing costs have continued to plummet as the next generation of highly efficient sequencing technologies (next-generation sequencing, NGS) has become available and commercial facilities have promoted market competition. However, new challenges have emerged as researchers attempt to efficiently process the massive amounts of sequence data being generated. First, the described genome sequences are unequally distributed among the branches of bacterial life and, second, bacterial pan-genomes are often not considered when setting aims for sequencing projects. Here, we propose that scientists should be concerned with attaining an improved equal representation of most of the bacterial tree of life organisms, at the genomic level. Moreover, they should take into account the natural variation that is often observed within bacterial species and the role of the often changing surrounding environment and natural selection pressures, which is central to bacterial speciation and genome evolution. Not only will such efforts contribute to our overall understanding of the microbial diversity extant in ecosystems as well as the structuring of the extant genomes, but they will also facilitate the development of better methods for (meta)genome annotation.

Figure 2: Phylogenetic distribution of microbial genome projects at the phylum level. Data were extracted from the Genomes OnLine Database (GOLD) [4] in September 2011. The phylogenetic distribution was constructed using Silva Ref SSU database release 104 (http://www.arb-silva.de). (*) encompasses the ‘incomplete’, ‘permanent draft’ and ‘target’ statuses at GOLD.

A key issue of great current relevance is the metagenomics approach to ecosystem analyses [6, 23]. This approach has been expanding since 1999, mostly as a result of the power of next-generation sequencing (NGS). While the generation of massive numbers of sequences from extant microbial communities appears promising for obtaining a complete overview of the genetic profile of distinct environments [23], the analysis of these sequences and the proper assignment of DNA tags to their original owners in nature have emerged as a major challenge for bioinformatics (called the “computational bubble”). Our ability to properly correlate environmental genomic data with currently charted bacteria is strongly hindered by the lack of whole-genome sequences for many of the microorganisms dispersed along the phylogenetic tree of life (Fig. 2). Here, we posit that a major cause of this problem is that the current data set is based on the subset of culturable Bacteria and Archaea. This, as stated by Gilbert et al. [7], is the underlying cause of our current inability to robustly annotate the major part of the genes found in environmental metagenomics data. Only up to 4% of the sequences were found to be identifiable to species [7]. The pool of hitherto-cultured microorganisms indeed vastly underrepresents the true scope of the microbial diversity found in most natural ecosystems. On top of this, we lack information on within-species diversity (which defines the pan-genome) for both the poorly accessed organisms and most of the well-known ones. This lack of representativeness can, for instance, be observed by comparing the number of 16S ribosomal RNA gene tags from each bacterial and archaeal phylum in the Ribosomal Database Project (RDP) database (mostly obtained from environmental samples) with the number of complete and ongoing genome sequencing projects per phylum (Fig. 3).
To date, no large-scope sequencing project has been filed that aims to comprehensively cover the genomes of as-yet unculturable, uncharted microorganisms, not to speak of the members of the still underexplored rare biosphere, which might fall into the previous class but might also have been missed because of their sheer rarity [15, 19].
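
The per-phylum comparison described above (16S rRNA gene tags versus genome sequencing projects) could be expressed as a simple ratio of shares. The following is a minimal sketch under stated assumptions, not the procedure used for Fig. 3: the function name `representation_ratio`, the phylum labels, and all counts are hypothetical placeholders and are not taken from RDP or GOLD.

```python
from typing import Dict


def representation_ratio(tag_counts: Dict[str, int],
                         genome_counts: Dict[str, int]) -> Dict[str, float]:
    """Per phylum: (share of genome projects) / (share of 16S tags).

    A value below 1 suggests the phylum has fewer genome projects than
    its share of environmental 16S tags would lead one to expect.
    """
    total_tags = sum(tag_counts.values())
    total_genomes = sum(genome_counts.values())
    if total_tags == 0 or total_genomes == 0:
        raise ValueError("both count tables must contain at least one record")

    ratios: Dict[str, float] = {}
    for phylum, tags in tag_counts.items():
        if tags == 0:
            continue  # no 16S tags, so the ratio is undefined for this phylum
        tag_share = tags / total_tags
        # A phylum absent from the genome table has zero projects.
        genome_share = genome_counts.get(phylum, 0) / total_genomes
        ratios[phylum] = genome_share / tag_share
    return ratios


# Hypothetical placeholder counts (assumption, NOT real RDP or GOLD data).
tags = {"Phylum A": 500, "Phylum B": 300, "Phylum C": 200}
genomes = {"Phylum A": 80, "Phylum B": 20, "Phylum C": 0}

ratios = representation_ratio(tags, genomes)
# List the most underrepresented phyla first.
for phylum, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{phylum}: {ratio:.2f}")
```

With these placeholder numbers, a phylum that contributes a fifth of the tags but has no genome project gets a ratio of 0, which is the kind of gap the text attributes to reliance on culturable organisms. Comparing shares rather than raw counts keeps the two databases on the same scale despite their very different sizes.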

