Soil bacterial diversity screening using single 16S rRNA gene V regions coupled with multi-million read generating sequencing technologies.

Vasileiadis S, Puglisi E, Arena M, Cappa F, Cocconcelli PS, Trevisan M - PLoS ONE (2012)

Bottom Line: However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 make it possible to amplify partial 16S rRNA gene sequences from very diverse organisms, V6 was shown to be the least informative of the V regions examined. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly recommended in 16S rRNA gene-based bacterial diversity studies.


Affiliation: Università Cattolica del Sacro Cuore, Faculty of Agricultural Sciences, Institute of Agricultural and Environmental Chemistry, Piacenza, Italy.

ABSTRACT
The novel multi-million read generating sequencing technologies are very promising for resolving the immense soil bacterial diversity captured by the 16S rRNA gene. Yet their maximum read length is limited, restricting studies to DNA stretches of single 16S rRNA gene hypervariable (V) regions. The aim of the present study was to assess how the properties of four consecutive V regions (V3-6) affect analytical methodologies commonly applied in bacterial ecology studies. Using an in silico approach, the performance of each V region was compared with that of the complete 16S rRNA gene stretch. We assessed related properties of the soil-derived bacterial sequence collection of the Ribosomal Database Project (RDP) database and concomitantly performed simulations based on published datasets. Results indicate that V3 was overall the most suitable V region for soil bacterial diversity studies, even though it was outperformed in some of the tests. Despite its high performance in most tests, V4 was less conserved along its flanking sites, which reduces its ability to cover bacterial diversity. V5 performed well in the analysis based on the non-redundant RDP database. However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 make it possible to amplify partial 16S rRNA gene sequences from very diverse organisms, V6 was shown to be the least informative of the V regions examined. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly recommended in 16S rRNA gene-based bacterial diversity studies.


pone-0042671-g005: Taxonomy classification depth comparisons among V region datasets and the FL variants.

Mentions: Classification depth testing indicated that all V region datasets showed a similar under-representation of existing sequences across all taxa at each taxonomic level, with V6 performing worst (Fig. 5). Phylum-level classification differences between the full-length (FL) sequences and their V region trimmed variants were assessed from the number of sequences obtained per phylum. Phyla were characterized as “highly populated”, “intermediate populated”, and “low populated” (or “rare”) according to the number of sequences in each taxon, as indicated in the footnote of Table 1 and the Materials and Methods section. Highly populated phyla were less affected by sequence trimming than either the phyla encompassing 1000 or fewer sequences or the group containing the unclassified sequences (Table 1 and Fig. 6). Under-representation trends were observed for phyla with intermediate and low sequence numbers, while the unclassified sequences were over-represented by more than 50%. For V4 and V6, some highly populated phyla differed by more than 5% in sequence content between the V region and the corresponding full-length variants. The main source of this reduced representation relative to FL was the phylum Acidobacteria. Among intermediate populated phyla, such differences existed for Planctomycetes, Chloroflexi, Gemmatimonadetes and Nitrospira, which were under-represented in all V region datasets, while TM7 was under-represented only for V3 and V5, and Verrucomicrobia and Cyanobacteria were under-represented for V6. Among rare phyla, V3 and V5 had more phyla with differences smaller than 5% relative to the full-length dataset, with Chlamydiae and Fusobacteria having smaller differences for all V region datasets.
Analysis of faulty assignments in the highly and intermediate populated taxa showed that the overall effect of such events was low for all tested V regions (Figure S1). However, the composition of the unclassified group demonstrated a reduced ability of all regions to identify rare taxa and a lesser ability of V6 to identify phyla in all “population” categories (particularly the highly populated Acidobacteria).
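
The phylum-level comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that the 5% difference is measured relative to the full-length count of each phylum, and that the boundary between "intermediate" and "rare" phyla is 100 sequences (the text here only gives the 1000-sequence boundary; the actual cut-offs are in the footnote of Table 1). All function names and all counts in the example are hypothetical.

```python
from typing import Dict

# Assumed cut-offs. The text only states that the less populated groups hold
# "1000 or fewer" sequences; RARE_MAX is a hypothetical value for illustration.
HIGH_MIN = 1000
RARE_MAX = 100


def population_category(name: str, n_full_length: int) -> str:
    """Assign a population category from the full-length sequence count."""
    if name == "unclassified":
        return "unclassified"
    if n_full_length > HIGH_MIN:
        return "high"
    if n_full_length > RARE_MAX:
        return "intermediate"
    return "rare"


def percent_difference(n_full_length: int, n_region: int) -> float:
    """Signed difference of the V region count relative to full length, in %."""
    if n_full_length <= 0:
        raise ValueError("full-length count must be positive")
    return 100.0 * (n_region - n_full_length) / n_full_length


def compare_counts(
    full_length: Dict[str, int],
    region: Dict[str, int],
    threshold: float = 5.0,
) -> Dict[str, dict]:
    """Flag groups whose V region count deviates by more than `threshold` %."""
    report = {}
    for name, n_fl in full_length.items():
        diff = percent_difference(n_fl, region.get(name, 0))
        report[name] = {
            "category": population_category(name, n_fl),
            "diff_pct": diff,  # negative = under-represented in the V region
            "flagged": abs(diff) > threshold,
        }
    return report


# Made-up counts, for illustration only (not data from the paper).
full_length_counts = {
    "Acidobacteria": 5000,
    "Chloroflexi": 800,
    "Fusobacteria": 40,
    "unclassified": 200,
}
region_counts = {
    "Acidobacteria": 4600,
    "Chloroflexi": 700,
    "Fusobacteria": 39,
    "unclassified": 320,
}

report = compare_counts(full_length_counts, region_counts)
for name, row in report.items():
    print(f"{name:15s} {row['category']:13s} {row['diff_pct']:+6.1f}% flagged={row['flagged']}")
```

With these invented counts, a negative `diff_pct` marks under-representation in the trimmed dataset and a positive one marks over-representation, which is the pattern the text reports for the unclassified group.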

