Soil bacterial diversity screening using single 16S rRNA gene V regions coupled with multi-million read generating sequencing technologies.

Vasileiadis S, Puglisi E, Arena M, Cappa F, Cocconcelli PS, Trevisan M - PLoS ONE (2012)

Bottom Line: However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 provide the ability to amplify partial 16S rRNA gene sequences from very diverse owners, V6 was shown to be the least informative of the examined V regions. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly recommended in 16S rRNA gene-based bacterial diversity studies.


Affiliation: Università Cattolica del Sacro Cuore, Faculty of Agricultural Sciences, Institute of Agricultural and Environmental Chemistry, Piacenza, Italy.

ABSTRACT
The novel multi-million read generating sequencing technologies are very promising for resolving the immense soil 16S rRNA gene bacterial diversity. Yet they have a limited maximum sequence length, restricting studies to screening DNA stretches of single 16S rRNA gene hypervariable (V) regions. The aim of the present study was to assess the effects of the properties of four consecutive V regions (V3-6) on commonly applied analytical methodologies in bacterial ecology studies. Using an in silico approach, the performance of each V region was compared with that of the complete 16S rRNA gene stretch. We assessed related properties of the soil-derived bacterial sequence collection of the Ribosomal Database Project (RDP) database and concomitantly performed simulations based on published datasets. Results indicate that overall the most prominent V region for soil bacterial diversity studies was V3, even though it was outperformed in some of the tests. Despite its high performance during most tests, V4 was less conserved along flanking sites, thus reducing its ability for bacterial diversity coverage. V5 performed well in the non-redundant RDP database-based analysis. However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 provide the ability to amplify partial 16S rRNA gene sequences from very diverse owners, V6 was shown to be the least informative of the examined V regions. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly recommended in 16S rRNA gene-based bacterial diversity studies.
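The idea of cutting a single V region out of a full-length sequence, and of flanking-site conservation limiting coverage, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the primer strings and sequences are made-up toy data rather than real 16S rRNA primers, and only exact matches are considered (real primers usually contain degenerate bases and tolerate mismatches).

```python
# Toy in silico "amplification" of a region flanked by two primer sites.
# All sequences and primers below are invented for illustration only.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extract_region(full_length, fwd_primer, rev_primer):
    """Return the stretch between the two primer sites (primers excluded).

    Returns None when either primer site has no exact match, which mimics
    a sequence that would be missed because its flanking site is not
    conserved.
    """
    start = full_length.find(fwd_primer)
    if start == -1:
        return None
    start += len(fwd_primer)
    # The reverse primer anneals to the opposite strand, so on the given
    # strand we look for its reverse complement downstream of the start.
    end = full_length.find(reverse_complement(rev_primer), start)
    if end == -1:
        return None
    return full_length[start:end]


def coverage(sequences, fwd_primer, rev_primer):
    """Fraction of sequences from which the region can be extracted."""
    hits = sum(
        extract_region(s, fwd_primer, rev_primer) is not None
        for s in sequences
    )
    return hits / len(sequences)


# Hypothetical toy data: one sequence with intact primer sites and one
# with a single mismatch in the forward primer site.
FWD = "ACGGAGGC"
REV = "GTCTAATCC"
intact = "TT" + "ACGGAGGC" + "ATTGCCA" + "GGATTAGAC" + "CC"
mutated = "TT" + "ACGGTGGC" + "ATTGCCA" + "GGATTAGAC" + "CC"

print(extract_region(intact, FWD, REV))   # -> ATTGCCA
print(extract_region(mutated, FWD, REV))  # -> None
print(coverage([intact, mutated], FWD, REV))  # -> 0.5
```

Under these assumptions, a region whose flanking sites vary more across taxa yields a lower `coverage` value, which is the kind of effect the abstract describes for V4.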



Figure 7 (pone-0042671-g007): Taxonomy, OTU (3% sequence distance) analysis and UniFrac results of the performed simulation. A) PCA results of the matrix generated by sample distances based on classified sequence relative abundance (left) and presence-absence (right) for the V regions and FL datasets. B) As in A, for OTU relative abundance (left) and presence-absence (right). C) PCA results for matrices generated using the weighted (left, phylotype relative abundance based) and unweighted (right, phylotype occurrence based) UniFrac distances between samples for the V regions and FL datasets.

Mentions: Dataset topologies based on sample distances showed an overall better approximation of the full-length (FL) dataset by the longer V region datasets, V3 and V4 (Fig. 7). V3 clustered best with the FL dataset for both the relative abundance and the presence-absence taxonomic classification matrices, while V4 lay close to the FL dataset only for the relative abundance matrices (Fig. 7 A). V3 and V4 also performed better than V5 and V6 in the OTU approach for both relative abundance and presence-absence matrices of OTUs (Fig. 7 B). Sample distances from the weighted UniFrac analysis indicated that, when the relative abundance of reads was taken into account, V4 and V5 resided closer to the FL dataset (Fig. 7 C left). However, when only sequence occurrence per sample was considered (unweighted UniFrac), sample distances for V4 and V3 more closely resembled the FL sample distances, although not as closely as in the previously mentioned approaches (Fig. 7 C right). Overall, the V5 and V6 datasets performed poorly, with V5 being slightly closer to the FL dataset than V6 along the horizontal axes, where most of the variance is explained.
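The distinction between relative abundance and presence-absence matrices, and the notion of a V region dataset "approximating" the FL dataset, can be made concrete with a small sketch. It is an illustration under stated assumptions, not the study's method: the study compared datasets through PCA of sample distance matrices and UniFrac, whereas this sketch swaps in Bray-Curtis and Jaccard dissimilarities and a simple mean absolute deviation between distance matrices. The count tables and all function names are hypothetical.

```python
from itertools import combinations


def relative_abundance(counts):
    """Scale raw counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on relative abundances (0 = identical)."""
    pa, pb = relative_abundance(a), relative_abundance(b)
    # Both profiles sum to 1, so the usual denominator equals 2.
    return 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


def jaccard(a, b):
    """Jaccard dissimilarity on presence-absence (abundance is ignored)."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def pairwise(table, metric):
    """Map each unordered sample pair to its dissimilarity."""
    return {
        (s, t): metric(table[s], table[t])
        for s, t in combinations(sorted(table), 2)
    }


def mean_abs_deviation(d_ref, d_test):
    """Average absolute difference between two sets of sample distances."""
    return sum(abs(d_ref[k] - d_test[k]) for k in d_ref) / len(d_ref)


# Hypothetical count tables: rows are samples, columns are taxa (or OTUs).
fl = {"s1": [10, 5, 0, 5], "s2": [2, 8, 10, 0], "s3": [0, 0, 15, 5]}
# A lower-resolution "region" table in which the last two taxa of the FL
# table cannot be told apart and are merged into one column.
merged = {"s1": [10, 5, 5], "s2": [2, 8, 10], "s3": [0, 0, 20]}

for name, metric in (("Bray-Curtis", bray_curtis), ("Jaccard", jaccard)):
    deviation = mean_abs_deviation(
        pairwise(fl, metric), pairwise(merged, metric)
    )
    print(name, "deviation from FL:", round(deviation, 3))
```

A smaller deviation means the sample-to-sample distances of the partial dataset track those of the FL dataset more closely. The two metrics can disagree, in the same way that the relative abundance and presence-absence panels of Fig. 7 rank the V regions differently.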

