Soil bacterial diversity screening using single 16S rRNA gene V regions coupled with multi-million read generating sequencing technologies.

Vasileiadis S, Puglisi E, Arena M, Cappa F, Cocconcelli PS, Trevisan M - PLoS ONE (2012)

Bottom Line: However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 provide the ability to amplify partial 16S rRNA gene sequences from very diverse owners, it was demonstrated that V6 was the least informative of the V regions examined. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly suggested in 16S rRNA gene-based bacterial diversity studies.


Affiliation: Università Cattolica del Sacro Cuore, Faculty of Agricultural Sciences, Institute of Agricultural and Environmental Chemistry, Piacenza, Italy.

ABSTRACT
The novel multi-million read generating sequencing technologies are very promising for resolving the immense soil 16S rRNA gene bacterial diversity. Yet they have a limited maximum sequence length screening ability, restricting studies to screening DNA stretches of single 16S rRNA gene hypervariable (V) regions. The aim of the present study was to assess the effects of properties of four consecutive V regions (V3-6) on commonly applied analytical methodologies in bacterial ecology studies. Using an in silico approach, the performance of each V region was compared with the complete 16S rRNA gene stretch. We assessed related properties of the soil-derived bacterial sequence collection of the Ribosomal Database Project (RDP) database and concomitantly performed simulations based on published datasets. Results indicate that overall the most prominent V region for soil bacterial diversity studies was V3, even though it was outperformed in some of the tests. Despite its high performance during most tests, V4 was less conserved along flanking sites, thus reducing its ability for bacterial diversity coverage. V5 performed well in the non-redundant RDP database-based analysis. However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 provide the ability to amplify partial 16S rRNA gene sequences from very diverse owners, it was demonstrated that V6 was the least informative of the V regions examined. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly suggested in 16S rRNA gene-based bacterial diversity studies.


Figure 4 (pone-0042671-g004): Pearson correlation tests between corresponding sequence distances of examined V regions and FL variants. All tests were significant (P < 0.001). Test correlation index (r) values and linear models (presented with solid lines) used to describe overall trends are provided above and below each plot. Local relationships between corresponding sequence distances of the FL and other datasets are expressed with non-parametric LOWESS (locally weighted regression and smoothing scatterplots) regression curves (dot-dashed lines), while the ideal y = x correlation is also plotted (dashed lines).

Mentions: Effects of sequence length and V region variability patterns on obtained sequence distances were assessed by comparing distances of trimmed V region sequences with their full-length (FL) variants (Fig. 4). Correlation tests ranked the V region datasets in the following descending order: V4, V5, V6, V3. Overall trends were further assessed by fitting linear models. Of the four V regions, slopes closest to 1 were observed for V4 (R² = 0.88) and V5 (R² = 0.82). V3 and V6 slopes were lower than 1, and the fitted linear models did not describe the data points well. The linear model formulas indicate an over-estimation trend for V3 distances and a corresponding under-estimation for V5 and V6 for distances between 0 and 10%. The non-parametric locally weighted regression (LOWESS) analyses showed that the linear regressions were consistent with local trends for sequence distances in that range. The V4-FL comparison showed consistent average distances up to sequence distances of 0.2.
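
The comparison described above can be illustrated with a minimal sketch. It assumes an uncorrected p-distance on sequences that are already aligned, a made-up toy alignment, and a hypothetical region window (columns 4 to 12); none of these come from the paper, whose actual distance measure, alignment and region coordinates are not given here. The helper names (p_distance, pairwise_distances, pearson_r, linear_fit) are illustrative, and the LOWESS step is omitted.

```python
from itertools import combinations
from math import sqrt


def p_distance(a, b):
    """Fraction of mismatches between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in compared) / len(compared)


def pairwise_distances(seqs, start=None, end=None):
    """Distances for all sequence pairs, optionally restricted to columns [start, end)."""
    return [p_distance(a[start:end], b[start:end]) for a, b in combinations(seqs, 2)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy alignment (hypothetical data, for illustration only).
alignment = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGAACGTACGTACGT",
    "ACGTTCGAACGTACCTACGT",
    "TCGTTCGAACGAACCTACGA",
]

fl = pairwise_distances(alignment)             # full-length distances
region = pairwise_distances(alignment, 4, 12)  # distances within the assumed window

r = pearson_r(fl, region)
slope, intercept = linear_fit(fl, region)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}, region = {slope:.3f} * FL + {intercept:.3f}")
```

Under these assumptions, a slope near 1 with a small intercept would mean that region distances track full-length distances closely, while a slope below 1 would mean the region compresses larger distances.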

