Soil bacterial diversity screening using single 16S rRNA gene V regions coupled with multi-million read generating sequencing technologies.

Vasileiadis S, Puglisi E, Arena M, Cappa F, Cocconcelli PS, Trevisan M - PLoS ONE (2012)

Bottom Line: However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 provide the ability to amplify partial 16S rRNA gene sequences from very diverse owners, it was demonstrated that V6 was the least informative of the examined V regions. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly suggested in 16S rRNA gene based bacterial diversity studies.


Affiliation: Università Cattolica del Sacro Cuore, Faculty of Agricultural Sciences, Institute of Agricultural and Environmental Chemistry, Piacenza, Italy.

ABSTRACT
The novel multi-million read generating sequencing technologies are very promising for resolving the immense soil 16S rRNA gene bacterial diversity. Yet they have a limited maximum sequence length screening ability, restricting studies to screening DNA stretches of single 16S rRNA gene hypervariable (V) regions. The aim of the present study was to assess the effects of properties of four consecutive V regions (V3-6) on commonly applied analytical methodologies in bacterial ecology studies. Using an in silico approach, the performance of each V region was compared with the complete 16S rRNA gene stretch. We assessed related properties of the soil derived bacterial sequence collection of the Ribosomal Database Project (RDP) database and concomitantly performed simulations based on published datasets. Results indicate that overall the most prominent V region for soil bacterial diversity studies was V3, even though it was outperformed in some of the tests. Despite its high performance during most tests, V4 was less conserved along flanking sites, thus reducing its ability for bacterial diversity coverage. V5 performed well in the non-redundant RDP database based analysis. However, V5 did not resemble the full-length 16S rRNA gene sequence results as well as V3 and V4 did when the natural sequence frequency and occurrence approximation was considered in the virtual experiment. Although the highly conserved flanking sequence regions of V6 provide the ability to amplify partial 16S rRNA gene sequences from very diverse owners, it was demonstrated that V6 was the least informative of the examined V regions. Our results indicate that environment-specific database exploration and theoretical assessment of the experimental approach are strongly suggested in 16S rRNA gene based bacterial diversity studies.


Figure 2 (pone-0042671-g002): 16S rRNA gene sequence conservation of soil derived sequences. A) Nucleic acid base composition of the 16S rRNA gene consensus sequence of the 41,109 RDP database soil derived sequences for a 90% conservation cutoff value. Red background positions include hypervariable stretches as reported in reference [24] and expanded in the current study, while green background positions are the primer design sites proposed in reference [11]. The IUPAC system was used for denoting per base variability (degeneracies), and lower-case letters are used for nucleotide positions where gaps accounted for more than 10% of the sequences at that position in the alignment. B) Comparison of the present study results for 95% sequence conservation with those provided in reference [11] for 90% sequence conservation. Letter color coding denotes differences between the sequences of this study and those of reference [11]: red, increased variability; blue, altered degeneracy without variability increase; green, reduced variability; grey, the presence of two nucleotides at that position is implied in reference [11], but these are missing from the published table.
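The consensus notation described in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and two details are assumptions, namely that a position's symbol covers the fewest most-frequent bases whose combined share of the non-gap bases reaches the cutoff, and that the gap share is measured over all sequences in the column.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus_symbol(column, cutoff=0.90, gap_limit=0.10):
    """Return the IUPAC symbol for one alignment column (hypothetical helper)."""
    column = column.upper().replace("U", "T")
    counts = Counter(base for base in column if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "-"  # column holds only gaps or unknown characters
    chosen, covered = set(), 0
    # Assumption: add bases from most to least frequent until the cutoff is met.
    for base, n in counts.most_common():
        chosen.add(base)
        covered += n
        if covered / total >= cutoff:
            break
    symbol = IUPAC[frozenset(chosen)]
    # Lower case marks positions where gaps exceed gap_limit of all sequences.
    gap_fraction = column.count("-") / len(column)
    return symbol.lower() if gap_fraction > gap_limit else symbol


def consensus(alignment, cutoff=0.90, gap_limit=0.10):
    """Build a consensus string from equal-length aligned sequences."""
    return "".join(
        consensus_symbol("".join(col), cutoff, gap_limit)
        for col in zip(*alignment)
    )


# Toy alignment of ten sequences (invented data, for illustration only).
toy_alignment = ["AAACT"] * 8 + ["AAG-T", "AGG--"]
print(consensus(toy_alignment, cutoff=0.90))
print(consensus(toy_alignment, cutoff=0.95))
```

Raising the cutoff from 90% to 95% can only keep or widen the set of bases at a position, which is why a stricter cutoff tends to introduce more degenerate symbols into candidate priming sites.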

Mentions: 41,109 full or nearly full length 16S rRNA gene sequences derived from currently cultured and uncultured soil bacteria were used for performing the following analyses. Sequence conservation was examined using the Shannon entropy values (H′), while conserved sites flanking the hypervariable regions were also assessed concerning their suitability for designing primers. Out of the four selected V regions, those showing the greatest variability were V3 and V6, and those with the longest V sequence lengths were V3 and V4 (Fig. 1 and 2). Stretches longer than 105 bp were identified as hypervariable for V3 and V4 while the corresponding value for V5 and V6 was slightly higher than 27–35 bp. Conservation screening of nucleic acid bases that were common for at least 95% of the examined sequences produced stretches with the potential for being selected as priming sites (green background color in Fig. 2). Identified potential amplicon lengths for the referred per primer coverage (or minimum 90% per primer-set) were: 175 bp (348–533 E. coli numbering) with maximum 3 degeneracies per primer for 18 bp primers or 190 bp (341–531 E. coli numbering) without degeneracies per primer for V3; 282 bp (516–798 E. coli numbering) with low primer degeneracies for V4; 108 bp (788–896 E. coli numbering) with a low number of degeneracies per primer for V5; 137 bp (921–1068 E. coli numbering) with a low number of per primer degeneracies for V6. When examined regardless of the conservation of the various sites, and based on previously indicated sites [11], amplicon lengths were less than 200 bp for more than 99.8% of the amplicons for V3 and V4 and less than 150 bp for V5 and V6 (Fig. 3).
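As a rough illustration of the per-position Shannon entropy (H′) screening mentioned above, the following Python sketch computes H′ for each column of a small alignment. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, gaps are ignored, and the logarithm base is taken as 2 because the excerpt does not state which base was used.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy H' of one alignment column, in bits.

    Assumptions: only A, C, G and T are counted (gaps and ambiguity
    codes are ignored) and the logarithm is base 2.
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H' = sum over bases of p * log2(1 / p), where p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    return [shannon_entropy("".join(col)) for col in zip(*alignment)]


def variable_positions(profile, threshold):
    """0-based indices whose entropy exceeds a caller-chosen threshold."""
    return [i for i, h in enumerate(profile) if h > threshold]


# Toy alignment (invented data): position 0 is fully conserved,
# position 1 has two equally frequent bases, position 2 has all four.
toy = ["AAA", "AAC", "ACG", "ACT"]
profile = entropy_profile(toy)
print(profile)
print(variable_positions(profile, threshold=0.5))
```

A fully conserved position scores 0 bits and a position with all four bases at equal frequency scores 2 bits, so runs of high-scoring positions mark candidate hypervariable stretches while runs near 0 mark candidate priming sites. The threshold passed to `variable_positions` is an arbitrary illustrative value, not one taken from the study.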

