Experimental discovery of sRNAs in Vibrio cholerae by direct cloning, 5S/tRNA depletion and parallel sequencing.

Liu JM, Livny J, Lawrence MS, Kimball MD, Waldor MK, Camilli A - Nucleic Acids Res. (2009)

Bottom Line: Our results provide information, at unprecedented depth, on the complexity of the sRNA component of a bacterial transcriptome. In addition, characterization of a subset of the newly identified transcripts led to the identification of a novel sRNA regulator of carbon metabolism. Collectively, these results strongly suggest that the number of sRNAs in bacteria has been greatly underestimated and that future efforts to analyze bacterial transcriptomes will benefit from direct cloning and parallel sequencing experiments aided by 5S/tRNA depletion.


Affiliation: HHMI, Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, MA 02111, USA.

ABSTRACT
Direct cloning and parallel sequencing, an extremely powerful method for microRNA (miRNA) discovery, has not yet been applied to bacterial transcriptomes. Here we present sRNA-Seq, an unbiased method that allows for interrogation of the entire small, non-coding RNA (sRNA) repertoire in any prokaryotic or eukaryotic organism. This method includes a novel treatment that depletes total RNA fractions of highly abundant tRNAs and small subunit rRNA, thereby enriching the starting pool for sRNA transcripts with novel functionality. As a proof-of-principle, we applied sRNA-Seq to the human pathogen Vibrio cholerae. Our results provide information, at unprecedented depth, on the complexity of the sRNA component of a bacterial transcriptome. From 407 039 sequence reads, all 20 known V. cholerae sRNAs, 500 new, putative intergenic sRNAs and 127 putative antisense sRNAs were identified in a limited number of growth conditions examined. In addition, characterization of a subset of the newly identified transcripts led to the identification of a novel sRNA regulator of carbon metabolism. Collectively, these results strongly suggest that the number of sRNAs in bacteria has been greatly underestimated and that future efforts to analyze bacterial transcriptomes will benefit from direct cloning and parallel sequencing experiments aided by 5S/tRNA depletion.



Figure 1: Results from V. cholerae sRNA-Seq experiment. (A) Length distribution of all 681 205 reads from 454 sequencing. Reads that match the V. cholerae genome are in blue; reads that represent contaminants are in yellow. (B) Breakdown of total V. cholerae reads from 454 sequencing based on their genomic origin (n = 407 039). ORF, annotated open reading frames; AS, transcripts antisense to ORFs; IGR, transcripts from intergenic regions. (C) Breakdown of sequencing reads that correspond to candidate sRNAs based on their genomic origin (n = 205 537). The IGR-derived candidates include the 92 690 reads of known or previously predicted V. cholerae sRNAs. (D) Length distribution of sRNA candidates. The 2140 sRNA candidates, corresponding to 205 537 total reads, are plotted based on the length of the most abundant sequence observed for each candidate. (E) Visual representation of the depth of the sRNA-Seq data. Approximately 50% of the reads mapping to the V. cholerae genome grouped into candidate sRNAs that were found in two or more samples (yellow). All the previously known and verified V. cholerae sRNAs (white) were amongst these candidate reads; this includes the 20 sRNAs of known function and the 5 sRNAs predicted and verified by northern blot analysis (20,21).

Mentions: Four independent samples were prepared, affording a total of 681 205 sequence reads, of which 88% were full length (with complete and perfect 5′ and 3′ linkers) (Supplementary Table 2). The resulting sequences, trimmed of linkers, had the length distribution shown in Figure 1A. Using the BLASTN algorithm (E-value cutoff of 0.1) on the 626 351 reads 15 nt and longer, we found that 407 039 of them matched the V. cholerae genome (Table 1). Based on analysis of all possible n-mers, of the 3066 unique 15 nt reads that we assigned as being V. cholerae transcripts, it is likely that at most 0.4% of them (∼12) are false positives. We also observed many small transcripts that mapped to the yeast genome (Table 1). We hypothesize that the medium in which the bacteria were grown was the source of these contaminating sequences. Additional unidentified sequences may be the result of trace DNA contaminating materials used during the cloning process. Regardless of this contamination, the large majority of the reads >40 nt mapped to the V. cholerae genome. We observed that 73% of the V. cholerae reads matched the genome perfectly, 12% had one mismatch or gap, 6% had multiple mismatches or gaps and a small fraction (9%) represented transcript recombinations that were likely artifacts resulting from the cloning process (‘BXA’ reads; see ‘Materials and methods’ section). We then took all 407 039 reads that mapped to the V. cholerae genome and grouped them into putative transcripts by combining sequences with genomic coordinates that agreed within ± 10 bp at each end. Based on this method of grouping reads together, the sequences that mapped to the V. cholerae genome represented 16 825 unique transcripts, 9% of which corresponded to rRNA or tRNA sequences. Overall, 34% of the total V. cholerae reads analyzed were derived from rRNA or tRNA (Figure 1B). Our previous attempt to directly clone 14–200 nt RNA resulted in 93% (180/193) 5S/tRNA sequences. The sRNA-Seq results therefore suggest that the 5S/tRNA depletion step was successful in eliminating the majority of these highly abundant transcripts.
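
The grouping step described above (combining reads whose genomic coordinates agree within ± 10 bp at each end) can be illustrated with a minimal Python sketch. The excerpt does not say how groups were seeded or merged, so the greedy, seed-based comparison below is an assumption, as are the `Read` record, its field names, and the `group_reads` function; it is not the authors' pipeline.

```python
from collections import namedtuple

# Hypothetical record for one mapped read (field names are assumptions).
Read = namedtuple("Read", ["chrom", "strand", "start", "end"])


def group_reads(reads, tol=10):
    """Greedily group reads whose start AND end both lie within `tol` bp
    of the first read (the "seed") of an existing group on the same
    chromosome and strand. Reads that fit no group start a new one."""
    groups = []      # all groups, in creation order
    by_locus = {}    # (chrom, strand) -> groups on that chromosome/strand
    for read in sorted(reads):
        candidates = by_locus.setdefault((read.chrom, read.strand), [])
        for group in candidates:
            seed = group["seed"]
            if (abs(read.start - seed.start) <= tol
                    and abs(read.end - seed.end) <= tol):
                group["members"].append(read)
                break
        else:
            # No existing group matched: this read seeds a new group.
            group = {"seed": read, "members": [read]}
            candidates.append(group)
            groups.append(group)
    return groups


if __name__ == "__main__":
    example = [
        Read("chr1", "+", 100, 160),
        Read("chr1", "+", 105, 158),  # both ends within 10 bp of the first read
        Read("chr1", "+", 100, 200),  # same start, but end differs by 40 bp
        Read("chr1", "-", 100, 160),  # opposite strand, so a separate group
    ]
    for g in group_reads(example):
        print(g["seed"], len(g["members"]))
```

Comparing each read against a fixed seed keeps every member within the tolerance of one reference read; a single-linkage variant would instead let groups drift as reads chain together, which may or may not match what the authors did.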

