Identification of RNA molecules by specific enzyme digestion and mass spectrometry: software for and implementation of RNA mass mapping.

Matthiesen R, Kirpekar F - Nucleic Acids Res. (2009)

Bottom Line: A simple and powerful probability model for ranking RNA matches is proposed. We demonstrate viability of the entire setup by identifying the DNA template of a series of RNAs of biological and of in vitro transcriptional origin in complete microbial genomes and by identifying authentic 16S ribosomal RNAs in a 'small ribosomal subunit RNA' database. Thus, we present a new tool for a rapid identification of unknown RNAs using only a few picomoles of starting material.


Affiliation: Population Genetics-Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal. rmatthiesen@ipatimup.pt

ABSTRACT
The idea of identifying or characterizing an RNA molecule based on a mass spectrum of specifically generated RNA fragments has been used in various forms for well over a decade. We have developed software, named RRM for 'RNA mass mapping', which can search whole prokaryotic genomes or RNA FASTA sequence databases to identify the origin of a given RNA based on a mass spectrum of RNA fragments. As input, the program uses the masses of the products of specific RNase cleavage of the RNA under investigation. RNase T1 digestion is used here to demonstrate the usability of the method for RNA identification. The concept for identification is that the masses of the digestion products constitute a specific fingerprint, which characterizes the given RNA. The search algorithm is based on the same principles as those used in peptide mass fingerprinting, but has here been extended to work for both RNA sequence databases and genome searches. A simple and powerful probability model for ranking RNA matches is proposed. We demonstrate viability of the entire setup by identifying the DNA template of a series of RNAs of biological and of in vitro transcriptional origin in complete microbial genomes and by identifying authentic 16S ribosomal RNAs in a 'small ribosomal subunit RNA' database. Thus, we present a new tool for a rapid identification of unknown RNAs using only a few picomoles of starting material.
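
The fingerprint concept can be illustrated with a short Python sketch. It is not the RRM implementation: it assumes complete RNase T1 cleavage after every G, a 5'-OH end on every product, and no modified nucleotides. The function names and the `min_length` and `cyclic` parameters are hypothetical, and the residue masses are monoisotopic values computed from the elemental compositions of the four nucleotides.

```python
# Monoisotopic residue masses in Da (nucleoside monophosphate minus water),
# computed from elemental compositions; not taken from the RRM software.
RESIDUE_MASS = {"A": 329.05252, "C": 305.04129, "G": 345.04743, "U": 306.02530}
WATER = 18.01056
PROTON = 1.00728


def rnase_t1_digest(rna):
    """Split an RNA sequence after every G (assumes complete digestion)."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        fragments.append(rna[start:])  # 3'-terminal piece that lacks a G
    return fragments


def fragment_mz(fragment, cyclic=False):
    """m/z of a singly protonated product with a 5'-OH end.

    cyclic=False: linear 3'-phosphate; cyclic=True: 2',3'-cyclic phosphate,
    which is one water lighter.
    """
    mass = sum(RESIDUE_MASS[base] for base in fragment)
    if not cyclic:
        mass += WATER
    return mass + PROTON


def fingerprint(rna, min_length=3, cyclic=False):
    """Sorted m/z list for G-terminated products of at least min_length nt.

    The 3'-terminal piece without a G carries no 3'-phosphate, so the mass
    formula above does not apply to it and it is left out.
    """
    return sorted(
        fragment_mz(frag, cyclic)
        for frag in rnase_t1_digest(rna)
        if frag.endswith("G") and len(frag) >= min_length
    )


if __name__ == "__main__":
    # Invented toy sequence, for illustration only.
    for mz in fingerprint("AACGCCCGUGACUGUA", cyclic=True):
        print(f"{mz:.3f}")
```

Two RNAs with different sequences will generally give different sorted mass lists, and that difference is what a database or genome search can exploit.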

Figure 2: Mass spectrometry data and search result for a H. marismortui 23S rRNA subfragment. (a) Mass spectrum of H. marismortui 23S rRNA subfragment (around positions 2323–2630) digested with RNase T1. Assigned masses are from singly protonated digestion products; these masses were used in the subsequent genome search. Insert: zoom on peak clusters to illustrate the effect of digestion products with partially overlapping isotope distributions (see text for details). (b) Top scoring genomic region with flanks for RNA mass mapping of H. marismortui 23S rRNA subfragment 2323–2630. Underlined: identified sequence. Yellow highlight: RNase T1 digestion fragments with masses in peak list. Bold italic: RNase T1 digestion fragments with masses not present in peak list.

Mentions: An overview of the entire bioinformatics workflow is depicted in Figure 1, and the workflow was tested with real data as described in the following. We have previously purified a series of defined subfragments of 23S rRNA from H. marismortui and analysed them by MALDI mass spectrometry (26), and one of these spectra, covering approximately positions 2323–2630 of the 23S rRNA, is displayed in Figure 2a. The assigned peaks constitute the mass list that was used to search the H. marismortui genome with a window size of 310 nucleotides. (Note that the RNase T1 digestion in this case produced almost exclusively the 2′–3′-cyclic phosphate versions of the digestion products (17), which was taken into account when creating the mass list.) Mono- and dinucleotide digestion products are not recorded, because they are ubiquitous in essentially all RNAs and therefore have no value for identification. The first round of peak assignment was performed by the data processing software, but we subsequently adjusted the assignments manually to ensure labelling of the correct isotopic peaks and to take into account partial signal overlap arising from the ∼1.0 Da mass difference between U- and C-nucleotides. The latter issue is illustrated in the insert of Figure 2a. The peak cluster represented by assignment of the monoisotopic species at m/z 1309.18 has an intensity distribution close to that theoretically expected if only one analyte contributes to the peak cluster. The neighbouring peak clusters, on the other hand, have intensity distributions that cannot be explained by a single analyte species; consequently, we chose to interpret these data as overlapping isotopic distributions of two (m/z 1285.15 and 1286.18) and three (m/z 1261.17, 1262.16 and 1263.13) species, respectively. We cannot always discern all species contributing to a given cluster, but manual inspection of the peak intensity pattern in many cases allows indisputable assignment of additional signals from genuine digestion fragments.
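
The windowed genome search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RRM algorithm: the score is a plain count of matched peaks rather than the probability model proposed in the paper, the mass tolerance `tol` and the `step` size are invented parameters, the sequence is assumed to contain only A, C, G and T, and all function names are hypothetical. Masses are computed for singly protonated 2′,3′-cyclic phosphate products, and mono- and dinucleotides are skipped, as in the text.

```python
# Monoisotopic residue masses in Da, computed from elemental compositions.
RESIDUE_MASS = {"A": 329.05252, "C": 305.04129, "G": 345.04743, "U": 306.02530}
PROTON = 1.00728
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def t1_mzs(rna, min_length=3):
    """[M+H]+ of G-terminated RNase T1 products with a 2',3'-cyclic phosphate."""
    mzs, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            if i + 1 - start >= min_length:
                mzs.append(sum(RESIDUE_MASS[b] for b in rna[start:i + 1]) + PROTON)
            start = i + 1
    return mzs


def count_matches(peaks, theoretical, tol):
    """Number of observed peaks with a theoretical mass within tol (Da)."""
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)


def scan_genome(genome_dna, peaks, window=310, step=10, tol=0.1):
    """Return (score, strand, start) of the best-scoring window.

    Both strands are scanned; start on the "-" strand is given in
    reverse-complement coordinates. The first fragment of a window may be
    truncated by the window edge; this sketch does not correct for that.
    """
    forward = genome_dna.upper()
    strands = {"+": forward, "-": forward.translate(COMPLEMENT)[::-1]}
    best = (-1, "+", 0)
    for strand, dna in strands.items():
        rna = dna.replace("T", "U")
        last_start = max(len(rna) - window, 0)
        for start in range(0, last_start + 1, step):
            score = count_matches(peaks, t1_mzs(rna[start:start + window]), tol)
            if score > best[0]:
                best = (score, strand, start)
    return best


if __name__ == "__main__":
    # Three peak values quoted in the text; the toy genome is invented.
    peaks = [1261.17, 1286.18, 1309.18]
    toy_genome = "T" * 20 + "GAACGCCCGACTG" + "T" * 20
    print(scan_genome(toy_genome, peaks, window=13, step=1))
```

A raw match count favours windows that happen to contain many fragments, which is one reason a probability-based ranking, as proposed in the paper, is preferable for real genomes.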

