Limits...
RNA global alignment in the joint sequence-structure space using elastic shape analysis.

Laborde J, Robinson D, Srivastava A, Klassen E, Zhang J - Nucleic Acids Res. (2013)

Bottom Line: Comparison/alignment of RNA molecules provides an effective means to predict their functions and understand their evolutionary relationships.Means and covariances of full structures can be defined and computed, and probability distributions on spaces of such structures can be constructed for a group of RNAs.Our method was further applied to predict functions of RNA molecules and showed superior performance compared with previous methods when tested on benchmark datasets.

View Article: PubMed Central - PubMed

Affiliation: Department of Statistics, Florida State University, FL, USA.

ABSTRACT
The functions of RNAs, like proteins, are determined by their structures, which, in turn, are determined by their sequences. Comparison/alignment of RNA molecules provides an effective means to predict their functions and understand their evolutionary relationships. For RNA sequence alignment, most methods developed for protein and DNA sequence alignment can be directly applied. RNA 3-dimensional structure alignment, on the other hand, tends to be more difficult than protein structure alignment due to the lack of regular secondary structures as observed in proteins. Most of the existing RNA 3D structure alignment methods use only the backbone geometry and ignore the sequence information. Using both the sequence and backbone geometry information in RNA alignment may not only produce more accurate classification, but also deepen our understanding of the sequence-structure-function relationship of RNA molecules. In this study, we developed a new RNA alignment method based on elastic shape analysis (ESA). ESA treats RNA structures as three dimensional curves with sequence information encoded on additional dimensions so that the alignment can be performed in the joint sequence-structure space. The similarity between two RNA molecules is quantified by a formal distance, geodesic distance. Based on ESA, a rigorous mathematical framework can be built for RNA structure comparison. Means and covariances of full structures can be defined and computed, and probability distributions on spaces of such structures can be constructed for a group of RNAs. Our method was further applied to predict functions of RNA molecules and showed superior performance compared with previous methods when tested on benchmark datasets. The programs are available at http://stat.fsu.edu/ ∼jinfeng/ESA.html.

Show MeSH

Related in: MedlinePlus

Piecewise-linear re-parameterization  yielding an optimal matching between 1h3e chain B and 1gtr chain B. The matching grid has 82 columns, corresponding to the 80 points on the backbone of 1h3eB plus the dummy points at each end. Similarly, the grid has 76 rows, corresponding to the 74 points on the backbone of 1gtrB plus the dummy points at each end.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3675459&req=5

gkt187-F1: Piecewise-linear re-parameterization yielding an optimal matching between 1h3e chain B and 1gtr chain B. The matching grid has 82 columns, corresponding to the 80 points on the backbone of 1h3eB plus the dummy points at each end. Similarly, the grid has 76 rows, corresponding to the 74 points on the backbone of 1gtrB plus the dummy points at each end.

Mentions: In the original ESA framework for RNA/protein alignment (21,26), we added a pre-processing step: any two RNAs/proteins to be compared had to have their 3D coordinates re-sampled using interpolations, and we obtained smooth curves with equal number of points on each structure. That treatment was convenient since it facilitated the use of a uniform grid that evenly partitions to search for the optimal . In the new framework, each point is mapped to a nucleotide and re-sampling of points is not necessary. With this modification, sequence alignments are obtained together with structure alignments, like most of the other structure alignment methods do. As in the original algorithm, we create a grid on the unit square and search for an optimal path through the grid from (0,0) to (1,1). However, our grid is non-uniform: the vertical lines are placed at x-coordinates and the horizontal lines are placed at y-coordinates (see Figure 1). In our case, the gridpoints have special significance: each gridpoint represents a match between one of the P atoms on the backbone of the molecule represented by and a P atom on the backbone of the molecule represented by . As in the original algorithm, a list of gridpoints which starts at (0,0), ends at (1,1), and is strictly increasing in both the x and y components can be thought of as the graph of an increasing piecewise-linear re-parameterization . In our case, such a list also gives us an alignment between the two nucleotide sequences. Here, parameter values which do not appear in the list of gridpoints correspond to alignment gaps (elements in the nucleotide sequence which are unmatched). Dynamic programming is used as in the original algorithm to search over the space of all such lists for one which minimizes the energy in equation A.1, and the result gives us both a full elastic matching between and , as well as a nucleotide sequence alignment. Figure 2 shows an example of a sequence alignment obtained in this way.Figure 1.


RNA global alignment in the joint sequence-structure space using elastic shape analysis.

Laborde J, Robinson D, Srivastava A, Klassen E, Zhang J - Nucleic Acids Res. (2013)

Piecewise-linear re-parameterization  yielding an optimal matching between 1h3e chain B and 1gtr chain B. The matching grid has 82 columns, corresponding to the 80 points on the backbone of 1h3eB plus the dummy points at each end. Similarly, the grid has 76 rows, corresponding to the 74 points on the backbone of 1gtrB plus the dummy points at each end.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3675459&req=5

gkt187-F1: Piecewise-linear re-parameterization yielding an optimal matching between 1h3e chain B and 1gtr chain B. The matching grid has 82 columns, corresponding to the 80 points on the backbone of 1h3eB plus the dummy points at each end. Similarly, the grid has 76 rows, corresponding to the 74 points on the backbone of 1gtrB plus the dummy points at each end.
Mentions: In the original ESA framework for RNA/protein alignment (21,26), we added a pre-processing step: any two RNAs/proteins to be compared had to have their 3D coordinates re-sampled using interpolations, and we obtained smooth curves with equal number of points on each structure. That treatment was convenient since it facilitated the use of a uniform grid that evenly partitions to search for the optimal . In the new framework, each point is mapped to a nucleotide and re-sampling of points is not necessary. With this modification, sequence alignments are obtained together with structure alignments, like most of the other structure alignment methods do. As in the original algorithm, we create a grid on the unit square and search for an optimal path through the grid from (0,0) to (1,1). However, our grid is non-uniform: the vertical lines are placed at x-coordinates and the horizontal lines are placed at y-coordinates (see Figure 1). In our case, the gridpoints have special significance: each gridpoint represents a match between one of the P atoms on the backbone of the molecule represented by and a P atom on the backbone of the molecule represented by . As in the original algorithm, a list of gridpoints which starts at (0,0), ends at (1,1), and is strictly increasing in both the x and y components can be thought of as the graph of an increasing piecewise-linear re-parameterization . In our case, such a list also gives us an alignment between the two nucleotide sequences. Here, parameter values which do not appear in the list of gridpoints correspond to alignment gaps (elements in the nucleotide sequence which are unmatched). Dynamic programming is used as in the original algorithm to search over the space of all such lists for one which minimizes the energy in equation A.1, and the result gives us both a full elastic matching between and , as well as a nucleotide sequence alignment. Figure 2 shows an example of a sequence alignment obtained in this way.Figure 1.

Bottom Line: Comparison/alignment of RNA molecules provides an effective means to predict their functions and understand their evolutionary relationships.Means and covariances of full structures can be defined and computed, and probability distributions on spaces of such structures can be constructed for a group of RNAs.Our method was further applied to predict functions of RNA molecules and showed superior performance compared with previous methods when tested on benchmark datasets.

View Article: PubMed Central - PubMed

Affiliation: Department of Statistics, Florida State University, FL, USA.

ABSTRACT
The functions of RNAs, like proteins, are determined by their structures, which, in turn, are determined by their sequences. Comparison/alignment of RNA molecules provides an effective means to predict their functions and understand their evolutionary relationships. For RNA sequence alignment, most methods developed for protein and DNA sequence alignment can be directly applied. RNA 3-dimensional structure alignment, on the other hand, tends to be more difficult than protein structure alignment due to the lack of regular secondary structures as observed in proteins. Most of the existing RNA 3D structure alignment methods use only the backbone geometry and ignore the sequence information. Using both the sequence and backbone geometry information in RNA alignment may not only produce more accurate classification, but also deepen our understanding of the sequence-structure-function relationship of RNA molecules. In this study, we developed a new RNA alignment method based on elastic shape analysis (ESA). ESA treats RNA structures as three dimensional curves with sequence information encoded on additional dimensions so that the alignment can be performed in the joint sequence-structure space. The similarity between two RNA molecules is quantified by a formal distance, geodesic distance. Based on ESA, a rigorous mathematical framework can be built for RNA structure comparison. Means and covariances of full structures can be defined and computed, and probability distributions on spaces of such structures can be constructed for a group of RNAs. Our method was further applied to predict functions of RNA molecules and showed superior performance compared with previous methods when tested on benchmark datasets. The programs are available at http://stat.fsu.edu/ ∼jinfeng/ESA.html.

Show MeSH
Related in: MedlinePlus