A novel representation of RNA secondary structure based on element-contact graphs.

Shu W, Bo X, Zheng Z, Wang S - BMC Bioinformatics (2008)

Bottom Line: Both the stem and loop topologies are encoded completely in the element-contact graphs (ECGs). The applicability of topological indices is illustrated by three application case studies. However, further research is needed to fully resolve the challenging problem of predicting and classifying noncoding RNAs.


Affiliation: Beijing Institute of Radiation Medicine, Beijing 100850, China. shuwj@bmi.ac.cn

ABSTRACT

Background: Depending on their specific structures, noncoding RNAs (ncRNAs) play important roles in many biological processes. Interest in developing new topological indices based on RNA graphs has been revived in recent years, as such indices can be used to compare, identify and classify RNAs. Although previously proposed topological indices characterize the main topological features of RNA secondary structures, they ignore some of the structural detail. It is therefore necessary to identify topological features with low degeneracy based on complete and fine-grained RNA graphical representations.

Results: In this study, we present a complete and fine-grained scheme for RNA graph representation as a new basis for constructing RNA topological indices. We propose a combination of three vertex-weighted element-contact graphs (ECGs) to describe the RNA elements in detail, together with their adjacency patterns in the RNA secondary structure. Both the stem and loop topologies are encoded completely in the ECGs. The relationship between RNA secondary structures and the three typical topological index families defined on their ECGs was investigated using a dataset of 6,305 ncRNAs. The applicability of topological indices is illustrated by three application case studies. On the small datasets used, we find that the topological indices can distinguish true pre-miRNAs from pseudo pre-miRNAs with about 96% accuracy and can cluster known types of ncRNAs with about 98% accuracy.

Conclusion: The results indicate that the topological indices can characterize the details of RNA structures and may have a potential role in identifying and classifying ncRNAs. Moreover, these indices may lead to a new approach for discovering novel ncRNAs. However, further research is needed to fully resolve the challenging problem of predicting and classifying noncoding RNAs.

Figure 3: Mapping results of miRNA identification. The mapping results of miRNA identification using the K-means clustering algorithm are shown for the three topological index families. In this application case study, 200 real pre-miRNAs were randomly chosen from the 1,082 miRNAs in the dataset of Table 1, and 1,000 corresponding pseudo pre-miRNAs were generated as a reference set. Principal component analysis is used to visualize the clustering results for the three types of topological indices. Green circles and blue upward-pointing triangles represent real and pseudo pre-miRNAs, respectively, and each centroid is marked with a red '+'. (A) Mapping result of the real and pseudo pre-miRNAs in the Wiener indices space. (B) Mapping result of the real and pseudo pre-miRNAs in the Balaban indices space. (C) Mapping result of the real and pseudo pre-miRNAs in the Randić indices space.
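For readers unfamiliar with the three index families named in the caption, the following is a minimal sketch of the classical, unweighted Wiener, Randić and Balaban indices on a small toy graph. It is an illustration only: the article defines its indices on vertex-weighted ECGs, which this sketch does not reproduce, and the function names `bfs_distances` and `topological_indices` as well as the 4-vertex path graph are assumptions made here.

```python
from collections import deque
from math import sqrt


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_indices(edges):
    """Classical Wiener, Randic and Balaban indices.

    Assumes a connected, simple, undirected graph given as a list of
    distinct (u, v) edges. Vertex weights, as used in the article's
    ECGs, are NOT modelled here.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n, m = len(adj), len(edges)

    # Distance sum of each vertex: total distance to all other vertices.
    dist_sum = {u: sum(bfs_distances(adj, u).values()) for u in adj}

    # Wiener: sum of distances over unordered vertex pairs.
    wiener = sum(dist_sum.values()) / 2
    # Randic: sum over edges of 1 / sqrt(deg(u) * deg(v)).
    randic = sum(1 / sqrt(len(adj[u]) * len(adj[v])) for u, v in edges)
    # Balaban: m / (mu + 1) * sum over edges of 1 / sqrt(s_u * s_v),
    # where mu = m - n + 1 is the cyclomatic number.
    mu = m - n + 1
    balaban = m / (mu + 1) * sum(
        1 / sqrt(dist_sum[u] * dist_sum[v]) for u, v in edges
    )
    return wiener, randic, balaban


# Toy example: a path graph 0-1-2-3 (hypothetical, not an ECG from the article).
wiener, randic, balaban = topological_indices([(0, 1), (1, 2), (2, 3)])
print(wiener, randic, balaban)
```

Each index condenses a graph into a single number, which is what allows a structure to be placed as a point in a low-dimensional feature space such as those shown in panels (A) to (C).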

Mentions: As numeric features of RNA structure, topological indices may be used to score candidates by measuring their structural similarity to the folds and structures of reference miRNAs. We randomly chose 200 real pre-miRNAs from the 1,082 miRNAs in our dataset (Table 1) and generated 1,000 pseudo pre-miRNAs as a reference set using the dinucleotide shuffling method presented in our previous study [50]. To evaluate the potential of topological indices as features in the miRNA identification procedure, we explored the distribution of the 200 real pre-miRNAs and the corresponding 1,000 pseudo pre-miRNAs in the topological feature space. Figures 3(A), 3(B) and 3(C) illustrate the 2D mapping of these real and pseudo pre-miRNAs from the structural space to the topological feature space for each of the three types of topological indices, clustered using the K-means algorithm. The corresponding ROC curves are plotted in Figure 4.
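The clustering step described above can be sketched roughly as follows. This is a hedged illustration, not the authors' pipeline: the feature vectors are synthetic Gaussian points standing in for topological indices, the helper `kmeans2`, its farthest-point initialisation and the three-dimensional feature space are all assumptions, and the PCA projection and ROC analysis are omitted. Only the group sizes (200 real, 1,000 pseudo) follow the text.

```python
import random


def kmeans2(points, iters=50):
    """Plain two-cluster K-means (Lloyd's algorithm) on tuples of floats."""

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Label-free, deterministic initialisation (an assumption of this sketch):
    # the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: d2(p, c0))
    centroids = [c0, c1]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            0 if d2(p, centroids[0]) <= d2(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Synthetic stand-ins for index vectors (NOT data from the article).
rng = random.Random(0)
real = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(200)]
pseudo = [tuple(rng.gauss(5.0, 1.0) for _ in range(3)) for _ in range(1000)]
points = real + pseudo
truth = [0] * len(real) + [1] * len(pseudo)

labels, centroids = kmeans2(points)

# Cluster ids are arbitrary, so score the better of the two labelings.
agree = sum(t == lab for t, lab in zip(truth, labels)) / len(points)
accuracy = max(agree, 1 - agree)
print(f"clustering accuracy on synthetic data: {accuracy:.3f}")
```

Because K-means is unsupervised, the true labels are used only afterwards to score the clusters. The accuracy printed here reflects how well separated the synthetic blobs are and says nothing about the accuracy reported in the article.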


