Quantifying sequence and structural features of protein-RNA interactions.

Li S, Yamashita K, Amada KM, Standley DM - Nucleic Acids Res. (2014)

Bottom Line: Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites.


Affiliation: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan standley@ifrec.osaka-u.ac.jp.


Figure 4: Performance of our feature-coding scheme on three benchmark datasets under a 5 Å distance cutoff for RNA-binding residues. The three benchmarks shown are RB106 (A), RB144 (B) and RB198 (C). The label ‘PSSM’ indicates the AUC achieved with PSSM features only. The label ‘Seq-CTRL’ indicates the result with the sequence-based control and the label ‘aaRNA’ for all of our proposed features.
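
The distance cutoff in the caption determines which residues count as RNA-binding. The sketch below illustrates one way such labels could be derived. It assumes that a residue is labeled as binding when any of its atoms lies closer than the cutoff to any RNA atom; the excerpt does not spell out the exact atom-level rule, and the function name, residue identifiers and coordinates are all hypothetical toy values.

```python
from math import dist  # Euclidean distance, Python 3.8+


def label_binding_residues(protein_residues, rna_atoms, cutoff=5.0):
    """Return {residue_id: 1 or 0}.

    Assumption (not stated in the excerpt): a residue is labeled 1 if any
    of its atoms is closer than `cutoff` angstroms to any RNA atom.
    """
    labels = {}
    for res_id, atoms in protein_residues.items():
        near = any(dist(a, r) < cutoff for a in atoms for r in rna_atoms)
        labels[res_id] = int(near)
    return labels


# Hypothetical toy coordinates in angstroms, not taken from any PDB entry.
protein = {
    "A10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A11": [(3.5, 0.0, 0.0)],
    "A12": [(9.0, 0.0, 0.0)],
}
rna = [(0.0, 3.0, 0.0)]

labels_5 = label_binding_residues(protein, rna, cutoff=5.0)
labels_35 = label_binding_residues(protein, rna, cutoff=3.5)
print(labels_5)
print(labels_35)
```

In this toy case the residue at about 4.6 Å is labeled as binding under the 5 Å cutoff but not under the 3.5 Å cutoff, so the stricter definition yields fewer positive residues.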

Mentions: According to a recent study that used a 5 Å cutoff to define RNA-binding residues (16), the AUC of different classifiers using PSSM features and their derivatives varied from 0.77 to 0.81. The best-performing method was the predictor RNABindR 2.0. That study used a balanced training dataset of positive residues and undersampled negative residues, whereas the datasets in our tests represented the actual distributions observed in the PDB, in which non-RNA-binding residues far outnumber RNA-binding ones. Nevertheless, when trained and tested on three standard benchmark datasets (RB106, RB144 and RB198) and evaluated in the same way (residue-based and protein-based evaluation on structure data), our additional features exhibited considerable improvement over sequence-based features alone and exceeded the previously reported AUC limit of 0.81 by 2–3%, as demonstrated in Figure 4. The results of these three benchmark tests are summarized in Table 3. Performance differences were assessed both at the residue level (Benchmark [r]) and at the protein level (Benchmark [p]). The AUC distribution of the protein-chain-based evaluation is shown in Supplementary Figure S13. In both residue-level and protein-level assessments, the improvement in performance of aaRNA over the alternative methods was highly significant (P-values <10^-5 and <10^-10, respectively). For completeness, the numbers of RNA-binding and non-binding residues in the three benchmark datasets collected under a 3.5 or 5 Å distance cutoff are listed in Supplementary Table S2. The performance of our model built from the three benchmark datasets using a <3.5 Å cutoff as the RNA-binding definition can be found in Supplementary Figure S14. With the smaller cutoff, model performance increased on all three benchmarks.
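
To make the distinction between residue-level and protein-level evaluation concrete, the following minimal sketch computes AUC both ways on toy data. It assumes that residue-level evaluation pools the residues of all chains before computing one AUC, and that protein-level evaluation computes one AUC per chain; the excerpt does not give the exact procedure, and the function names, chain names, labels and scores are hypothetical. AUC is computed here as the probability that a randomly chosen binding residue scores higher than a randomly chosen non-binding residue, with ties counted as one half.

```python
def auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Each positive/negative pair contributes 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def residue_level_auc(chains):
    """Pool the residues of all chains, then compute a single AUC."""
    labels = [y for ys, _ in chains.values() for y in ys]
    scores = [s for _, ss in chains.values() for s in ss]
    return auc(labels, scores)


def protein_level_aucs(chains):
    """Compute one AUC per chain, giving a distribution over proteins."""
    return {name: auc(ys, ss) for name, (ys, ss) in chains.items()}


# Hypothetical toy data: chain name -> (binding labels, predicted scores).
chains = {
    "chainA": ([1, 0, 0, 1], [0.9, 0.2, 0.4, 0.7]),
    "chainB": ([0, 1, 0], [0.8, 0.6, 0.1]),
}

print(residue_level_auc(chains))
print(protein_level_aucs(chains))
```

The two views can disagree: a pooled AUC weights long chains more heavily, while the per-chain values expose proteins on which a predictor does poorly. Because the benchmark datasets keep the natural class imbalance, a threshold-free measure such as AUC is a reasonable choice for either view.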

