Quantifying sequence and structural features of protein-RNA interactions.
Bottom Line: Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity prediction beyond what has been reported previously, including by meta-prediction servers. These features include hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side-chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA-binding sites.
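The backbone/side-chain split of relative solvent accessibility can be sketched as follows. This is a minimal illustration, not aaRNA's implementation: it assumes per-atom absolute ASA values (in Å²) have already been computed by some external tool, treats N, CA, C, O and OXT as backbone atoms, and takes the per-residue-type reference (maximum) areas as inputs. The function name `partitioned_rsa` and the reference values in the example are hypothetical.

```python
# Backbone atom names, following the usual PDB naming (OXT = C-terminal oxygen).
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}

def partitioned_rsa(atom_asa, ref_backbone, ref_sidechain):
    """Split one residue's solvent accessibility into backbone and side-chain RSA.

    atom_asa: dict mapping atom name -> absolute ASA in Å^2 (from any ASA tool).
    ref_backbone, ref_sidechain: hypothetical reference (maximum) ASA values
    for this residue type's backbone and side chain, in Å^2.
    """
    asa_bb = sum(a for name, a in atom_asa.items() if name in BACKBONE_ATOMS)
    asa_sc = sum(a for name, a in atom_asa.items() if name not in BACKBONE_ATOMS)
    rsa_bb = asa_bb / ref_backbone if ref_backbone > 0 else 0.0
    # Glycine has no side chain, so its side-chain reference area is 0.
    rsa_sc = asa_sc / ref_sidechain if ref_sidechain > 0 else 0.0
    return rsa_bb, rsa_sc

# Illustrative numbers only (a serine-like residue with made-up reference areas).
print(partitioned_rsa({"N": 10.0, "CA": 5.0, "C": 0.0, "O": 15.0, "CB": 20.0, "OG": 20.0},
                      ref_backbone=60.0, ref_sidechain=80.0))  # (0.5, 0.5)
```

Keeping the two fractions separate lets a classifier distinguish residues whose exposure comes mainly from the peptide backbone from those whose side chains protrude into the solvent.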
Affiliation: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan firstname.lastname@example.org.
Mentions: According to a recent study that used a 5 Å cutoff to define RNA binding (16), the AUCs of different classifiers using PSSM features and their derivatives ranged from 0.77 to 0.81, with the predictor RNABindR 2.0 performing best. That study trained on a balanced dataset of positive residues and undersampled negative residues, whereas our test datasets reflect the actual distributions observed in the PDB, in which non-RNA-binding residues far outnumber RNA-binding ones. Nevertheless, when trained and tested on three standard benchmark datasets (RB106, RB144 and RB198) and evaluated in the same way (residue-based and protein-based evaluation on structure data), our additional features improved considerably over sequence-based features alone and exceeded the previously reported AUC limit of 0.81 by 2–3%, as shown in Figure 4. Table 3 summarizes the results of these three benchmark tests. Performance differences were assessed both at the residue level (Benchmark [r]) and at the protein level (Benchmark [p]). The AUC distribution of the protein-chain-based evaluation is shown in Supplementary Figure S13. In both residue-level and protein-level assessments, the improvement of aaRNA over the alternative methods was highly significant (P < 10⁻⁵ and P < 10⁻¹⁰, respectively). For completeness, Supplementary Table S2 lists the numbers of RNA-binding and non-binding residues in the three benchmark datasets under 3.5 Å and 5 Å distance cutoffs. The performance of our models built from the three benchmark datasets using a cutoff of <3.5 Å to define RNA binding is shown in Supplementary Figure S14; with this smaller cutoff, performance increased on all three benchmarks.
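The two ingredients of this evaluation, distance-based labeling of RNA-binding residues and AUC computed either over pooled residues or per protein chain, can be sketched in Python as below. This is a hedged illustration under stated assumptions, not the paper's code: all function names are hypothetical, a residue is labeled binding if any of its atoms lies strictly closer than the cutoff to any RNA atom, AUC is computed from the Mann-Whitney rank statistic, and the protein-level evaluation is taken to be the distribution of per-chain AUCs (chains lacking either class are skipped).

```python
import math

def label_binding(residue_atoms, rna_atoms, cutoff=5.0):
    """Return 1 for each residue with any atom closer than `cutoff` (in Å)
    to any RNA atom, else 0. Coordinates are (x, y, z) tuples."""
    return [
        int(any(math.dist(p, q) < cutoff for p in atoms for q in rna_atoms))
        for atoms in residue_atoms
    ]

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic; tied scores share average ranks.
    Returns None when only one class is present (AUC undefined)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def residue_level_auc(chains):
    """Pool residues from all chains; `chains` is a list of (scores, labels)."""
    scores = [s for chain_scores, _ in chains for s in chain_scores]
    labels = [y for _, chain_labels in chains for y in chain_labels]
    return auc(scores, labels)

def per_chain_aucs(chains):
    """One AUC per chain, skipping chains that lack either class."""
    values = (auc(scores, labels) for scores, labels in chains)
    return [v for v in values if v is not None]

# Illustrative data only: (predicted scores, true labels) for three chains.
chains = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    ([0.7, 0.2], [1, 0]),
    ([0.5], [0]),
]
print(residue_level_auc(chains))  # 0.75
print(per_chain_aucs(chains))     # [0.75, 1.0]
```

Pooling residues weights large chains more heavily, while per-chain AUCs give each protein equal weight, which is one reason the two evaluations can rank methods differently. Lowering `cutoff` from 5.0 to 3.5 relabels marginal contacts as non-binding, which changes both the class balance and the resulting AUCs.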