Quantifying sequence and structural features of protein-RNA interactions.
Bottom Line: Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites.
Affiliation: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan firstname.lastname@example.org.
Mentions: Since the performance of structure-based classifiers can be over-estimated when input structures are in their RNA-bound conformations, we tested the robustness of our model using structures built by homology modeling, with template structures selected under various sequence identity thresholds. The distribution of templates under the five sequence identity thresholds is shown in Supplementary Figure S12. The number of protein chains that were modeled under each identity threshold and their average root-mean-square deviation from the native structures are listed in Supplementary Table S3. Note that even when using templates from the top group, where sequence identity can be as high as 100%, the predicted structures were not identical to the template because we carried out energy minimization on the models without RNA. Also, the number of predicted residues generally differed depending on the template, especially when low sequence identity templates were used. Therefore, for each sequence identity cutoff, we rebuilt the PDB dataset to include only residues that could be reproduced in the model. Performance evaluated on the homology models built using the five sequence identity thresholds is shown in Figure 3. Even at the lowest sequence identity threshold (<30%), incorporating structural features was significantly better than using sequence features alone. Moreover, when high quality but non-identical templates were used (identity <100%), the AUC was nearly identical to that of the bound structure. These results imply that the aaRNA method is robust against typical levels of input noise.
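The evaluation step described above (score only the residues present in a given homology model, then compute the AUC) can be illustrated with a minimal sketch. This is not the aaRNA implementation: the function names `restrict_to_modeled` and `roc_auc`, the `(chain, residue number)` keys, and the toy labels and scores are all assumptions made for illustration. The AUC is computed with the standard rank-based (Mann-Whitney) formula.

```python
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, int]  # hypothetical key: (chain ID, residue number)


def restrict_to_modeled(
    labels: Dict[ResidueKey, int],
    scores: Dict[ResidueKey, float],
) -> Tuple[List[int], List[float]]:
    """Keep only residues that have both a native label and a model-based score."""
    shared = sorted(set(labels) & set(scores))
    return [labels[k] for k in shared], [scores[k] for k in shared]


def roc_auc(y_true: List[int], y_score: List[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one binding and one non-binding residue")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data (made up): 1 = RNA-binding residue in the native complex.
native_labels = {("A", 1): 1, ("A", 2): 0, ("A", 3): 1, ("A", 4): 0, ("A", 5): 0}
# Scores predicted on a homology model in which residue 5 could not be built.
model_scores = {("A", 1): 0.9, ("A", 2): 0.2, ("A", 3): 0.4, ("A", 4): 0.6}

y_true, y_score = restrict_to_modeled(native_labels, model_scores)
auc = roc_auc(y_true, y_score)
```

Repeating this for the models built under each sequence identity threshold would give one AUC per threshold, which is the kind of comparison the paragraph reports.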