Quantifying sequence and structural features of protein-RNA interactions.

Li S, Yamashita K, Amada KM, Standley DM - Nucleic Acids Res. (2014)

Bottom Line: Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity prediction beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrated its use in identifying putative RNA binding sites.

Affiliation: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan; standley@ifrec.osaka-u.ac.jp

Figure 3: Performance evaluation using homology models. The left panel (A) shows the performance on the non-ribosomal set and the right panel (B) shows the performance on the full set. Within each panel, subfigures show the performance for the top, <100%, <90%, <50% and <30% homologs. Since the number of residues generally decreases as the threshold is lowered, performance is only comparable within a given set. Performance using bound structures, homology models and the sequence-based control is indicated by ‘PDB’, ‘Homo’ and ‘Seq-CTRL’, respectively.

Mentions: Since the performance of structure-based classifiers could be over-estimated when input structures are in their RNA-bound conformations, we tested the robustness of our model using structures built by homology modeling, with template structures selected within various sequence identity thresholds. The distribution of templates under the five sequence identity thresholds is shown in Supplementary Figure S12. The number of protein chains that were modeled under each identity threshold and their average root-mean-square deviation from the native structures are listed in Supplementary Table S3. Note that even when using templates from the top group, where sequence identity can be as high as 100%, predicted structures were not identical to the template because we carried out energy minimization on the models without RNA. Also, the number of predicted residues generally differed depending on the template, especially when low sequence identity templates were used. Therefore, for each sequence identity cutoff, we rebuilt the PDB dataset to include only residues that could be reproduced in the model. Performance evaluated on the homology models built using the five different sequence identity thresholds is shown in Figure 3. Even at the lowest sequence identity threshold (<30%), incorporating structural features performed significantly better than using sequence features alone. Moreover, when high quality but non-identical templates were used (identity <100%), the AUC was nearly identical to that of the bound structure. These results imply that the aaRNA method is robust against typical levels of input noise.
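The comparison described above depends on scoring the bound structure and the homology model over the same residues. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function names, the (chain, residue number) keys and the toy scores are all hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) definition.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, int]  # hypothetical key: (chain ID, residue number)


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (binding, non-binding) pairs ranked correctly.

    Ties between a binding and a non-binding residue count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binding and one non-binding residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_on_shared_residues(
    labels: Dict[Residue, int],
    bound_scores: Dict[Residue, float],
    model_scores: Dict[Residue, float],
) -> Tuple[float, float, int]:
    """Evaluate both predictions only on residues present in the model."""
    shared = sorted(set(labels) & set(bound_scores) & set(model_scores))
    y = [labels[r] for r in shared]
    auc_bound = roc_auc(y, [bound_scores[r] for r in shared])
    auc_model = roc_auc(y, [model_scores[r] for r in shared])
    return auc_bound, auc_model, len(shared)


# Toy data (invented for illustration): residue 5 is missing from the model,
# so it is dropped from both evaluations.
labels = {("A", 1): 1, ("A", 2): 0, ("A", 3): 1, ("A", 4): 0, ("A", 5): 1}
bound = {("A", 1): 0.9, ("A", 2): 0.2, ("A", 3): 0.8, ("A", 4): 0.4, ("A", 5): 0.7}
model = {("A", 1): 0.8, ("A", 2): 0.3, ("A", 3): 0.2, ("A", 4): 0.5}

auc_bound, auc_model, n_shared = auc_on_shared_residues(labels, bound, model)
print(auc_bound, auc_model, n_shared)
```

Because the shared residue set shrinks as the identity threshold is lowered, AUC values obtained this way are comparable only within one threshold, which matches the caveat in the Figure 3 caption.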

