Quantifying sequence and structural features of protein-RNA interactions.

Li S, Yamashita K, Amada KM, Standley DM - Nucleic Acids Res. (2014)

Bottom Line: Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side-chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrated its use in identifying putative RNA binding sites.


Affiliation: Laboratory of Systems Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan standley@ifrec.osaka-u.ac.jp.

© Copyright Policy - creative-commons

Figure 1: Novel features used in this work. (A) EC. A surface representation of the class-I A. fulgidus CCA-adding enzyme bound to a tRNA fragment (PDB ID: 3OVB). A distance map between the protein and the bound RNA, with near (far) residues colored red (blue), is shown on the left. The EC values, with high (low) values colored red (blue), are shown on the right. (B) LN under a series of scales. LN values increase from blue to red. At each granularity level, warmer colors indicate convex residues, while cooler colors indicate concave residues. (C) Solvent ASA. A surface representation of RNase Cas6 (PDB ID: 4ILL) is shown. The protein makes both side-chain and backbone contacts with the substrate RNA. Target residues (meshed) and nucleotides are represented by opaque sticks, connected by hydrogen bonds (dashed lines). The side chain of R268 protrudes and binds G15 (top). The backbone of Y168, which is mostly buried and forms part of a cleft, interacts with A5 (bottom). All 3D structure figures in this work were generated with the PyMOL Molecular Graphics System, Version 1.5, Schrödinger, LLC.

Mentions: The EC feature is illustrated in Figure 1A, using the class-I Archaeoglobus fulgidus CCA-adding enzyme bound to a tRNA fragment as an example. For the non-ribosomal and full datasets, the EC feature improved the AUC by ∼1.3% and 0.8%, respectively, and also yielded a better PR curve (Supplementary Figure S3) than the control method (the sequence features used in the SRCPred method (12)). To quantify the information contained in each feature, we used EC and PSSM separately. We took the substitution frequency of each residue to itself in the PSSM profile and normalized the frequencies via a logistic operator. We also included the 21-bit sparse coding feature and the GAC feature. The resulting AUCs were 0.7277 and 0.7075 for the EC- and PSSM-based models, respectively, on the non-ribosomal dataset, and 0.8046 and 0.7942 on the complete dataset. These values verify that the EC feature contains additional information not found in the conservation values of the PSSM. We tested different E-value thresholds (1E-3, 1E-5 and 1E-10) for building the MSA profiles from which EC values were calculated. Using different E-values, using a combination of E-values, or building a PSSM-like substitution matrix with occurrence frequencies for each of the 20 amino acid types did not increase performance. Therefore, the default E-value threshold was set to 1E-3. It should be noted that, depending on the number of homologous sequences in the database, the weight calculation step could be time-consuming. We were, however, able to greatly speed up this process by parallelization. After manually inspecting many known protein–RNA complexes, we could discern a rough correlation between residue conservation and distance to the bound RNA. As shown in Supplementary Figure S4A, the mean distance between protein surface residues and their bound RNAs was inversely related to the EC values. Moreover, RNA-binding residues were more enriched in large EC values than non-binding or background residues (Supplementary Figure S4B). However, as expected, conserved residues were not always near RNA binding sites.
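
The PSSM-based control feature described above (the score of each residue substituting to itself, passed through a logistic operator) can be sketched as follows. This is only an illustration, not the authors' implementation: the text does not give the exact form of the logistic operator, so the standard function 1/(1 + e^(-x)) is assumed here, and the function names and the column order (that of PSI-BLAST ASCII PSSM files) are likewise assumptions.

```python
import math

# Assumed column order: the one used in PSI-BLAST ASCII PSSM output.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def logistic(x: float) -> float:
    """Standard logistic function (assumed form of the 'logistic operator')."""
    return 1.0 / (1.0 + math.exp(-x))


def self_conservation(sequence: str, pssm_rows: list[list[float]]) -> list[float]:
    """Return one value in (0, 1) per residue.

    For each position, look up the PSSM entry for the residue substituting
    to itself and squash it with the logistic function. Raises ValueError
    for residues outside the 20 standard amino acids.
    """
    if len(sequence) != len(pssm_rows):
        raise ValueError("sequence and PSSM must have the same length")
    values = []
    for residue, row in zip(sequence, pssm_rows):
        column = AMINO_ACIDS.index(residue)  # ValueError on non-standard residue
        values.append(logistic(row[column]))
    return values


# Toy two-residue example with made-up scores (hypothetical data).
toy_pssm = [[0.0] * 20, [0.0, 5.0] + [0.0] * 18]
toy_values = self_conservation("AR", toy_pssm)
```

In this toy input, the first residue has a self-substitution score of 0 and maps to exactly 0.5, while the second has a score of 5 and maps to a value close to 1, so more strongly conserved positions receive larger feature values.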

