Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection.

Ma X, Guo J, Sun X - Biomed Res Int (2015)

Bottom Line: The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.
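For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient (MCC) are computed from binary confusion-matrix counts. The function names and the example counts are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, chosen only to exercise the functions.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.4f}, MCC={mcc:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when binding and nonbinding proteins are unevenly represented.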


Affiliation: Golden Audit College, Nanjing Audit University, Nanjing 210029, China.

ABSTRACT
The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although several studies have investigated this problem, prediction accuracy remains insufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play an important role in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). The high prediction accuracy suggests that our method can be a useful approach for identifying RNA-binding proteins from sequence information.
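The mRMR-IFS idea named in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: features are assumed to be discrete, mRMR uses the "relevance minus mean redundancy" form, and a toy lookup-table scorer (`lookup_accuracy`, a hypothetical helper) stands in for the cross-validated random forest used in the paper. All names and the toy data are invented for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_rank(columns, labels):
    """Greedily rank feature indices by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = set(range(len(columns)))
    ranked = []

    def score(j):
        if not ranked:
            return relevance[j]
        redundancy = sum(mutual_information(columns[j], columns[s])
                         for s in ranked) / len(ranked)
        return relevance[j] - redundancy

    while remaining:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked

def incremental_feature_selection(ranked, evaluate):
    """Score each prefix of the ranked list; keep the smallest best-scoring prefix."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        current = evaluate(subset)
        if current > best_score:
            best_subset, best_score = subset, current
    return best_subset, best_score

def lookup_accuracy(columns, labels, subset):
    """Toy stand-in scorer (assumption): training accuracy of a majority-vote table."""
    groups = {}
    for i, y in enumerate(labels):
        key = tuple(columns[j][i] for j in subset)
        groups.setdefault(key, Counter())[y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)

# Invented toy data: column 1 duplicates column 0; column 2 is complementary.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 0, 0],
]
ranked = mrmr_rank(columns, labels)
best_subset, best_score = incremental_feature_selection(
    ranked, lambda subset: lookup_accuracy(columns, labels, subset))
print(ranked, best_subset, best_score)
```

In this toy setting the duplicated column is ranked last because its redundancy with an already selected column cancels its relevance, and IFS stops growing the subset once an extra feature no longer improves the score. In practice the scorer would be a cross-validated classifier rather than training accuracy.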



Figure 3: (a) Distribution of the physicochemical properties used to construct the 19 EIPP features selected in the optimal feature set. (b) Distribution of the amino acid types used to construct the 19 EIPP features selected in the optimal feature set.

Mentions: We selected 19 EIPP features in the optimal feature set using the mRMR-IFS method. Because EIPP features are constructed from the evolutionary information of each type of amino acid in the sequences combined with physicochemical properties, we counted how many of the 19 EIPP features involved each type of amino acid and each type of physicochemical property. Figures 3(a) and 3(b) show these counts for the physicochemical properties and the amino acid types, respectively.
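The counting step described above can be sketched as a simple tally. This sketch assumes each selected EIPP feature can be labeled by an (amino acid, physicochemical property) pair. The pairs below are hypothetical placeholders, not the 19 features reported in the paper, and `tally_eipp` is an illustrative name.

```python
from collections import Counter

# Hypothetical placeholder labels: (amino acid one-letter code, property name).
# These are NOT the features selected in the paper.
selected_eipp = [
    ("R", "polarity"),
    ("K", "hydrophobicity"),
    ("R", "hydrophobicity"),
]

def tally_eipp(features):
    """Count how often each amino acid type and each property appears."""
    amino_acid_counts = Counter(aa for aa, _ in features)
    property_counts = Counter(prop for _, prop in features)
    return amino_acid_counts, property_counts

amino_acid_counts, property_counts = tally_eipp(selected_eipp)
print(amino_acid_counts.most_common())
print(property_counts.most_common())
```

The two counters correspond to the kind of per-category counts that panels (b) and (a) of Figure 3 summarize.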

