Recognition of 27-class protein folds by adding the interaction of segments and motif information.

Feng Z, Hu X - Biomed Res Int (2014)

Bottom Line: After the recognition of 27-class protein folds in 2001 by Ding and Dubchak, prediction algorithms, prediction parameters, and new datasets for the prediction of protein folds have been improved. The overall accuracy on the testing set and for structural classification reached 78.38% and 92.55%, respectively. For comparison with earlier results, the method was also tested on Ding and Dubchak's dataset, which has been widely used by previous researchers, and an improved overall accuracy of 70.24% was obtained.


Affiliation: Department of Sciences, Inner Mongolia University of Technology, Hohhot, China.

ABSTRACT
The recognition of protein folds is an important step in the prediction of protein structure and function. Since Ding and Dubchak's recognition of 27-class protein folds in 2001, prediction algorithms, prediction parameters, and datasets for protein fold prediction have all been improved. However, the influence of interactions between predicted secondary structure segments, and of motif information, on protein folding has not been considered. Recognizing 27-class protein folds with the interaction of segments and motif information is therefore very important. Based on the 27-class folds dataset built by Liu et al., amino acid composition, the interactions of secondary structure segments, motif frequency, and predicted secondary structure information were extracted. Using the Random Forest algorithm and an ensemble classification strategy, the 27-class protein folds and the corresponding structural classification were identified in an independent test. The overall accuracy on the testing set and for structural classification reached 78.38% and 92.55%, respectively. When the training set and testing set were combined, the overall accuracy by 5-fold cross validation was 81.16%. For comparison with earlier results, the method was also tested on Ding and Dubchak's dataset, which has been widely used by previous researchers, and an improved overall accuracy of 70.24% was obtained.
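
Of the features named in the abstract, amino acid composition is usually computed as the relative frequency of each of the 20 standard residues in a sequence. The following is a minimal sketch under that common definition; the function name and the handling of non-standard residues are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard residue.

    The result is a list of 20 floats in AMINO_ACIDS order.
    Assumption: non-standard characters (e.g. 'X') are ignored,
    so the frequencies are normalized over standard residues only.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        # Empty sequence or no standard residues: return all zeros.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector like this could be concatenated with the other feature groups before being passed to a classifier; how the paper combines its feature groups is not specified in this excerpt.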


Figure 1: The numbers of sequences containing secondary structure segments. (a) and (b) are for training set and testing set, respectively.

Mentions: (2) The Selection of the Maximum Value of lg. The statistical analysis of the number of secondary structure segments in the 27-class folds dataset is shown in Figure 1, where the abscissa is the number of secondary structure segments and the ordinate is the number of sequences. The percentage of sequences that contained fewer than five secondary structure segments was below 0.5%. The maximum value of lg was therefore selected as 4 (max(lg) = 4).
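
The statistic behind Figure 1 can be sketched as follows: count the segments in each predicted secondary structure string, then compute the share of sequences that fall below a threshold. This is only an illustration under stated assumptions: the three-state alphabet (H = helix, E = strand, C = coil), the choice to count only helix and strand runs as segments, and both function names are hypothetical and not taken from the paper.

```python
from itertools import groupby


def count_segments(ss_string, states=("H", "E")):
    """Count maximal runs of identical states in a secondary structure string.

    Assumption: only helix (H) and strand (E) runs count as segments;
    coil (C) runs are treated as linkers and skipped.
    """
    return sum(1 for state, _ in groupby(ss_string) if state in states)


def fraction_below(ss_strings, threshold=5):
    """Fraction of sequences with fewer than `threshold` segments."""
    if not ss_strings:
        return 0.0
    short = sum(1 for s in ss_strings if count_segments(s) < threshold)
    return short / len(ss_strings)


# Toy strings for illustration only, not data from the paper.
examples = ["CCHHHHCCEEEECCHHH", "HCECHCECH", "HCE"]
print([count_segments(s) for s in examples])
print(fraction_below(examples, threshold=5))
```

On a real dataset, a small value from `fraction_below(..., threshold=5)` would correspond to the observation above that very few sequences have fewer than five segments.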

