Impact of residue accessible surface area on the prediction of protein secondary structures.

Momen-Roknabadi A, Sadeghi M, Pezeshk H, Marashi SA - BMC Bioinformatics (2008)

Bottom Line: It has been previously suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. The success of applying RSA information to different secondary structure prediction methods suggests that prediction accuracy can be improved independently of the prediction approach. Thus, solvent accessibility can be considered a rich source of information for improving these methods.


Affiliation: Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran. roknabadi@khayam.ut.ac.ir

ABSTRACT

Background: Accurate prediction of protein secondary structure remains one of the challenging problems in bioinformatics. It has been previously suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buried vs. exposed) or used real-valued predicted RSAs in their prediction method.

Results: We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. Using DSSP-assigned RSA values, we found that the improvement in prediction accuracy depends heavily on the selected threshold(s). Furthermore, we showed that a single threshold for all amino acids is not the best choice. We therefore used residue-dependent thresholds, and prediction improved for most residues. Next, we considered predicted RSA values, since in real-world problems the protein sequence is the only available information. We first predicted the RSA classes with the RVP-net program and then used these data in our method. This approach also improved prediction.
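The difference between the two threshold types can be illustrated with a minimal Python sketch. The threshold values, the residue table, and the function names below are hypothetical and chosen only for illustration; they are not the values or the code used in the study.

```python
from typing import Dict, Sequence

# Hypothetical thresholds in percent RSA; illustrative only, not from the paper.
FIXED_THRESHOLD = 25.0
RESIDUE_THRESHOLDS: Dict[str, float] = {"A": 20.0, "K": 40.0, "L": 10.0}


def classify_fixed(rsa: float, threshold: float = FIXED_THRESHOLD) -> str:
    """Label a residue 'B' (buried) or 'E' (exposed) with one threshold for all residues."""
    return "B" if rsa < threshold else "E"


def classify_residue_dependent(aa: str, rsa: float) -> str:
    """Label a residue using a threshold that depends on the amino acid type.

    Amino acids missing from the hypothetical table fall back to the fixed threshold.
    """
    threshold = RESIDUE_THRESHOLDS.get(aa, FIXED_THRESHOLD)
    return "B" if rsa < threshold else "E"


def label_sequence(seq: str, rsas: Sequence[float], residue_dependent: bool) -> str:
    """Return one burial label per residue for a sequence and its RSA values."""
    if len(seq) != len(rsas):
        raise ValueError("sequence and RSA list must have the same length")
    if residue_dependent:
        return "".join(classify_residue_dependent(a, r) for a, r in zip(seq, rsas))
    return "".join(classify_fixed(r) for r in rsas)
```

With these made-up numbers, a lysine with an RSA of 30% is labeled exposed under the fixed threshold but buried under its residue-dependent threshold, which is the kind of change in input that the comparison of threshold types examines.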

Conclusion: The success of applying RSA information to different secondary structure prediction methods suggests that prediction accuracy can be improved independently of the prediction approach. Thus, solvent accessibility can be considered a rich source of information for improving these methods.


Figure 4: Correlations between observed and predicted values of RSA for different ranges of solvent exposure, scaled to the [0,1] interval. The density of vectors is normalized in each column independently. Boxes with maximum density are marked in black, while boxes with minimum density are shown in white. Other colors are selected in proportion to the densities.
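The column-wise normalization described in the caption can be sketched as follows. This is an assumption-laden illustration, not the authors' plotting code: it assumes that columns correspond to observed RSA bins and rows to predicted RSA bins, that ten equal-width bins are used, and that "normalized" means min-max scaling within each column so that the densest box maps to 1 (black) and the sparsest to 0 (white).

```python
from typing import List, Sequence, Tuple


def bin_index(value: float, bins: int) -> int:
    """Map a value in [0, 1] to a bin index in 0..bins-1 (1.0 falls in the last bin)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("RSA values must be scaled to [0, 1]")
    return min(int(value * bins), bins - 1)


def column_normalized_density(
    pairs: Sequence[Tuple[float, float]], bins: int = 10
) -> List[List[float]]:
    """Build a grid[row][col] of densities from (observed, predicted) RSA pairs.

    Assumed layout: col = observed bin, row = predicted bin.
    Each column is min-max scaled independently to the range [0, 1].
    """
    counts = [[0] * bins for _ in range(bins)]
    for observed, predicted in pairs:
        counts[bin_index(predicted, bins)][bin_index(observed, bins)] += 1

    density = [[0.0] * bins for _ in range(bins)]
    for col in range(bins):
        column = [counts[row][col] for row in range(bins)]
        lo, hi = min(column), max(column)
        if hi == lo:
            continue  # empty or flat column: leave all boxes at 0.0
        for row in range(bins):
            density[row][col] = (column[row] - lo) / (hi - lo)
    return density
```

Because each column is scaled on its own, the grid shows where predictions concentrate for a given range of observed RSA, regardless of how many residues fall in that range.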

Mentions: The reason for such a difference presumably lies in the nature of the Chou-Fasman algorithm. In this algorithm, one must first calculate helix and strand residues and then predict the coil residues. The RSA of strand residues is generally less than 50%. We used the RVP-net program to predict the required RSAs. Correlations between observed and predicted values of RSA for different ranges of solvent exposure are shown in Figure 4. This figure suggests that, for residues with RSA less than 50%, the predicted RSA is generally a significant underestimate. Thus, when we used these data for secondary structure (SS) prediction, residues in strand conformation might have been inaccurately predicted. In the Chou-Fasman algorithm, this also results in incorrect prediction of coils. Under the two-state RSA assumption, this problem is not a major one, since many residues in each class are still predicted correctly. However, when we classified the RSA data into three groups (using residue-specific thresholds, which are typically less than 50%), the problem was intensified: among residues with intermediate RSA, only a small fraction were correctly classified as intermediate, and most were wrongly categorized as buried.
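The effect described above, where underestimated RSA pushes intermediate residues into the buried class, can be illustrated with a small sketch. The two thresholds and all RSA values here are invented for illustration and are not data from the study; a single pair of thresholds is used for simplicity, whereas the study used residue-specific ones.

```python
from collections import Counter
from typing import Sequence, Tuple


def three_state(rsa: float, lower: float, upper: float) -> str:
    """Classify an RSA value (percent) as buried, intermediate, or exposed."""
    if not lower < upper:
        raise ValueError("lower threshold must be below upper threshold")
    if rsa < lower:
        return "buried"
    if rsa < upper:
        return "intermediate"
    return "exposed"


def confusion(
    observed: Sequence[float],
    predicted: Sequence[float],
    lower: float,
    upper: float,
) -> "Counter[Tuple[str, str]]":
    """Count (observed class, predicted class) pairs over all residues."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return Counter(
        (three_state(o, lower, upper), three_state(p, lower, upper))
        for o, p in zip(observed, predicted)
    )


# Hypothetical data: predictions are systematically lower than observations.
observed_rsa = [5.0, 15.0, 20.0, 30.0, 60.0]
predicted_rsa = [3.0, 8.0, 9.0, 22.0, 55.0]
table = confusion(observed_rsa, predicted_rsa, lower=10.0, upper=25.0)
```

In this toy example both residues observed as intermediate are predicted as buried, while the buried and strongly exposed residues keep their class. A two-state split of the same data would misclassify fewer residues, which is consistent with the passage's observation that the three-state scheme is more sensitive to underestimation.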

