SELECTpro: effective protein model selection using a structure-based energy function resistant to BLUNDERs.

Randall A, Baldi P - BMC Struct. Biol. (2008)

Bottom Line: The purpose of this work was to develop a structure-based model selection method based on predicted structural features that could be applied successfully to any set of models. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, beta-strand pairing, and side-chain hydrogen bonding. SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitting predictions for all 95 targets and achieving top results. SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set.


Affiliation: School of Information and Computer Sciences, University of California, Irvine, CA 92697, USA. arandall@ics.uci.edu

ABSTRACT

Background: Protein tertiary structure prediction is a fundamental problem in computational biology and identifying the most native-like model from a set of predicted models is a key sub-problem. Consensus methods work well when the redundant models in the set are the most native-like, but fail when the most native-like model is unique. In contrast, structure-based methods score models independently and can be applied to model sets of any size and redundancy level. Additionally, structure-based methods have a variety of important applications including analogous fold recognition, refinement of sequence-structure alignments, and de novo prediction. The purpose of this work was to develop a structure-based model selection method based on predicted structural features that could be applied successfully to any set of models.

Results: Here we introduce SELECTpro, a novel structure-based model selection method derived from an energy function comprising physical, statistical, and predicted structural terms. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, beta-strand pairing, and side-chain hydrogen bonding. SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitting predictions for all 95 targets and achieving top results. The average difference in GDT-TS between the model ranked first by SELECTpro and the most native-like model was 5.07. This GDT-TS difference was less than 1% of the GDT-TS of the most native-like model for 18 targets, and less than 10% for 66 targets. SELECTpro also ranked the single most native-like model first for 15 targets, in the top five for 39 targets, and in the top ten for 53 targets, more often than any other method. Because the ranking metric is skewed by model redundancy and ignores poor models that are ranked better than the most native-like model, the BLUNDER metric is introduced to overcome these limitations. SELECTpro is also evaluated on a recent benchmark set of 16 small proteins with large decoy sets of 12500 to 20000 models for each protein, where it outperforms the benchmarked method (I-TASSER).
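The two recovery measures reported above (the GDT-TS gap between the top-ranked model and the most native-like model, and the rank at which the most native-like model appears) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `selection_metrics`, the returned keys, and the convention that a higher score means a better predicted model are assumptions made for illustration; with an energy function where lower is better, the scores would be negated first. The BLUNDER metric is not defined in this abstract and is deliberately not reproduced here.

```python
from typing import Dict, Sequence


def selection_metrics(scores: Sequence[float], gdt_ts: Sequence[float]) -> Dict[str, float]:
    """Compare a ranking of models against their true GDT-TS values.

    scores : predicted quality per model (ASSUMPTION: higher is better).
    gdt_ts : GDT-TS of each model against the native structure.
    """
    if len(scores) != len(gdt_ts) or len(scores) == 0:
        raise ValueError("scores and gdt_ts must be non-empty and of equal length")

    # Index of the model the method ranks first.
    top = max(range(len(scores)), key=lambda i: scores[i])
    # Index of the most native-like model (highest GDT-TS).
    best = max(range(len(gdt_ts)), key=lambda i: gdt_ts[i])

    # Absolute GDT-TS gap, and the gap as a percentage of the best GDT-TS.
    delta = gdt_ts[best] - gdt_ts[top]
    delta_pct = 100.0 * delta / gdt_ts[best] if gdt_ts[best] > 0 else 0.0

    # Rank of the most native-like model: 1 + number of models scored strictly higher.
    rank_of_best = 1 + sum(1 for s in scores if s > scores[best])

    return {"delta_gdt": delta, "delta_gdt_pct": delta_pct, "rank_of_best": rank_of_best}


# Hypothetical three-model example (values are made up for illustration).
example = selection_metrics(scores=[0.2, 0.9, 0.5], gdt_ts=[40.0, 60.0, 80.0])
```

Averaging `delta_gdt` over targets, and counting targets where `delta_gdt_pct` is below 1% or 10% or where `rank_of_best` is at most 1, 5, or 10, would give summary figures of the same kind as those quoted in the abstract.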

Conclusion: SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set. SELECTpro is available for download as a standalone application at: http://www.igb.uci.edu/~baldig/selectpro.html. SELECTpro is also available as a public server at the same site.

Figure 1: Recovery of Mmax using SetComplete. (A) number of targets where Mmax is ranked first (top of dark blue bar), in the top five (top of gray bar), and in the top ten (top of white bar). (B) number of targets where ΔGDTBLUNDER% is less than 10% (top of light blue bar) and less than 20% (top of purple bar). Only the first ten groups are shown in both graphs.

Mentions: The assessment of the recovery of the most native-like model is performed on both SetAll and SetComplete (Table 2) because the few cases where an incomplete model is the most native-like have a significant effect on the average recovery metrics of all QA groups. Incomplete and irregular models are especially challenging for structure-based methods. A comparison of the average Pearson correlation on SetAll and SetComplete highlights these issues (Table 3). The frequency of recovering the most native-like model is calculated on SetComplete (Figure 1).
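As a rough illustration of the average Pearson correlation mentioned above, the following Python sketch computes the correlation between predicted scores and GDT-TS for each target and then averages over targets. Averaging per target is an assumption about how the figure is aggregated, and the names `pearson` and `mean_pearson` and the input layout are hypothetical. Running it once on all models and once on complete models only would mirror the SetAll versus SetComplete comparison.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_pearson(targets: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the per-target correlation of predicted score vs. GDT-TS.

    targets maps a target name to (predicted_scores, gdt_ts_values).
    ASSUMPTION: one correlation per target, then an unweighted mean.
    """
    values = [pearson(scores, gdt) for scores, gdt in targets.values()]
    return sum(values) / len(values)


# Hypothetical data: one perfectly correlated target, one perfectly anti-correlated.
demo = mean_pearson({
    "T_a": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    "T_b": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
})
```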

