Limits...
Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Li Y, Rata I, Chiu SW, Jakobsson E - BMC Struct. Biol. (2010)

Bottom Line: Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.


Affiliation: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. yaohang@cs.odu.edu

ABSTRACT

Background: Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, a consensus model ranking approach that integrates multiple knowledge- or physics-based scoring functions. Identifying the best-quality models in a model set involves two steps: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length, using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically make up approximately 20% or less of the decoys in a set, cover the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top 5 ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms other widely used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.
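The two-step procedure described in the Results can be illustrated with a minimal Python sketch. It assumes that lower scores are better in every scoring function. The Pareto front step follows the standard definition of dominance. The ranking step is only a simple stand-in: the abstract does not define fuzzy dominance, so the `soft_dominance` measure (the fraction of scoring functions in which one model beats another) and all function names here are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every score and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the models that no other model dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def soft_dominance(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMPTION: fraction of scoring functions in which a is strictly better than b."""
    return sum(x < y for x, y in zip(a, b)) / len(a)


def rank_front(scores: List[Sequence[float]]) -> List[int]:
    """Rank Pareto-front models by their summed soft dominance over all other models."""
    def strength(i: int) -> float:
        return sum(soft_dominance(scores[i], scores[j])
                   for j in range(len(scores)) if j != i)
    return sorted(pareto_front(scores), key=strength, reverse=True)


if __name__ == "__main__":
    # Made-up scores for four decoys under three scoring functions.
    decoys = [
        (0.5, 4.0, 5.0),
        (1.0, 2.0, 3.0),
        (2.0, 1.0, 3.5),
        (3.0, 3.0, 4.0),  # dominated by decoy 1
    ]
    print(pareto_front(decoys))  # [0, 1, 2]
    print(rank_front(decoys))    # [1, 2, 0]
```

In this toy set, decoy 0 stays on the front because it has the best score in one function, but it ranks last among front members because it beats the other decoys in only one of three functions.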

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.


Figure 1: RMSD-Score Plot of 1onc(70:78) Decoy Set in Various Scoring Functions

Mentions: Both physics-based and knowledge-based scoring functions for protein structure modeling have problems in their theoretical justification. Ideally, a physics-based scoring function would be evaluated with quantum mechanics, in which case the score could reflect the true energy. In computational practice, a quantum mechanical treatment is intractable because of the size of a protein molecule. As a compromise, physics-based scoring functions (force fields) are developed mainly from classical physics to approximate the true energy of a protein molecule. Knowledge-based functions, on the other hand, derive their rules from existing experimental structure data, typically by applying the inverse Boltzmann law. However, because the known structures are an extremely small fraction compared to the unknown ones, the data used to develop knowledge-based functions are potentially undersampled [36,37]. Moreover, studies have shown that inter-residue interactions may not be independent of one another [38,39], which violates the assumption of the inverse Boltzmann law. As a consequence, the existing scoring functions for protein loop modeling suffer from inaccuracy or insensitivity, as is true in overall protein structure modeling. That is, in practice, the native conformation usually does not have the lowest score when it is placed among the models generated by a computer simulation program [40]. Moreover, in the low-score regions, a conformation with a relatively higher score may in fact be a more reasonable structure than one with a lower score. The score-RMSD plots in Figure 1 show that in the decoy set of 1onc(70:78), the best model (0.17 Å RMSD from the native) never yields the lowest score in DFIRE [21], the triplet backbone dihedral potential [28], OPLS-AA/SGB [31,32], Rosetta [41], or DOPE [42], which strongly indicates insensitivity in each individual scoring function.
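The inverse Boltzmann law mentioned above is commonly written as E(s) = -kT ln(P_obs(s) / P_ref(s)), where P_obs is the frequency of a state s in known structures and P_ref is its frequency in a reference state. The Python sketch below applies this textbook form to toy counts to show why undersampling matters. The function name, the `pseudocount` parameter, and the data are hypothetical; the source does not give the exact formulation used by any of the scoring functions it cites.

```python
import math
from typing import Dict

KT = 0.5925  # kcal/mol, approximately kT at 298 K


def inverse_boltzmann(
    observed: Dict[str, int],
    reference: Dict[str, float],
    kt: float = KT,
    pseudocount: float = 0.0,
) -> Dict[str, float]:
    """Convert observed state counts into pseudo-energies.

    observed:    raw counts of each state in the known structures
    reference:   expected probability of each state in the reference state
    pseudocount: ASSUMPTION, a simple additive smoothing term for sparse data
    """
    total = sum(observed.values()) + pseudocount * len(observed)
    energies = {}
    for state, count in observed.items():
        p_obs = (count + pseudocount) / total
        if p_obs == 0.0:
            # A state never seen in the data has no finite energy estimate.
            energies[state] = math.inf
        else:
            energies[state] = -kt * math.log(p_obs / reference[state])
    return energies


if __name__ == "__main__":
    uniform = {"a": 0.5, "b": 0.5}
    # With few observations, state "b" is never seen and gets an infinite penalty.
    print(inverse_boltzmann({"a": 10, "b": 0}, uniform))
    # Additive smoothing gives a finite, but still data-limited, estimate.
    print(inverse_boltzmann({"a": 10, "b": 0}, uniform, pseudocount=1.0))
```

States observed more often than the reference predicts receive negative (favorable) energies, and states that are sparsely observed receive estimates dominated by sampling noise or by the smoothing choice.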

