Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Li Y, Rata I, Chiu SW, Jakobsson E - BMC Struct. Biol. (2010)

Bottom Line: Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.


Affiliation: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. yaohang@cs.odu.edu

ABSTRACT

Background: Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, a consensus model ranking approach that integrates multiple knowledge- or physics-based scoring functions. The procedure for identifying the best-quality models in a model set consists of 1) identifying the models at the Pareto-optimal front with respect to a set of scoring functions, and 2) ranking them based on their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length, using a function space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically make up approximately 20% or less of the decoys in a set, cover the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top 5 ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms other commonly used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.
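Step 1 of the procedure, identifying the models at the Pareto-optimal front, can be illustrated with a minimal Python sketch. It assumes that every scoring function is oriented so that lower values are better, and it uses a plain pairwise non-dominance test. The function names and the toy scores are hypothetical and are not taken from the paper. The fuzzy dominance ranking of step 2 is not reproduced here, since its exact definition is not given in this text.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a dominates b.

    Assumption: lower is better for every scoring function. a dominates b
    when a is no worse than b on every score and strictly better on at
    least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of decoys that no other decoy dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores of four decoys under three scoring functions.
toy_scores = [
    (1.0, 5.0, 3.0),
    (2.0, 4.0, 3.5),
    (2.5, 5.5, 4.0),  # worse than the first decoy on every score
    (0.5, 6.0, 2.0),
]

front = pareto_front(toy_scores)
print(front)
```

In this toy set the third decoy is dominated by the first and drops out, while the other three each win on at least one score and remain on the front. The pairwise test is quadratic in the number of decoys, which is usually acceptable for decoy sets of a few hundred to a few thousand models.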

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.


Figure 6: Effectiveness of the Pareto-optimal decoys. The best decoy with minimum RMSD, or one very close to the best decoy (< 0.1 Å), is within the Pareto-optimal decoys in 9-residue loop targets.

Mentions: Because selection and ranking in the POC method are based on Pareto optimality, the quality of the Pareto-optimal models is critical. The Pareto-optimal models include not only the optima of the individual scoring functions, but also the non-dominated models that are optimal under some (linear or non-linear) combination of the scoring functions. In our computational experiment, five scoring functions (Rosetta, DDFIRE, DOPE, triplet backbone dihedral, and OPLS-AA/SGB) are selected to form the function space. Figure 4 shows that the average number of Pareto-optimal decoys is around 20% or less of the total number of decoys in Jacobson's decoy sets for 4- to 12-residue targets. As shown in Figure 5, the Pareto-optimal decoys efficiently cover the best decoy, or one close to the best decoy, in a target's decoy set. In more than 82% of the loop targets, the Pareto-optimal decoys include the best decoy of the target, whereas in more than 97% of the loop targets, the Pareto-optimal decoys include decoys within 0.1 Å RMSD of the best one. Moreover, for 501 out of 502 targets, the Pareto-optimal decoys include decoys within a 0.4 Å RMSD cutoff of the best decoy. Figure 6 shows the RMSD distribution of the decoys in the sets corresponding to the 9-residue loop targets, as well as the coverage of the Pareto-optimal decoys. In most of the 9-residue targets, the very best decoy is in the Pareto-optimal decoy set, which typically contains only 5% to 20% of the decoys from the original decoy set. This indicates that the selected scoring functions can efficiently identify a much smaller set of decoys that contains the best decoy or one very close to the best.
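The coverage statistic described above (the fraction of targets whose Pareto-optimal set contains the best decoy, or one within a given RMSD cutoff of it) could be computed along the following lines. This is a sketch under stated assumptions: each target is represented as a list of decoy RMSDs plus the indices of its Pareto-optimal decoys, and the function names and toy numbers are hypothetical, not data from the paper.

```python
from typing import List, Sequence, Tuple


def covers_best(rmsds: Sequence[float],
                pareto_idx: Sequence[int],
                cutoff: float = 0.0) -> bool:
    """True if some Pareto-optimal decoy lies within `cutoff` (in Å) of the
    lowest-RMSD decoy in the whole set. cutoff=0.0 demands the best decoy
    itself (or one with an identical RMSD)."""
    best = min(rmsds)
    return any(rmsds[i] - best <= cutoff for i in pareto_idx)


def coverage_fraction(targets: List[Tuple[Sequence[float], Sequence[int]]],
                      cutoff: float) -> float:
    """Fraction of targets whose Pareto-optimal set covers the best decoy."""
    if not targets:
        raise ValueError("need at least one target")
    hits = sum(covers_best(rmsds, idx, cutoff) for rmsds, idx in targets)
    return hits / len(targets)


# Hypothetical targets: (RMSD of every decoy, indices of Pareto-optimal decoys).
toy_targets = [
    ([0.3, 1.2, 2.5, 0.9], [0, 3]),   # best decoy (0.3 Å) is in the Pareto set
    ([0.45, 0.4, 1.8, 3.0], [0, 2]),  # best is 0.4 Å; the set holds 0.45 Å
    ([1.0, 0.2, 2.2, 1.5], [0, 3]),   # best is 0.2 Å; closest in set is 1.0 Å
]

exact = coverage_fraction(toy_targets, cutoff=0.0)
near = coverage_fraction(toy_targets, cutoff=0.1)
print(f"exact coverage: {exact:.2f}, coverage within 0.1 A: {near:.2f}")
```

Relaxing the cutoff from 0 to 0.1 Å raises the toy coverage from one target in three to two in three, which mirrors how the reported coverage grows from the exact-best criterion to the 0.1 Å and 0.4 Å cutoffs.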

