Limits...
Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Li Y, Rata I, Chiu SW, Jakobsson E - BMC Struct. Biol. (2010)

Bottom Line: Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets.Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops.By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. yaohang@cs.odu.edu

ABSTRACT

Background: Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD < 0.5A from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

Show MeSH
The optimal decoy selected by our triplet potential for loop 1fus(28:38). The decoy makes internal hydrogen bonds (black dashed lines) but few contacts with the protein frame.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2914074&req=5

Figure 15: The optimal decoy selected by our triplet potential for loop 1fus(28:38). The decoy makes internal hydrogen bonds (black dashed lines) but few contacts with the protein frame.

Mentions: For the test case of 1fus(28:38) loop target, Rosetta and the triplet scoring functions select, as best scored, decoys with RMSD > 3A from the native structure, albeit the other scoring functions as well as POC can correctly identify the decoy with best resolution. The decoy selected by triplet has a better scoring torsion angle combination, allowing for some favorable near residue neighbor interactions depicted as black dashed lines in Figure 15. These are either local backbone to backbone or side-chain to backbone hydrogen bonds. Our triplet scoring function evaluates a loop only by its internal local interactions. The cause this decoy is highly deviated from the native loop structure is that it makes few tertiary contacts with the rest of the protein, being rather detached from it (as shown Figure 15), it is solvated and unfolded, and therefore, cannot be stable. This is the reason for which the triplet scoring function should be used in conjunction with other distance-based potentials, a task that is accomplished here by our consensus POC method, which performs well for this target despite the triplet and Rosetta failures.


Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Li Y, Rata I, Chiu SW, Jakobsson E - BMC Struct. Biol. (2010)

The optimal decoy selected by our triplet potential for loop 1fus(28:38). The decoy makes internal hydrogen bonds (black dashed lines) but few contacts with the protein frame.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2914074&req=5

Figure 15: The optimal decoy selected by our triplet potential for loop 1fus(28:38). The decoy makes internal hydrogen bonds (black dashed lines) but few contacts with the protein frame.
Mentions: For the test case of 1fus(28:38) loop target, Rosetta and the triplet scoring functions select, as best scored, decoys with RMSD > 3A from the native structure, albeit the other scoring functions as well as POC can correctly identify the decoy with best resolution. The decoy selected by triplet has a better scoring torsion angle combination, allowing for some favorable near residue neighbor interactions depicted as black dashed lines in Figure 15. These are either local backbone to backbone or side-chain to backbone hydrogen bonds. Our triplet scoring function evaluates a loop only by its internal local interactions. The cause this decoy is highly deviated from the native loop structure is that it makes few tertiary contacts with the rest of the protein, being rather detached from it (as shown Figure 15), it is solvated and unfolded, and therefore, cannot be stable. This is the reason for which the triplet scoring function should be used in conjunction with other distance-based potentials, a task that is accomplished here by our consensus POC method, which performs well for this target despite the triplet and Rosetta failures.

Bottom Line: Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets.Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops.By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. yaohang@cs.odu.edu

ABSTRACT

Background: Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD < 0.5A from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

Show MeSH