Limits...
Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Li Y, Rata I, Chiu SW, Jakobsson E - BMC Struct. Biol. (2010)

Bottom Line: Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets.Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops.By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. yaohang@cs.odu.edu

ABSTRACT

Background: Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD < 0.5A from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

Show MeSH
Analysis of the native loop 1hbq(31:38). The hydrophobic residues Phe36-Leu37 (enclosed by the surface) are buried in a stable protein hydrophobic core, being surrounded by many carbon atoms.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2914074&req=5

Figure 16: Analysis of the native loop 1hbq(31:38). The hydrophobic residues Phe36-Leu37 (enclosed by the surface) are buried in a stable protein hydrophobic core, being surrounded by many carbon atoms.

Mentions: The only case, from all the 502 loop targets studied here, where POC fails to capture a native-like structure (within 0.5A cutoff) on the Parento Optimal Front, is the 1hbq(31:38) loop. This also means that none of the individual scoring functions can correctly identify a decoy with native-like conformation as the top-ranked one. 1hbq(31:38) is a special case where the native loop energy is a compromise between a poor loop internal energy and a very favorable energy of interaction with the rest of the protein. The loop internal energy is compromised by the Leu37 residue with an unfavorable backbone conformation (phi = 54°, psi = 93°) that scores poorly in every potential function. On the other hand, this residue participates, together with its loop neighbor Phe36, in a network of favorable hydrophobic interactions involving many atoms from the protein frame. In Figure 16, the two loop residues are shown enclosed by their external surface, together with the protein frame atoms that are closest to their side-chains. With the exception of a sulfur atom, all the other surrounding atoms are hydrophobic carbons. The two loop side-chains are hydrophobic themselves and form a hydrophobic core that is very favorable for the protein stability.


Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Li Y, Rata I, Chiu SW, Jakobsson E - BMC Struct. Biol. (2010)

Analysis of the native loop 1hbq(31:38). The hydrophobic residues Phe36-Leu37 (enclosed by the surface) are buried in a stable protein hydrophobic core, being surrounded by many carbon atoms.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2914074&req=5

Figure 16: Analysis of the native loop 1hbq(31:38). The hydrophobic residues Phe36-Leu37 (enclosed by the surface) are buried in a stable protein hydrophobic core, being surrounded by many carbon atoms.
Mentions: The only case, from all the 502 loop targets studied here, where POC fails to capture a native-like structure (within 0.5A cutoff) on the Parento Optimal Front, is the 1hbq(31:38) loop. This also means that none of the individual scoring functions can correctly identify a decoy with native-like conformation as the top-ranked one. 1hbq(31:38) is a special case where the native loop energy is a compromise between a poor loop internal energy and a very favorable energy of interaction with the rest of the protein. The loop internal energy is compromised by the Leu37 residue with an unfavorable backbone conformation (phi = 54°, psi = 93°) that scores poorly in every potential function. On the other hand, this residue participates, together with its loop neighbor Phe36, in a network of favorable hydrophobic interactions involving many atoms from the protein frame. In Figure 16, the two loop residues are shown enclosed by their external surface, together with the protein frame atoms that are closest to their side-chains. With the exception of a sulfur atom, all the other surrounding atoms are hydrophobic carbons. The two loop side-chains are hydrophobic themselves and form a hydrophobic core that is very favorable for the protein stability.

Bottom Line: Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets.Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops.By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. yaohang@cs.odu.edu

ABSTRACT

Background: Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD < 0.5A from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

Show MeSH