Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

Li Y, Rata I, Chiu SW, Jakobsson E - BMC Struct. Biol. (2010)

Bottom Line: Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

Affiliation: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. yaohang@cs.odu.edu

ABSTRACT

Background: Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, a consensus model-ranking approach that integrates multiple knowledge- or physics-based scoring functions. The procedure for identifying the best-quality models in a model set includes: 1) identifying the models on the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length, using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically comprise approximately 20% or less of the decoys in a set, cover the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function with the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms other commonly used consensus strategies for model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
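
The two-step procedure described in the Results above can be illustrated with a short sketch: first find the decoys on the Pareto-optimal front of a score matrix, then order them by how strongly they dominate the remaining decoys. This is a minimal sketch only, assuming that every scoring function is oriented so that lower values are better; the logistic dominance degree and the helper names pareto_front and fuzzy_dominance_rank are choices of this illustration, not the exact fuzzy-dominance formulation used by the POC method.

import numpy as np

def pareto_front(scores):
    # scores: (n_decoys, n_functions) array; this sketch assumes lower
    # scores are better for every scoring function.
    n = scores.shape[0]
    on_front = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # Decoy j dominates decoy i if j is no worse in every
            # scoring function and strictly better in at least one.
            if i != j and np.all(scores[j] <= scores[i]) and np.any(scores[j] < scores[i]):
                on_front[i] = False
                break
    return np.where(on_front)[0]

def fuzzy_dominance_rank(scores, front, k=1.0):
    # Rank the Pareto-optimal decoys by an illustrative dominance strength:
    # a logistic function of the score differences, averaged over scoring
    # functions and over all other decoys (an assumption of this sketch).
    strength = []
    for i in front:
        diff = scores - scores[i]                 # positive where decoy i is better
        degree = 1.0 / (1.0 + np.exp(-k * diff))  # soft "i dominates" degree in (0, 1)
        strength.append(degree.mean())
    order = np.argsort(strength)[::-1]            # strongest dominators first
    return front[order]

# Toy usage: 200 decoys scored by 5 functions (stand-ins for Rosetta, DOPE, etc.).
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 5))
front = pareto_front(scores)
ranking = fuzzy_dominance_rank(scores, front)
print(f"{front.size} Pareto-optimal decoys; top-ranked decoy index: {ranking[0]}")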

Figure 8: ROC Curves for Decoys in 1ivd(244:252) and 153l(98:109). In these ROC curves, the true positives are the number of top-N ranked decoys with RMSD less than or equal to r, the false positives are the number of top-N ranked decoys with RMSD greater than r, the false negatives are the number of decoys with RMSD less than or equal to r but having rank greater than N, and the true negatives are the number of decoys with rank greater than N and RMSD greater than r. In our ROC plots, r is the 10th best RMSD in a decoy set and N is the cutoff variable. The ROC curves generated by the POC method yield higher AUC values than those of the individual scoring functions.

Mentions: We use receiver operating characteristic (ROC) curves to evaluate the ranking performance of each individual scoring function as well as the POC method for each loop target, following the method described in [46] for ranked data. ROC curves plot the true positive rate against the false positive rate, and the area under the ROC curve (AUC) is computed from them. An AUC of 1.0 indicates perfect ranking of the top N decoys, whereas an AUC of 0.5 corresponds to random ranking; the higher the AUC value, the better the ranking performance. Figure 8 shows the ROC curves for evaluating the top-10 ranking of decoys in 1ivd(244:252) and 153l(98:109). The POC method yields a larger ROC AUC than the individual scoring functions. Moreover, Table 1 shows the average ROC AUC values of the individual scoring functions and POC in Jacobson's decoy sets and the membrane protein loop decoy sets, where Rosetta and DFIRE are the most effective individual scoring functions, respectively. POC yields an even higher AUC value than Rosetta and DFIRE, as well as the other scoring functions, in both cases. The OPLS-AA score is not evaluated in the membrane protein loop decoy sets because hydrogen atoms are not available in those decoys.
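
As a concrete illustration of this evaluation, the following sketch computes an ROC curve and its AUC for one ranked decoy set using the definitions in the Figure 8 caption, with r taken as the 10th best RMSD and N swept over the rank cutoff. It is a minimal sketch under those stated definitions; the function roc_auc_for_ranking and the synthetic decoy data are assumptions of this illustration and do not reproduce the authors' implementation or the exact procedure of reference [46].

import numpy as np

def roc_auc_for_ranking(rmsd_in_rank_order, n_good=10):
    # rmsd_in_rank_order: RMSDs of the decoys listed from best-ranked to
    # worst-ranked by the scoring function under evaluation.
    rmsd = np.asarray(rmsd_in_rank_order, dtype=float)
    r = np.sort(rmsd)[n_good - 1]        # r = 10th best RMSD in the decoy set
    positive = rmsd <= r                 # positives: decoys with RMSD <= r
    n_pos = positive.sum()
    n_neg = (~positive).sum()

    tpr, fpr = [0.0], [0.0]
    for cutoff in range(1, len(rmsd) + 1):   # N is the rank cutoff
        tp = positive[:cutoff].sum()         # top-N decoys with RMSD <= r
        tpr.append(tp / n_pos)               # TP / (TP + FN)
        fpr.append((cutoff - tp) / n_neg)    # FP / (FP + TN)

    # Trapezoidal area under the (fpr, tpr) curve.
    auc = 0.0
    for k in range(1, len(fpr)):
        auc += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return auc

# Toy usage: a ranking that roughly follows RMSD but with noise added.
rng = np.random.default_rng(1)
true_rmsd = np.sort(rng.uniform(0.2, 5.0, size=500))            # 500 synthetic decoys
noisy_order = np.argsort(true_rmsd + rng.normal(0.0, 1.0, 500)) # imperfect ranking
print("AUC of a noisy ranking:", round(roc_auc_for_ranking(true_rmsd[noisy_order]), 3))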

