Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D Structure and sequence properties.

Tong W, Wei Y, Murga LF, Ondrechen MJ, Williams RJ - PLoS Comput. Biol. (2009)

Bottom Line: This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.

Affiliation: College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America.

ABSTRACT
A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatic features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of the probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have previously been shown to outperform other 3D-structure-based methods for identifying active site residues. It is then shown that adding one simple geometric property, the size rank of the cleft in which a given residue is contained, improves performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures.
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
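To illustrate the core idea of a monotonicity-constrained likelihood estimate, the sketch below fits a non-decreasing probability curve to binary active-site labels with the pool-adjacent-violators algorithm. This is a deliberate simplification, not the authors' implementation: POOL works with multidimensional isotonic regression over a partial order of feature vectors, whereas this hypothetical `isotonic_fit` helper assumes a single feature, so the residues form a total order. For 0/1 labels, the least-squares isotonic fit coincides with the monotone maximum-likelihood estimate of the positive-class probability.

```python
from typing import List, Optional, Sequence


def isotonic_fit(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Non-decreasing weighted least-squares fit (pool-adjacent-violators).

    Hypothetical 1D simplification: `y` must already be sorted by one
    feature value. For 0/1 labels the result is the monotone maximum-
    likelihood estimate of P(active site) at each position.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks: List[list] = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per input point.
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Labels (1 = active-site residue) ordered by an assumed single feature.
print(isotonic_fit([0, 1, 0, 0, 1, 1]))
```

Here the violating run `1, 0, 0` is pooled into a single block with estimated probability 1/3, giving a fit that never decreases as the feature increases.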

Figure 5. Averaged recall-filtration ratio (RFR) curve for POOL(T)xPOOL(G)xPOOL(C) for all residues in the 160-protein test set.

Note that neither axis of a ROC curve involves a directly user-controllable parameter: neither recall nor false positive rate is under the direct control of a user who does not already know the correct classifications. Assuming the user wishes to select the highest-ranking residues in the list, down to a certain fixed proportion, a more useful curve is a recall-filtration ratio (RFR) curve, where the filtration ratio is defined as the fraction of all residues predicted as positive. Figure 5 shows an averaged RFR curve for the best-performing POOL(T)xPOOL(G)xPOOL(C) method on the 160-protein test set. Here, the vertical axis is the average recall (across proteins) obtained when the proportion of predicted positives is set to the value on the horizontal axis. For the curve shown in Figure 5, for example, choosing the top 10% of the residues from the ranked list gives an average recall of 90%, while choosing the top 5% gives an average recall of 79%.
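The averaging described above can be sketched as follows: for each protein, rank residues by predicted score, label the top fraction as positive, compute recall, then average recall across proteins at each filtration ratio. This is a hypothetical sketch, not the authors' code. The function names, the rounding rule `round(ratio * n)` for the number of predicted positives, and the assumption that every protein has at least one known active-site residue are all illustrative choices; ties in scores are broken by residue order because Python's sort is stable.

```python
from typing import List, Sequence, Tuple

# One protein: (predicted scores per residue, 0/1 active-site labels per residue).
Protein = Tuple[Sequence[float], Sequence[int]]


def recall_at_filtration(scores: Sequence[float], labels: Sequence[int], ratio: float) -> float:
    """Recall when the top `ratio` fraction of residues is predicted positive."""
    n = len(scores)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("assumed: each protein has at least one active-site residue")
    k = int(round(ratio * n))  # assumed rounding rule for the number of predictions
    # Highest scores first; stable sort breaks ties by residue index.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    true_pos = sum(labels[i] for i in order[:k])
    return true_pos / total_pos


def averaged_rfr_curve(proteins: List[Protein], ratios: Sequence[float]) -> List[Tuple[float, float]]:
    """(filtration ratio, recall averaged across proteins) for each requested ratio."""
    curve = []
    for r in ratios:
        recalls = [recall_at_filtration(s, y, r) for s, y in proteins]
        curve.append((r, sum(recalls) / len(recalls)))
    return curve


# Two toy proteins with made-up scores and labels (illustrative only).
toy = [
    ([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]),
    ([0.3, 0.7, 0.6, 0.1], [1, 1, 0, 0]),
]
print(averaged_rfr_curve(toy, [0.25, 0.5, 1.0]))
```

Averaging per-protein recall, rather than pooling all residues, gives each protein equal weight regardless of its size, which matches the "average recall (across proteins)" reading of the vertical axis.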