Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D Structure and sequence properties.

Tong W, Wei Y, Murga LF, Ondrechen MJ, Williams RJ - PLoS Comput. Biol. (2009)

Bottom Line: This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.


Affiliation: College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America.

ABSTRACT
A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
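
The abstract describes forming maximum likelihood probability estimates under a monotonicity constraint via isotonic regression. As a rough illustration only, the sketch below handles the one-dimensional case with the pool-adjacent-violators algorithm: residues are assumed to be already sorted by a single feature, and labels are 1 for annotated active-site residues and 0 otherwise. The paper's method works over a multidimensional partial order, which this sketch does not implement; the function name and the toy labels are hypothetical.

```python
from typing import List, Sequence


def pava_monotone_probabilities(labels_sorted: Sequence[int]) -> List[float]:
    """Estimate non-decreasing probabilities from 0/1 labels.

    Assumption: labels_sorted is ordered by increasing value of one
    feature. This is the 1D pool-adjacent-violators algorithm, not the
    multidimensional partial-order procedure used by POOL.
    """
    # Each block stores [sum of labels, number of residues in block].
    blocks: List[List[float]] = []
    for y in labels_sorted:
        blocks.append([float(y), 1.0])
        # Merge backwards while an earlier block has a higher mean than
        # the block after it, since that violates monotonicity.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            label_sum, count = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count

    # Every residue in a block receives the block's mean label.
    probabilities: List[float] = []
    for label_sum, count in blocks:
        probabilities.extend([label_sum / count] * int(count))
    return probabilities


if __name__ == "__main__":
    # Toy labels, sorted by a hypothetical feature value.
    print(pava_monotone_probabilities([0, 1, 0, 1, 1]))
```

Sorting residues by the resulting estimates gives a likelihood ranking of the kind the abstract mentions; residues pooled into the same block share one estimate and are therefore tied in that ranking.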


Prediction of annotated ionizable active site residues in a test set of 64 proteins using only THEMATICS features. Shown in the plot are the averaged ROC curves, recall as a function of false positive rate, for POOL(T4) (solid curve) and Wei's statistical analysis (dashed curve) along with Tong's SVM (point X). Predictions all use THEMATICS features on ionizable residues only; performance is measured using annotated active site ionizable residues. POOL(T4) outperforms both the SVM and Wei's method.

Mentions: Here we evaluate the ability of POOL with the four THEMATICS features, denoted POOL(T4), to predict ionizable residues in the active site. For the purposes of Figures 1 and 2, only the ionizable CSA-annotated active site residues are taken as the labeled positives. Thus if a method successfully predicts all of the labeled ionizable active residues, its true positive rate is 100%. The prediction of all active residues, including the non-ionizable ones, is addressed below.
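
To make the evaluation concrete, here is a minimal sketch of how recall (true positive rate) and false positive rate could be traced out for one protein from a ranked list of residues. The scores and labels are made-up toy values and the function name is hypothetical; tied scores are not given special treatment, and the averaging of curves across the 64 test proteins is not shown.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (false positive rate, recall) pairs for one protein.

    labels are 1 for annotated active-site residues, 0 otherwise.
    Residues are visited from highest to lowest score, and one point is
    emitted after each residue is added to the predicted set.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = 0
    false_pos = 0
    for _, label in ranked:
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


if __name__ == "__main__":
    # Toy example: four residues, two of them labeled positive.
    for fpr, recall in roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]):
        print(f"FPR={fpr:.2f}  recall={recall:.2f}")
```

Recall reaches 1.0 only once every labeled positive has been included in the predicted set, which matches the statement that predicting all labeled ionizable active residues corresponds to a true positive rate of 100%.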

