Limits...
Quality assessment of protein model-structures based on structural and functional similarities.

Konopka BM, Nebel JC, Kotulska M - BMC Bioinformatics (2012)

Bottom Line: The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets.GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants.In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.

View Article: PubMed Central - HTML - PubMed

Affiliation: Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland.

ABSTRACT

Background: Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology.

Results: GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests.

Conclusions: The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.

Show MeSH

Related in: MedlinePlus

Performance comparison between GOBA and model quality assessment programs from (A) CASP8 and (B) CASP9 based on per target correlations. The boxplots show the percentiles of the correlation of particular scores with the GDT-TS per- target; whiskers - 0.05 and 0.95; box edges 0.25 and 0.75; the intersecting line - 0.5; the average values of correlation are shown by squares; minimal and maximal values are marked with crosses. Boxes with pink and red borders show the “relative” and “single model” GOBA scores respectively. The basic version of GOBA scores are denoted by empty boxes, while the striped ones show the performance of GOBA after applying the outlier exclusion procedure. Blue solid boxes are consensus methods, while red ones are single and quasi-single model methods. Gray boxes are methods which could not be classified in either of those two categories.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3526563&req=5

Figure 11: Performance comparison between GOBA and model quality assessment programs from (A) CASP8 and (B) CASP9 based on per target correlations. The boxplots show the percentiles of the correlation of particular scores with the GDT-TS per- target; whiskers - 0.05 and 0.95; box edges 0.25 and 0.75; the intersecting line - 0.5; the average values of correlation are shown by squares; minimal and maximal values are marked with crosses. Boxes with pink and red borders show the “relative” and “single model” GOBA scores respectively. The basic version of GOBA scores are denoted by empty boxes, while the striped ones show the performance of GOBA after applying the outlier exclusion procedure. Blue solid boxes are consensus methods, while red ones are single and quasi-single model methods. Gray boxes are methods which could not be classified in either of those two categories.

Mentions: Based on average “per target” correlations and ΔGDT values we constructed rankings of CASP8 and CASP9 participants extended with the best performing GOBA scores (one from yGA-family and one from GA-family), i.e. yGA579 and GA579 (Figures 11 and 12). Our best scoring scheme - yGA579 with the exclusion procedure modification (see next paragraph) was ranked 31-st in CASP8 (Figure 11a) and 27-th in CASP9 (Figure 11b) (based on average per target correlation, pink striped boxes), which fits around the middle of both classifications. It was ranked 10-th and 5-th, if only “single model” MQAPs were taken into consideration. Many MQAPs showed very high correlations with the GDT-TS and differences between the scores of the top-performing methods were not statistically significant, as shown by CASP assessors [21]. The GA-family metrics performed worse than proposed relative yGA-family scores, especially in terms of ΔGDT. In the ΔGDT rankings, our best scoring scheme was ranked 34-th in CASP8 (Figure 12a) and 26-th in CASP9 (Figure 12b). It has to be stressed that all GOBA methods in the CASP8 and CASP9 validation sets produced Pearson correlations higher than 0.5 in both “over-all” and “per target” evaluations. This shows that this is a meaningful and valuable approach for the assessment of protein model structures. In addition, GOBA captures information which is not retained by other methods. Since one of the ways to enhance the performance of MQAPs is to combine approaches [7], GOBA, which introduces additional source of information, and selected GOBA metrics GA468 , GA579 , yGA468 and yGA579 in particular, are suited to complement more traditional approaches.


Quality assessment of protein model-structures based on structural and functional similarities.

Konopka BM, Nebel JC, Kotulska M - BMC Bioinformatics (2012)

Performance comparison between GOBA and model quality assessment programs from (A) CASP8 and (B) CASP9 based on per target correlations. The boxplots show the percentiles of the correlation of particular scores with the GDT-TS per- target; whiskers - 0.05 and 0.95; box edges 0.25 and 0.75; the intersecting line - 0.5; the average values of correlation are shown by squares; minimal and maximal values are marked with crosses. Boxes with pink and red borders show the “relative” and “single model” GOBA scores respectively. The basic version of GOBA scores are denoted by empty boxes, while the striped ones show the performance of GOBA after applying the outlier exclusion procedure. Blue solid boxes are consensus methods, while red ones are single and quasi-single model methods. Gray boxes are methods which could not be classified in either of those two categories.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3526563&req=5

Figure 11: Performance comparison between GOBA and model quality assessment programs from (A) CASP8 and (B) CASP9 based on per target correlations. The boxplots show the percentiles of the correlation of particular scores with the GDT-TS per- target; whiskers - 0.05 and 0.95; box edges 0.25 and 0.75; the intersecting line - 0.5; the average values of correlation are shown by squares; minimal and maximal values are marked with crosses. Boxes with pink and red borders show the “relative” and “single model” GOBA scores respectively. The basic version of GOBA scores are denoted by empty boxes, while the striped ones show the performance of GOBA after applying the outlier exclusion procedure. Blue solid boxes are consensus methods, while red ones are single and quasi-single model methods. Gray boxes are methods which could not be classified in either of those two categories.
Mentions: Based on average “per target” correlations and ΔGDT values we constructed rankings of CASP8 and CASP9 participants extended with the best performing GOBA scores (one from yGA-family and one from GA-family), i.e. yGA579 and GA579 (Figures 11 and 12). Our best scoring scheme - yGA579 with the exclusion procedure modification (see next paragraph) was ranked 31-st in CASP8 (Figure 11a) and 27-th in CASP9 (Figure 11b) (based on average per target correlation, pink striped boxes), which fits around the middle of both classifications. It was ranked 10-th and 5-th, if only “single model” MQAPs were taken into consideration. Many MQAPs showed very high correlations with the GDT-TS and differences between the scores of the top-performing methods were not statistically significant, as shown by CASP assessors [21]. The GA-family metrics performed worse than proposed relative yGA-family scores, especially in terms of ΔGDT. In the ΔGDT rankings, our best scoring scheme was ranked 34-th in CASP8 (Figure 12a) and 26-th in CASP9 (Figure 12b). It has to be stressed that all GOBA methods in the CASP8 and CASP9 validation sets produced Pearson correlations higher than 0.5 in both “over-all” and “per target” evaluations. This shows that this is a meaningful and valuable approach for the assessment of protein model structures. In addition, GOBA captures information which is not retained by other methods. Since one of the ways to enhance the performance of MQAPs is to combine approaches [7], GOBA, which introduces additional source of information, and selected GOBA metrics GA468 , GA579 , yGA468 and yGA579 in particular, are suited to complement more traditional approaches.

Bottom Line: The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets.GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants.In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.

View Article: PubMed Central - HTML - PubMed

Affiliation: Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland.

ABSTRACT

Background: Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology.

Results: GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests.

Conclusions: The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.

Show MeSH
Related in: MedlinePlus