A resource for benchmarking the usefulness of protein structure models.

Carbajo D, Tramontano A - BMC Bioinformatics (2012)

Bottom Line: The most effective strategies rely on knowledge of the three-dimensional structure of the protein of interest. Comparing the results of a computational experiment on the experimental structure with those on a set of its decoy models allows developers and users to assess what threshold of accuracy is required to perform the task effectively.

Affiliation: Department of Physics, Sapienza University of Rome, P.le A. Moro, 5, 00185 Rome, Italy.

ABSTRACT

Background: Increasingly, biologists and biochemists use computational tools to design experiments that probe the function of proteins and/or to engineer them for a variety of purposes. The most effective strategies rely on knowledge of the three-dimensional structure of the protein of interest. However, an experimental structure is often not available, and models of varying quality are used instead. The relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications.

Results: This paper describes a database and related software tools that allow a given structure-based method to be tested on models of a protein representing different levels of accuracy. Comparing the results of a computational experiment on the experimental structure with those on a set of its decoy models allows developers and users to assess what threshold of accuracy is required to perform the task effectively.
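The comparison described above can be sketched in Python. This is a minimal illustration rather than the ModelDB implementation: the function names, the 10-point GDT-TS bins, the 0.9 agreement rate, and the user-supplied `agrees` predicate are all assumptions introduced here. It bins decoy models by GDT-TS, measures how often the method's output on a decoy agrees with its output on the experimental structure, and reports the lowest bin from which agreement stays at or above the chosen rate.

```python
from collections import defaultdict


def agreement_by_bin(results, reference, agrees, bin_width=10):
    """Fraction of decoys per GDT-TS bin whose output agrees with the reference.

    results:   iterable of (gdt_ts, output) pairs, one per decoy model
    reference: output of the same method on the experimental structure
    agrees:    hypothetical user-supplied predicate agrees(output, reference)
    """
    tally = defaultdict(lambda: [0, 0])  # bin lower edge -> [hits, total]
    for gdt_ts, output in results:
        # A GDT-TS of exactly 100 is placed in the top bin.
        lower = min(int(gdt_ts // bin_width) * bin_width, 100 - bin_width)
        tally[lower][0] += bool(agrees(output, reference))
        tally[lower][1] += 1
    return {lower: hits / total for lower, (hits, total) in tally.items()}


def accuracy_threshold(rates, min_rate=0.9):
    """Lowest bin edge such that it and every populated bin above it
    reach min_rate; None if even the best bin falls short."""
    threshold = None
    for lower in sorted(rates, reverse=True):
        if rates[lower] < min_rate:
            break
        threshold = lower
    return threshold
```

With real data, `results` would hold one entry per decoy model and `agrees` would encode whatever counts as success for the task at hand, for example recovering the same residues as on the experimental structure.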

Conclusions: The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database.

IMPLEMENTATION, AVAILABILITY AND REQUIREMENTS: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); MySQL, Perl DBI and DBD modules (database); PHP, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any restrictions to use by non-academics: No.

Figure 5: Percentage of models with different GDT-TS values in which it is possible to correctly identify the largest surface cavity (at least 75% of its residues). On the left results are shown for all entries in the database, while on the right only enzymes annotated in CSA [21,22] are considered.

Mentions: Another common use of models concerns the identification of enzyme active sites. It is known that binding sites tend to occur in the largest cavity on the surface of proteins [29], so the obvious question is how well this property is conserved in models of different quality. Our data (Figure 5) show that in only 20% of the models with a GDT-TS above 90 can at least 75% of the residues constituting the largest cavity be detected (a residue is considered to belong to a cavity if any of its atoms belongs to it). It follows that this approach is not well suited to medium- to low-quality models and should perhaps be parameterized differently for these cases. The situation improves moderately when the subset of enzymes stored in CSA is considered (Figure 5).
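The success criterion in this paragraph can be written out as a short Python sketch. It is an illustration under stated assumptions, not the authors' code: the function names and the `(chain, residue number, atom name)` input format are hypothetical, and the model is assumed to share the residue numbering of the experimental structure. A residue counts as part of a cavity if any of its atoms is in it, and a model succeeds if it recovers at least 75% of the residues of the largest cavity found in the experimental structure.

```python
def cavity_residues(cavity_atoms):
    """Residues with at least one atom in the cavity.

    cavity_atoms: iterable of (chain_id, residue_number, atom_name) tuples
    (an assumed input format, e.g. parsed from a cavity detection tool).
    """
    return {(chain, resnum) for chain, resnum, _atom in cavity_atoms}


def recovered_fraction(native_atoms, model_atoms):
    """Fraction of the native largest-cavity residues also found in the
    model's largest cavity."""
    native = cavity_residues(native_atoms)
    if not native:
        raise ValueError("native cavity contains no residues")
    return len(native & cavity_residues(model_atoms)) / len(native)


def percent_recovered(models, native_atoms, gdt_above=90.0, min_fraction=0.75):
    """Percentage of models with GDT-TS above the cutoff that recover at
    least min_fraction of the native cavity residues.

    models: iterable of (gdt_ts, model_cavity_atoms) pairs.
    Returns None when no model exceeds the cutoff.
    """
    selected = [atoms for gdt_ts, atoms in models if gdt_ts > gdt_above]
    if not selected:
        return None
    hits = sum(
        recovered_fraction(native_atoms, atoms) >= min_fraction
        for atoms in selected
    )
    return 100.0 * hits / len(selected)
```

Calling `percent_recovered` with different `gdt_above` values, or over GDT-TS ranges instead of a single cutoff, would give the kind of per-accuracy percentages summarized in Figure 5.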

