Automatic assignment of EC numbers.

Egelhofer V, Schomburg I, Schomburg D - PLoS Comput. Biol. (2010)

Bottom Line: Over 80% agreement was found between our assignment and the EC classification. For 61 (i.e., only 2.5%) reactions we found that their assignment was inconsistent with the rules of the nomenclature committee; they have to be transferred to other sub-subclasses. We demonstrate that our validation results can be used to initiate corrections and improvements to the EC number classification scheme.

Affiliation: Department of Bioinformatics and Biochemistry, Technical University Braunschweig, Braunschweig, Germany. volker.egelhofer@univie.ac.at

ABSTRACT
A wide range of research areas in molecular biology and medical biochemistry, e.g., drug design, metabolic network reconstruction and systems biology, require a reliable enzyme classification system. When researchers in these areas wish to refer unambiguously to an enzyme and its function, they use the EC number introduced by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). However, the success of every one of these applications depends critically on the consistency and reliability of the underlying data. We have developed tools for the validation of the EC number classification scheme. In this paper, we present validated data for 3788 enzymatic reactions covering 229 sub-subclasses of the EC classification system. Over 80% agreement was found between our assignment and the EC classification. For 61 reactions (i.e., only 2.5%), we found that the assignment was inconsistent with the rules of the nomenclature committee; these reactions have to be transferred to other sub-subclasses. We demonstrate that our validation results can be used to initiate corrections and improvements to the EC number classification scheme.
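The agreement figure above is measured at the level of EC sub-subclasses, i.e. the first three fields of an EC number. As a minimal sketch, assuming that both an automatic assignment and the official classification are available as mappings from a reaction identifier to an EC number, an agreement rate and the list of disagreeing reactions could be computed as follows. The function names, the reaction IDs R1 to R5 and their EC values are hypothetical illustrations, not the authors' tool or data.

```python
def sub_subclass(ec: str) -> str:
    """Return the first three fields of an EC number, e.g. '1.1.1.27' -> '1.1.1'."""
    fields = ec.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"not a four-field EC number: {ec!r}")
    return ".".join(fields[:3])


def agreement(predicted: dict[str, str], official: dict[str, str]) -> float:
    """Fraction of shared reactions whose predicted sub-subclass matches the official one."""
    shared = predicted.keys() & official.keys()
    if not shared:
        return 0.0
    hits = sum(sub_subclass(predicted[r]) == sub_subclass(official[r]) for r in shared)
    return hits / len(shared)


def mismatches(predicted: dict[str, str], official: dict[str, str]) -> list[str]:
    """Reactions whose predicted sub-subclass differs from the official one (candidates for review)."""
    shared = predicted.keys() & official.keys()
    return sorted(r for r in shared if sub_subclass(predicted[r]) != sub_subclass(official[r]))


# Hypothetical example data: reaction IDs and EC numbers are illustrative only.
official = {"R1": "1.1.1.1", "R2": "2.7.1.1", "R3": "3.1.1.3", "R4": "4.2.1.2", "R5": "1.1.1.27"}
predicted = {"R1": "1.1.1.-", "R2": "2.7.1.-", "R3": "3.1.1.-", "R4": "4.2.1.-", "R5": "1.2.1.-"}

print(agreement(predicted, official))   # 0.8
print(mismatches(predicted, official))  # ['R5']
```

Here the prediction is expressed only down to the sub-subclass (serial number left as `-`), which matches the level at which the comparison is made.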

Figure 1: Processes for reconstruction of a metabolic network.

With several thousand proteins in each organism, a highly developed, hierarchical and consistent classification scheme is essential for comparing the metabolic capacities of organisms. Unfortunately, such a system exists only for enzymes and not for other protein classes; for enzymes, however, the classification scheme gives immediate access to their functional properties, including the catalysed reaction, substrate specificity, etc. A quick comparative assessment of enzymatic pathways between organisms is therefore possible even when the enzymes in the different organisms have totally different sequences, as long as they belong to the same EC class. A well-reconstructed metabolic network provides a unified platform for integrating all biological and medical information on genes, enzymes, metabolites, drugs and drug targets, enabling a systems-level study of the relationship between metabolism and disease. An accurate representation of biochemical and metabolic networks by mathematical models is therefore one of the major goals of integrative systems biology. Metabolic networks have been constructed for a number of genomes [1],[2]. An example of the reconstruction process for a metabolic network is shown schematically in Figure 1. In the first step, information from different databases must be integrated to obtain a more complete enzyme list for the reconstruction. The main databases that provide a complete cross-link between genes and their corresponding enzymes are NCBI EntrezGene [3], Ensembl [4], KEGG [5], MetaCyc [6] and BRENDA [7]. The second step of the reconstruction procedure is to fill the gaps left by the first step, using information from the literature. This step is very time-consuming, so it would be highly desirable to make the first step an automatic and reliable procedure.
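The two ideas in the paragraph above, merging enzyme lists from several databases and comparing organisms by EC class rather than by sequence, can be illustrated with plain set operations. This is a minimal sketch under the assumption that each database export has already been reduced to a set of EC numbers for one organism; the function names and the example sets are hypothetical and do not correspond to real query results from EntrezGene, Ensembl, KEGG, MetaCyc or BRENDA.

```python
def merge_sources(*sources: set[str]) -> set[str]:
    """Union of the EC numbers reported for one organism by several sources."""
    merged: set[str] = set()
    for source in sources:
        merged |= source
    return merged


def compare_capacities(org_a: set[str], org_b: set[str]) -> dict[str, set[str]]:
    """Split two organisms' EC sets into shared and organism-specific functions.

    Only EC numbers are compared, so enzymes with unrelated sequences
    still count as shared if they carry the same EC number.
    """
    return {
        "shared": org_a & org_b,
        "only_a": org_a - org_b,
        "only_b": org_b - org_a,
    }


# Hypothetical exports for organism A from two databases, and an EC set for organism B.
source_1 = {"1.1.1.1", "2.7.1.1"}
source_2 = {"2.7.1.1", "4.1.2.13"}
organism_a = merge_sources(source_1, source_2)
organism_b = {"2.7.1.1", "4.1.2.13", "5.3.1.9"}

result = compare_capacities(organism_a, organism_b)
print(sorted(result["shared"]))  # ['2.7.1.1', '4.1.2.13']
print(sorted(result["only_a"]))  # ['1.1.1.1']
print(sorted(result["only_b"]))  # ['5.3.1.9']
```

The comparison is only as reliable as the EC numbers going into it, which is why consistency of the classification itself matters for this kind of analysis.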
One of the problems is that the substrate specificity of an enzyme can differ between organisms, a fact that cannot really be accounted for by any classification system [8]. A further problem is the widespread use of incomplete EC numbers such as 1.-.-.- (e.g. in UniProt entry AK1C3_HUMAN). This often occurs because an enzymatic function is inferred from the existence of a certain pair of metabolites, or is shown experimentally only in a cell extract, without the full biochemical characterisation of the enzyme that the IUBMB Nomenclature Committee requires for the assignment of an EC number [9]. In the UniProt database, for example, more than 800 proteins are annotated with an incomplete EC number [10]. Applications such as drug design, ligand docking or systems biology require the EC number classification to be correct, consistent and accurate. The automatic assignment of EC numbers to enzymatic reactions is therefore a current issue in bioinformatics; because it requires specific chemical knowledge, only a few approaches to the assignment problem have been published. The Kyoto Encyclopedia of Genes and Genomes (KEGG) developed a tool for the computational assignment of EC numbers, published by Kotera et al. [11]. In this approach, each reaction formula is first manually decomposed into sets of corresponding substrate and product molecules, called reactant pairs. In the second step, every reactant pair is analysed with the structure comparison method SIMCOMP developed by Hattori et al. [12]. Another approach, proposed by Körner et al. [13] and Apostolakis et al. [14], considers reaction energetics to predict reaction sites. Latino et al. [15] introduced an EC number classification method based on self-organizing maps, which assigns EC numbers to reactions at the sub-subclass level with 70% accuracy.
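Incomplete EC numbers such as 1.-.-.- replace every unspecified level with `-`. A minimal sketch for detecting them, assuming annotations are plain four-field strings in which each field is either a number or `-`, is shown below. Other forms that may occur in real annotation data are deliberately not handled, and the function names and protein IDs P1 to P3 are hypothetical.

```python
import re

# Four dot-separated fields; each is a number or '-' for an unspecified level.
EC_PATTERN = re.compile(r"^(\d+|-)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def specified_levels(ec: str) -> int:
    """Number of leading levels given explicitly, e.g. '1.-.-.-' -> 1, '1.1.1.1' -> 4."""
    m = EC_PATTERN.match(ec.strip())
    if m is None:
        raise ValueError(f"malformed EC number: {ec!r}")
    fields = m.groups()
    levels = 0
    for field in fields:
        if field == "-":
            break
        levels += 1
    # Once a level is '-', all lower levels must be '-' as well.
    if any(f != "-" for f in fields[levels:]):
        raise ValueError(f"specified level after '-': {ec!r}")
    return levels


def is_complete(ec: str) -> bool:
    """True only if all four levels (class to serial number) are specified."""
    return specified_levels(ec) == 4


# Hypothetical annotations: protein IDs and EC numbers are illustrative only.
annotations = {"P1": "1.-.-.-", "P2": "1.1.1.1", "P3": "3.1.-.-"}
incomplete = sorted(pid for pid, ec in annotations.items() if not is_complete(ec))
print(incomplete)  # ['P1', 'P3']
```

Returning the number of specified levels, rather than just a yes/no flag, makes it easy to tell how far an annotation falls short of a full EC number.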
Because one of the authors is the current chairman of the IUBMB Nomenclature Committee, we felt the need to develop a highly reliable classification system that can identify the sub-subclass of any given enzyme-catalysed reaction, allow the quick assignment of new reactions, and additionally serve for retrospective quality control of existing EC numbers. With ca. 4000 existing EC numbers, this cannot be done by hand. In this article we present an efficient and reliable strategy for the automatic classification of enzyme-catalysed biochemical reactions based on the chemical structures of the substrates and products involved.

