HydDB: A web tool for hydrogenase classification and analysis

ABSTRACT

H2 metabolism is proposed to be the most ancient and diverse mechanism of energy conservation. The metalloenzymes mediating this metabolism, hydrogenases, are encoded by over 60 microbial phyla and are present in all major ecosystems. We developed a classification system and web tool, HydDB, for the structural and functional analysis of these enzymes. We show that hydrogenase function can be predicted from primary sequence alone using an expanded classification scheme (comprising 29 [NiFe], 8 [FeFe], and 1 [Fe] hydrogenase classes) that defines 11 new classes with distinct biological functions. Using this scheme, we built a web tool that rapidly and reliably classifies hydrogenase primary sequences using a combination of k-nearest neighbors (k-NN) algorithms and Conserved Domain Database (CDD) referencing. Demonstrating its capacity, the tool reliably predicted hydrogenase content and function in 12 newly sequenced bacteria, archaea, and eukaryotes. HydDB lets users browse the amino acid sequences of 3248 annotated hydrogenase catalytic subunits and also contains a detailed repository of physiological, biochemical, and structural information about the 38 hydrogenase classes defined here. The database and classifier are freely and publicly available at http://services.birc.au.dk/hyddb/



Figure 2. Evaluating the k-NN classifier for k = 1…10. For each k, a 5-fold cross-validation was performed. The mean precision ± two standard deviations of the folds is shown in the figure (note the y-axis). k = 1 provides the most accurate classifier. However, k = 4 provides almost the same precision and is more robust to errors in the training set (reflected by the lower standard deviation). In general, the standard deviation is very small, indicating that the predictions are robust to changes in the training data.
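The trade-off described in the caption (highest mean precision versus lowest spread across folds) can be made concrete with a small sketch. The per-fold values below are invented placeholders, not results from the paper, and the selection rule in the hypothetical `pick_k` helper is only one possible way to formalize the reasoning. The sample standard deviation is assumed, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize(fold_scores):
    """Return (mean, 2 * sample standard deviation) for one value of k."""
    return mean(fold_scores), 2 * stdev(fold_scores)


def pick_k(scores_by_k, tolerance=0.001):
    """Pick k by an assumed rule: among the k whose mean precision lies
    within `tolerance` of the best mean, take the one with the smallest
    spread across folds (ties go to the smaller k)."""
    summaries = {k: summarize(scores) for k, scores in scores_by_k.items()}
    best_mean = max(m for m, _ in summaries.values())
    candidates = [k for k, (m, _) in summaries.items()
                  if best_mean - m <= tolerance]
    return min(candidates, key=lambda k: (summaries[k][1], k))


# Placeholder per-fold precisions (made up for illustration only).
EXAMPLE_SCORES = {
    1: [0.999, 0.997, 1.000, 0.996, 0.998],
    4: [0.998, 0.998, 0.997, 0.998, 0.998],
    9: [0.990, 0.992, 0.988, 0.991, 0.989],
}

if __name__ == "__main__":
    for k, scores in EXAMPLE_SCORES.items():
        m, two_sd = summarize(scores)
        print(f"k={k}: {m:.4f} ± {two_sd:.4f}")
    print("chosen k:", pick_k(EXAMPLE_SCORES))
```

With these placeholder numbers, k = 1 has the highest mean, but k = 4 is within the tolerance and varies less between folds, which mirrors the argument made in the caption.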

Mentions: In the final step, the sequence is classified through the k-NN method, which determines the most similar sequences listed in the HydDB reference database. To determine the optimal k for the dataset, we performed a 5-fold cross-validation for k = 1…10 and computed the precision for each k. The results are shown in Fig. 2. For k = 4, the classifier predicted the classes of the 3248 hydrogenase sequences with 99.8% precision and high robustness in a 5-fold cross-validation (as described in the Methods section). The six sequences for which the SSN and k-NN predictions disagreed are shown in Table S2. The classifier has also been trained to detect and exclude protein families that are homologous to hydrogenases but do not metabolize H2 (Nuo, Ehr, NARF, HmdII) [12] using reference sequences of these proteins (Table S1).
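The procedure described above (k-NN classification against a labeled reference set, evaluated by 5-fold cross-validation for several k) can be sketched as follows. This is a minimal illustration under stated assumptions, not the HydDB implementation: the 3-mer Jaccard similarity is a stand-in metric chosen for simplicity, the two template sequences and class labels are made up, and the per-fold score is the plain fraction of correctly classified sequences, used here as a simple proxy for the precision reported in the text.

```python
import random
from collections import Counter


def kmer_set(seq, n=3):
    """Return the set of overlapping n-mers in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def similarity(a, b):
    """Jaccard similarity of 3-mer sets (an assumed stand-in metric)."""
    ka, kb = kmer_set(a), kmer_set(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def knn_predict(query, references, k):
    """Majority vote among the k references most similar to `query`.

    `references` is a list of (sequence, class_label) pairs. On a tied
    vote, the label seen first among the nearest neighbors wins.
    """
    ranked = sorted(references,
                    key=lambda ref: similarity(query, ref[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validate(data, k, folds=5, seed=0):
    """Return the fraction of correct predictions in each fold."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    scores = []
    for f in range(folds):
        test = shuffled[f::folds]
        train = [item for i, item in enumerate(shuffled) if i % folds != f]
        correct = sum(knn_predict(seq, train, k) == label
                      for seq, label in test)
        scores.append(correct / len(test))
    return scores


def make_toy_data(seed=1, per_class=20):
    """Build a toy labeled set from two made-up template sequences."""
    rng = random.Random(seed)
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    templates = {  # hypothetical sequences, not real hydrogenases
        "class_A": "MKRCGVCHSLAHALESVRALE",
        "class_B": "WPTDDYYQGNNWPTFFIKQMD",
    }
    data = []
    for label, template in templates.items():
        for _ in range(per_class):
            seq = list(template)
            # Mutate two positions so members of a class differ slightly.
            for pos in rng.sample(range(len(seq)), 2):
                seq[pos] = rng.choice(alphabet)
            data.append(("".join(seq), label))
    return data


if __name__ == "__main__":
    toy = make_toy_data()
    for k in range(1, 11):
        print(k, cross_validate(toy, k))
```

A larger k lets several neighbors outvote a single mislabeled reference sequence, which is the robustness argument given for preferring k = 4 over k = 1.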

