Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes.

Kastenmüller G, Schenk ME, Gasteiger J, Mewes HW - Genome Biol. (2009)

Affiliation: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse, Neuherberg, Germany. g.kastenmueller@helmholtz-muenchen.de

ABSTRACT
Identifying the biochemical basis of microbial phenotypes is a main objective of comparative genomics. Here we present a novel method using multivariate machine learning techniques for comparing automatically derived metabolic reconstructions of sequenced genomes on a large scale. Applying our method to 266 genomes directly led to testable hypotheses such as the link between the potential of microorganisms to cause periodontal disease and their ability to degrade histidine, a link also supported by clinical studies.

Figure 2: Estimating the significance of pathway rankings provided by pathway selection. For phenotypes that are weakly associated with the presence or absence of specific metabolic pathways, the classification quality should be within the same range for classification based on randomly picked pathways (red), all pathways (marked by a horizontal line), and pathways highly ranked in attribute subset selection (green, ReliefF; yellow, SVMAttributeEval; blue, wrapper (naïve Bayes)). As an example, the right diagram shows the classification quality for the phenotype 'habitat: soil' as a function of the number of top-ranking pathways used for classification. In this case, the top-ranking pathways provided by attribute subset selection are considered not significant for the phenotype. The left diagram shows the classification quality values for the phenotype 'obligate intracellular'. Here, using the most relevant pathways for classification yields higher classification quality than using all pathways or randomly picked pathways. Furthermore, the quality values lie above 0.6. In this case, the most relevant pathways derived by attribute subset selection are considered significant.

Mentions: Phenotypes that are not, or only weakly, associated with specific metabolic capabilities might nonetheless be developed by species that are similar in their complete metabolism. In this case, any set of randomly picked pathways might have nearly the same (high) predictive power as the selected ones. Similarly, if a phenotype is due to an effect that is not covered by our method (for example, if many completely different metabolic patterns lead to the same phenotype, or if the phenotype is related to regulatory effects), we expect the classification quality, which is low in this case, to lie within the same range for classification based on randomly picked pathways, all pathways, and pathways highly ranked in pathway selection. For phenotypes of either type, we are not able to identify significantly relevant pathways. The results for the phenotype 'habitat: soil' using the classifier IB1 are shown in Figure 2 (right) as an example of such cases. As a consequence, we considered the high-ranking pathways relevant for the phenotype only if the following applied to at least one of the four classifications: the quality of classification based on the top-ranking pathways (i) was considerably better than random, (ii) at least reached the classification quality achieved for all pathways, and (iii) at least reached a value of 0.6. As an example, Figure 2 (left) shows the resulting classification quality values depending on the number of considered pathways for the phenotype 'obligate intracellular' using the nearest neighbor classifier (IB1).
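The three acceptance criteria described above can be written down as a small decision rule. The following Python sketch is only an illustration and not the authors' implementation: the names `QualityScores`, `meets_criteria`, and `pathways_are_relevant` are hypothetical, the quality values in the usage example are invented, and the `random_margin` parameter is an assumption, since the text says only "considerably better than random" without quantifying it.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class QualityScores:
    """Classification quality for one classification run (hypothetical container)."""
    top_ranked: float    # quality using only the top-ranking pathways
    random_pick: float   # quality using the same number of randomly picked pathways
    all_pathways: float  # quality using all pathways


def meets_criteria(scores: QualityScores,
                   min_quality: float = 0.6,
                   random_margin: float = 0.1) -> bool:
    """Check criteria (i)-(iii) for a single classification.

    min_quality is the 0.6 cutoff from the text; random_margin is an
    assumed stand-in for "considerably better than random".
    """
    better_than_random = scores.top_ranked - scores.random_pick >= random_margin  # (i)
    reaches_all = scores.top_ranked >= scores.all_pathways                        # (ii)
    reaches_minimum = scores.top_ranked >= min_quality                            # (iii)
    return better_than_random and reaches_all and reaches_minimum


def pathways_are_relevant(scores_by_classification: Mapping[str, QualityScores],
                          min_quality: float = 0.6,
                          random_margin: float = 0.1) -> bool:
    """Accept the ranking if at least one classification meets all three criteria."""
    return any(
        meets_criteria(scores, min_quality, random_margin)
        for scores in scores_by_classification.values()
    )


# Invented values for illustration only; they are not taken from the paper.
strong_signal = {"IB1": QualityScores(top_ranked=0.85, random_pick=0.55, all_pathways=0.70)}
weak_signal = {"IB1": QualityScores(top_ranked=0.40, random_pick=0.38, all_pathways=0.41)}

print(pathways_are_relevant(strong_signal))
print(pathways_are_relevant(weak_signal))
```

With these invented numbers, the first case passes all three checks and the second fails each of them, which mirrors the contrast between the left and right diagrams of Figure 2. In practice the check would be repeated for each number of top-ranking pathways considered.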

