Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes.

Kastenmüller G, Schenk ME, Gasteiger J, Mewes HW - Genome Biol. (2009)

Bottom Line: Identifying the biochemical basis of microbial phenotypes is a main objective of comparative genomics. Here we present a novel method using multivariate machine learning techniques for comparing automatically derived metabolic reconstructions of sequenced genomes on a large scale. Applying our method to 266 genomes directly led to testable hypotheses such as the link between the potential of microorganisms to cause periodontal disease and their ability to degrade histidine, a link also supported by clinical studies.


Affiliation: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse, Neuherberg, Germany. g.kastenmueller@helmholtz-muenchen.de

ABSTRACT
Identifying the biochemical basis of microbial phenotypes is a main objective of comparative genomics. Here we present a novel method using multivariate machine learning techniques for comparing automatically derived metabolic reconstructions of sequenced genomes on a large scale. Applying our method to 266 genomes directly led to testable hypotheses such as the link between the potential of microorganisms to cause periodontal disease and their ability to degrade histidine, a link also supported by clinical studies.



Figure 6: Classification quality for the phenotype periodontal disease causing. Left: classification of all genomes (266) into genomes related and not related to periodontal disease using the nearest neighbor classifier (IB1). Right: classification of oral genomes (15) into genomes related and not related to periodontal disease using the nearest neighbor classifier (IB1). Compared to classification based on all pathways (marked by a horizontal line) and based on randomly picked pathways (red), the classification based on the most relevant pathways yields better separation of periodontal species and other species in both genome datasets.

Mentions: Analogous to the previous example of methanogenesis, we applied our method to the complete set of pathway profiles (266 species) as well as to the reduced set of 15 oral genomes, in order to focus on periodontal-related rather than oral cavity-related biochemical features. Figure 6 shows the resulting classification qualities achieved with the nearest neighbor classifier. According to the cross-check, the phenotype 'periodontal disease causing' is reflected by the identified relevant pathways. In contrast to the phenotype 'methanogenesis', a larger number of top-ranked pathways must be considered to reach the maximum classification quality. Therefore, we focus on the ten most relevant pathways in the following. Using these pathways, we obtained a maximum classification quality of 0.75 in both genome sets, compared to a maximum of 0.50 for all pathways and maxima of 0.08 and 0.29, respectively, for randomly chosen pathways (Table 4).
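
The comparison described above (top-ranked pathways versus all pathways versus randomly picked pathways, each scored with a nearest neighbor classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: this excerpt does not define the paper's classification quality measure, so the sketch substitutes plain leave-one-out accuracy, uses Hamming distance on binary presence/absence profiles, and runs on invented toy data. All names (`loo_1nn_accuracy`, `random_baseline`, `profiles`, `labels`) are hypothetical.

```python
import random


def loo_1nn_accuracy(profiles, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier.

    Only the pathway columns listed in `features` are used for the
    Hamming distance. Accuracy is an ASSUMED stand-in for the paper's
    classification quality measure, which this excerpt does not define.
    """
    correct = 0
    for i, query in enumerate(profiles):
        best_j, best_d = None, None
        for j, other in enumerate(profiles):
            if j == i:
                continue  # leave the query genome out of the reference set
            d = sum(query[f] != other[f] for f in features)
            if best_d is None or d < best_d:  # ties keep the first neighbor
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(profiles)


def random_baseline(profiles, labels, k, trials=100, seed=0):
    """Mean leave-one-out accuracy over `trials` random sets of k pathways."""
    rng = random.Random(seed)
    n_features = len(profiles[0])
    scores = [
        loo_1nn_accuracy(profiles, labels, rng.sample(range(n_features), k))
        for _ in range(trials)
    ]
    return sum(scores) / len(scores)


# Invented toy data: rows are genomes, columns are pathways (1 = present).
# Column 0 tracks the phenotype; the other columns do not.
profiles = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
labels = [True, True, True, False, False, False]

top_ranked = [0]  # assumed output of a separate pathway-ranking step
acc_top = loo_1nn_accuracy(profiles, labels, top_ranked)
acc_all = loo_1nn_accuracy(profiles, labels, range(len(profiles[0])))
acc_random = random_baseline(profiles, labels, k=len(top_ranked))
print(acc_top, acc_all, acc_random)
```

The ranking step that selects the relevant pathways is deliberately left out. The sketch only shows the cross-check itself: if the selected pathways really reflect the phenotype, restricting the distance to them should separate the classes better than using all pathways or a random subset of the same size.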

