Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes.

Kastenmüller G, Schenk ME, Gasteiger J, Mewes HW - Genome Biol. (2009)


Affiliation: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse, Neuherberg, Germany. g.kastenmueller@helmholtz-muenchen.de

ABSTRACT
Identifying the biochemical basis of microbial phenotypes is a main objective of comparative genomics. Here we present a novel method using multivariate machine learning techniques for comparing automatically derived metabolic reconstructions of sequenced genomes on a large scale. Applying our method to 266 genomes directly led to testable hypotheses such as the link between the potential of microorganisms to cause periodontal disease and their ability to degrade histidine, a link also supported by clinical studies.



Figure 8: Classification quality for the phenotype 'member of the red or orange cluster'. Left: classification of all genomes (266) into members and non-members of the 'red/orange' cluster using the nearest neighbor classifier (IB1). Right: classification of oral genomes (15) into members and non-members of the 'red/orange' cluster using the nearest neighbor classifier (IB1). Compared with classification based on all pathways (marked by a horizontal line) and on randomly picked pathways (red), classification based on the most relevant pathways separates the cluster members from the other species better in both genome datasets.

Mentions: To gain more specific insight into the three species of the 'red' and 'orange' clusters, we repeated the procedure described above for the phenotype 'member of the red or orange cluster'. (Since Socransky et al. [52] derived those clusters from clinical measures of the co-occurrence of oral species, this phenotype can be considered a clinical phenotype.) As expected, classification quality improved (Table 5 and Figure 8). Table 6 lists the pathways that are among the ten most relevant pathways for at least one attribute selection method and for at least three of the four investigated datasets; they are briefly described below (for all pathways, see Additional data file 2). In Table 6, each dataset is abbreviated by two characters. The first character denotes the phenotypic information used: 3 = 'members of red or orange cluster' and 4 = 'periodontal disease causing'. The second character denotes the set of genomes in the dataset: A = all genomes (266); O = oral cavity genomes (15). This gives the following abbreviations for the four investigated combinations of phenotypic information and genome set: 4A, 'periodontal disease causing' genomes in the complete dataset (266 genomes); 4O, 'periodontal disease causing' genomes in the oral cavity dataset (15 genomes); 3A, 'members of red or orange cluster' in the complete dataset; 3O, 'members of red or orange cluster' in the oral cavity dataset.
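
The Table 6 selection rule and the two-character dataset codes can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one reading of the rule: a pathway counts for a dataset if it is in the top ten of at least one attribute selection method on that dataset, and it is listed if this holds for at least three of the four datasets. The function names, the method names (method_1, method_2) and the pathway names (pw_a, ...) are hypothetical placeholders, and the toy rankings are invented for illustration only.

```python
from collections import defaultdict

# Dataset codes as defined in the text: phenotype digit + genome-set letter.
PHENOTYPES = {
    "3": "members of red or orange cluster",
    "4": "periodontal disease causing",
}
GENOME_SETS = {
    "A": "all genomes (266)",
    "O": "oral cavity genomes (15)",
}


def decode(code):
    """Expand a two-character dataset code such as '3O'."""
    return f"{PHENOTYPES[code[0]]}; {GENOME_SETS[code[1]]}"


def select_pathways(rankings, top_n=10, min_datasets=3):
    """Return pathways that reach the top_n of at least one method
    in at least min_datasets datasets (assumed reading of the rule).

    rankings: {dataset_code: {method_name: [pathways, most relevant first]}}
    """
    datasets_hit = defaultdict(set)
    for dataset, by_method in rankings.items():
        for ranked in by_method.values():
            for pathway in ranked[:top_n]:
                datasets_hit[pathway].add(dataset)
    return sorted(p for p, hit in datasets_hit.items() if len(hit) >= min_datasets)


# Hypothetical toy rankings (placeholder names, not results from the paper).
toy = {
    "3A": {"method_1": ["pw_a", "pw_b"], "method_2": ["pw_c", "pw_a"]},
    "3O": {"method_1": ["pw_b", "pw_a"], "method_2": ["pw_d", "pw_c"]},
    "4A": {"method_1": ["pw_a", "pw_d"], "method_2": ["pw_b", "pw_e"]},
    "4O": {"method_1": ["pw_e", "pw_f"], "method_2": ["pw_f", "pw_a"]},
}

# top_n=2 only because the toy lists are short; the text uses the top ten.
selected = select_pathways(toy, top_n=2)
print(selected)
print(decode("3O"))
```

With real data, `rankings` would hold one ranked pathway list per attribute selection method for each of 3A, 3O, 4A and 4O, and `top_n` would stay at its default of ten.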

