An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits.

Liu Y, Li J, Sam L, Goh CS, Gerstein M, Lussier YA - PLoS Comput. Biol. (2006)

Bottom Line: We identified 3,711 significant correlations between 1,499 distinct Pfam and 63 phenotypes, with 2,650 correlations and 1,061 anti-correlations. We stratified the most significant 478 predictions and subjected 100 to manual evaluation, of which 60 were corroborated in the literature. We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.


Affiliation: Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America.

ABSTRACT
With the mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been put forward to analyze prokaryotic phenotypes, current computational technologies either lack high-throughput capacity for genomic-scale analysis, or are limited in their capability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions has not been possible. Here, we developed a high-throughput computational approach, and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes along with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method over 59 fully sequenced prokaryotic species, we identified the genetic basis and molecular mechanisms underlying the phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam families and 63 phenotypes, with 2,650 correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the most significant 478 predictions and subjected 100 to manual evaluation, of which 60 were corroborated in the literature. We furthermore unveiled 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms spanning multiple biological scales and reused by related phenotypes (metaphenotypes).
We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.
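
The precision figures quoted in the abstract are proportions estimated from samples of 50, so their 95% confidence intervals can be recomputed as a sanity check. The sketch below uses the Wilson score interval, which is an assumption: the abstract does not say which interval method was used, so the bounds it produces need not match the reported ones exactly. The counts 15 of 50 and 36 of 50 are inferred from the reported 30% and 72% with n = 50, and the function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    Assumption: the source does not state its interval method;
    Wilson is used here only as one common choice.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts inferred from the abstract: 30% and 72% of samples of size 50.
pfam_low, pfam_high = wilson_interval(15, 50)
go_low, go_high = wilson_interval(36, 50)
print(f"Pfam sample: {pfam_low:.3f} to {pfam_high:.3f}")
print(f"GO sample:   {go_low:.3f} to {go_high:.3f}")
```

Both intervals bracket the point estimates and land close to the reported ranges, which is about all such a check can show without knowing the original method.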

Figure 5. Scalar Network of Correlated Phenotypes, GO, Pathways, and Protein Families
As predicted by our study, six phenotypes, taken from a phenotypic cluster in Figure 4 (highlighted there in a green box), are shown highly connected with their significantly correlated biological scales: KEGG pathways, GO concepts, and Pfam families. Every relationship (orange and green lines between concept nodes) has been derived from our study, with the exception of relationships between GO and Pfam (blue lines), which were taken from public databases.
D-mannose, acid production in a medium containing D-mannose; Facultative anaerobic, facultative anaerobic organism; Glucose fermenter, fermentation in a glucose medium; Glycerol, acid production in a medium containing glycerol; Maltose, acid production in a medium containing maltose; Trehalose, acid production in a medium containing trehalose; PF01904, unknown function; PF00401, ATP Synthase; PF00358, Phosphoenolpyruvate-dependent sugar PTS (EIIA 1); PF00367, PTS (EIIB); PF02302, PTS Lactose/Cellobiose specific IIB subunit; PF02378, PTS (EIIC); PF02379, PTS system Fructose-specific IIB subunit.

Mentions: The third cluster of phenotypes within the green boxes contains six phenotypes related to the catabolism of carbohydrates, clustered in the following order: Glucose fermenter (fermentation in a glucose medium), Maltose (production of acid in a medium containing maltose), Facultative anaerobic, Glycerol (production of acid in a medium containing glycerol), Trehalose (production of acid in a medium containing trehalose), and D-mannose (production of acid in a medium containing D-mannose). Every one of these phenotypes is also related to glycolysis [43]. We illustrated this cluster of phenotypes with their significantly correlated Pfam families, GO concepts, and KEGG pathways in detail (shown as a multiscale network in Figure 5). To constrain the network of cross-scale relationships to the most relevant ones, the criteria for displaying a molecular class were the following: 1) GO terms significantly correlated with at least four phenotypes in the cluster, 2) a KEGG pathway with significant correlations to three phenotypes, and 3) Pfam families significantly correlated with at least two phenotypes in the cluster (with the exception of one uncharacterized Pfam family that has only one link, to Glycerol, included to illustrate the use of the integrated view for possible predictions). The cross-scale relationships between Pfam and GO terms (Figure 5, blue lines) were retrieved from public databases (discussed in Materials and Methods). Using these visualization criteria, we observe that this phenotypic cluster is densely interconnected, as many phenotypes share common KEGG pathways, GO concepts, and Pfam families based on our previous analyses. For example, facultative anaerobic bacteria with the ability to metabolize D-mannose share one common KEGG pathway, the phosphotransferase system (PTS) pathway, and two GO concepts, phosphoenolpyruvate-dependent sugar phosphotransferase system and sugar porter activity.
In addition, three molecular classes clearly related to the carbohydrate transport system in bacteria are closely associated with the same phenotypic cluster: the KEGG pathway PTS, the cellular process phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), and the molecular function sugar porter activity. Overall, five of the six phenotypes in this cluster share many common protein domain families (Pfam) involved in the PTS, as well as higher-level biological concepts, such as GO concepts and KEGG pathways, strongly suggesting similarities or overlaps in their underlying molecular mechanisms. In addition to the clustering of phenotypes, clustering of Pfam families based on their correlations to different phenotypes may also provide an informative view of the Pfam families, reflecting their activities in different phenotypes. Macroscopic phenotypes closely related to the catabolism of carbohydrates are thus also highly linked in this illustration with molecular classes closely related to the transport of carbohydrates. This visualization of cross-scale relationships, linked together across multiple biological scales and forming a multiscale nexus within the phenomic network, constitutes a proof of concept that the method could be applied to investigate less-understood regions of the network that we developed. We are in the process of further exploring this multiscale network in close collaboration with microbiologists. To our knowledge, this is the first phenomic study designed to predict and visualize cross-scale relationships between clusters of prokaryotic phenotypes (metaphenotypes) and their molecular mechanisms.
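
The three display criteria described above amount to a per-scale threshold on how many phenotypes of the cluster a molecular class is significantly correlated with. A minimal sketch of that filter follows, under stated assumptions: the function name, the tuple layout, and the placeholder identifiers (`PFAM_A`, `KEGG_X`, `GO_Y`, and so on) are hypothetical, the toy edges are not data from the study, and the KEGG criterion ("three phenotypes") is read here as "at least three".

```python
from collections import defaultdict

# Minimum number of cluster phenotypes a class must be correlated with,
# per biological scale, following the three criteria in the text.
# Assumption: the KEGG criterion is interpreted as "at least three".
MIN_PHENOTYPES = {"GO": 4, "KEGG": 3, "Pfam": 2}


def select_displayed_classes(correlations, cluster, exceptions=frozenset()):
    """Return the molecular classes that pass the display criteria.

    correlations: iterable of (scale, class_id, phenotype) tuples, one per
        significant correlation.
    cluster: set of phenotype names forming the phenotypic cluster.
    exceptions: class ids displayed regardless of their link count.
    """
    linked = defaultdict(set)
    for scale, class_id, phenotype in correlations:
        if phenotype in cluster:
            linked[(scale, class_id)].add(phenotype)
    return {
        key: phenotypes
        for key, phenotypes in linked.items()
        if len(phenotypes) >= MIN_PHENOTYPES[key[0]] or key[1] in exceptions
    }


cluster = {
    "Glucose fermenter", "Maltose", "Facultative anaerobic",
    "Glycerol", "Trehalose", "D-mannose",
}

# Toy edges with placeholder ids, for illustration only.
toy_correlations = [
    ("Pfam", "PFAM_A", "Maltose"),
    ("Pfam", "PFAM_A", "Trehalose"),
    ("Pfam", "PFAM_B", "Glycerol"),
    ("Pfam", "PFAM_C", "Glycerol"),
    ("KEGG", "KEGG_X", "Maltose"),
    ("KEGG", "KEGG_X", "Trehalose"),
    ("GO", "GO_Y", "Maltose"),
    ("GO", "GO_Y", "Trehalose"),
    ("GO", "GO_Y", "Glycerol"),
    ("GO", "GO_Y", "D-mannose"),
]

# PFAM_C stands in for the single uncharacterized family shown by exception.
shown = select_displayed_classes(toy_correlations, cluster, exceptions={"PFAM_C"})
for (scale, class_id), phenotypes in sorted(shown.items()):
    print(scale, class_id, sorted(phenotypes))
```

Keeping the thresholds in one mapping makes it easy to see how the displayed network would shrink or grow if a criterion were tightened or relaxed.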

