A flood-based information flow analysis and network minimization method for gene regulatory networks.

Pavlogiannis A, Mozhayskiy V, Tagkopoulos I - BMC Bioinformatics (2013)

Bottom Line: Scalability and sensitivity analysis show that the proposed method scales well with the size of the network, and is robust to noise and missing data. The method of network flooding proves to be a useful, practical approach towards information flow analysis in gene regulatory networks. Further extension of the proposed theory has the potential to lead to a unifying framework for simultaneous network minimization and information flow analysis across various "omics" levels.


Affiliation: Department of Computer Science, University of California Davis, One Shields Avenue, Davis, CA 95616, USA.

ABSTRACT

Background: Biological networks tend to have high interconnectivity, complex topologies and multiple types of interactions. This makes it difficult to identify the sub-networks that are involved in condition-specific responses. In addition, we generally lack scalable methods that can reveal the information flow in gene regulatory and biochemical pathways. Such methods would help us identify key participants and paths under specific environmental and cellular contexts.

Results: This paper introduces the theory of network flooding, which aims to address the problem of network minimization and regulatory information flow in gene regulatory networks. Given a regulatory biological network, a set of source (input) nodes and optionally a set of sink (output) nodes, our task is to find (a) the minimal sub-network that encodes the regulatory program involving all input and output nodes and (b) the information flow from the source to the sink nodes of the network. Here, we describe a novel, scalable, network traversal algorithm and we assess its potential to achieve significant network size reduction in both synthetic and E. coli networks. Scalability and sensitivity analysis show that the proposed method scales well with the size of the network, and is robust to noise and missing data.

Conclusions: The method of network flooding proves to be a useful, practical approach towards information flow analysis in gene regulatory networks. Further extension of the proposed theory has the potential to lead to a unifying framework for simultaneous network minimization and information flow analysis across various "omics" levels.


Figure 7: Flood network minimization overview for the E. coli gene regulatory network in the “Stationary phase” scenario. (A) Sub-network “reachable” from the inputs, which contains all nodes directly or indirectly regulated by the input nodes; (B) and (C) reduced sub-networks of nodes with flood above the 0.25 and 0.65 thresholds, respectively. Nodes represent genes; dark blue nodes in the center are the regulator nodes; light blue nodes in the outer circles are the regulated nodes (regulated genes from the same transcription unit and with identical regulation are grouped together); grey nodes are genes that are not included in the sub-network.
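The two operations named in the caption, collecting the sub-network reachable from the inputs and keeping only nodes whose flood exceeds a threshold, can be illustrated with a minimal sketch. It does not reproduce the flooding computation itself: the flood values, the node names and the helper functions `reachable_from` and `threshold_subnetwork` are hypothetical and chosen only for illustration.

```python
from collections import deque


def reachable_from(edges, sources):
    """Return every node reachable from the source nodes along directed edges."""
    adjacency = {}
    for regulator, target in edges:
        adjacency.setdefault(regulator, []).append(target)
    seen = set(sources)
    queue = deque(sources)
    while queue:  # breadth-first traversal from all sources at once
        node = queue.popleft()
        for target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def threshold_subnetwork(edges, flood, threshold):
    """Keep nodes whose flood exceeds the threshold, plus edges between kept nodes."""
    kept = {node for node, value in flood.items() if value > threshold}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# Toy directed regulatory network (hypothetical names, not E. coli data).
edges = [("sigmaA", "g1"), ("g1", "g2"), ("sigmaA", "g3"), ("g4", "g5")]
# Hypothetical flood values; in the paper these come from the flooding algorithm.
flood = {"sigmaA": 1.0, "g1": 0.7, "g2": 0.3, "g3": 0.1}

reachable = reachable_from(edges, ["sigmaA"])                   # analogue of panel (A)
nodes_025, edges_025 = threshold_subnetwork(edges, flood, 0.25)  # analogue of panel (B)
nodes_065, edges_065 = threshold_subnetwork(edges, flood, 0.65)  # analogue of panel (C)
```

In this toy case, raising the threshold from 0.25 to 0.65 shrinks the retained node set, mirroring the progression from panel (B) to panel (C).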

Mentions: Next, we reconstructed the regulatory network of E. coli from data available in EcoCyc [30] and RegulonDB [31]. Compared to the networks in the synthetic dataset, the derived E. coli network is incomplete, with many genes measured in different environmental conditions and of unknown function or regulation (more than 1000 genes are not connected to any sigma factor). Unit regulatory weights were used in this example, and Gaussian noise with a standard deviation of 0.05 was added. To evaluate our network flooding algorithm, we considered several scenarios that are relevant to E. coli growth and stress response. We used sigma factors as information source nodes, as they act as master regulators under various scenarios and their relative concentration ratios are known for each of the conditions we consider here [32]. To assess the performance of our algorithm, we created a set of reporter genes for each condition that are likely to be involved in the respective processes. This set includes genes that have been differentially expressed in these conditions (microarray data provided in [33]) and are implicated in cellular response, as indicated by their GO terms. In this context, our network flooding analysis has been used to reveal regulatory information flow and to perform network minimization (Figure 6). A functional analysis of the genes in the minimized network shows patterns consistent with what is biologically known for growth in the conditions that we focus on (see Additional file 2: Tables S1-3). More specifically, under “exponential growth” (Figure 6B and Table 2), the over-represented terms were protein complex (p-value 2.7 × 10^-12, GO:0043234), cellular respiration (2.6 × 10^-9, GO:0045333), chemotaxis (2.2 × 10^-8, GO:0006935), generation of precursor metabolites and energy (7.9 × 10^-8, GO:0006091), carbohydrate transport (1.3 × 10^-7, GO:0008643), membrane part (1.1 × 10^-6, GO:0044425) and amino acid transport (2.7 × 10^-6, GO:0006865). Under the stationary phase scenario (Figure 6A and Table 2), over-represented terms include oxidative phosphorylation (5 × 10^-7, GO:0006119), anaerobic respiration (2.6 × 10^-7, GO:0009061), oxidation-reduction process (5.4 × 10^-5, GO:0055114), nitrogen utilization (2.9 × 10^-5, GO:0019740) and carboxylic acid transport (3.4 × 10^-4, GO:0046942). Table 2 summarizes the different scenarios that we consider, along with the network sizes and average floods in each case. Network flooding is able to minimize the network while still preserving the statistical significance of the results (Figure 6). P-values can be viewed as the probability that the reporter nodes found in the minimized network were obtained by random chance. Figure 7 depicts the minimal network for different flood thresholds.
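The interpretation of a p-value as the probability of finding the reporter nodes by random chance can be made concrete with a small sketch. The text does not state which statistical test was used; the example below assumes a one-sided hypergeometric test, which is a common choice for this kind of over-representation question. The function name `enrichment_p_value` and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, reporters, selected, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of nodes in the full network
    reporters:  number of reporter nodes in the full network
    selected:   number of nodes kept in the minimized network
    hits:       number of reporter nodes among the kept nodes
    Assumes the kept nodes are drawn uniformly at random without replacement.
    """
    upper = min(reporters, selected)
    total = comb(population, selected)
    tail = sum(
        comb(reporters, k) * comb(population - reporters, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers only: 10 nodes, 4 reporters, 3 nodes kept, 2 of them reporters.
p = enrichment_p_value(population=10, reporters=4, selected=3, hits=2)
```

A small value of `p` indicates that a random sub-network of the same size would rarely contain as many reporter nodes as the minimized one.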

