DREAM4: Combining genetic and dynamic information to identify biological networks and dynamical models.

Greenfield A, Madar A, Ostrer H, Bonneau R - PLoS ONE (2010)

Bottom Line: We demonstrate complementarity between this method and the two methods that take advantage of time-series data by combining the three into a pipeline whose ability to rank regulatory interactions is markedly improved compared to any one method alone. Moreover, the pipeline is able to accurately predict the response of the system to new conditions (in this case new double knock-out genetic perturbations). Our evaluation of the performance of multiple methods for network inference suggests avenues for future methods development and provides simple considerations for genomic experimental design.


Affiliation: Computational Biology Program, New York University Sackler School of Medicine, New York, New York, United States of America.

ABSTRACT

Background: Current technologies have led to the availability of multiple genomic data types in sufficient quantity and quality to serve as a basis for automatic global network inference. Accordingly, there is currently a large variety of network inference methods that learn regulatory networks to varying degrees of detail. These methods have different strengths and weaknesses and thus can be complementary. However, combining different methods in a mutually reinforcing manner remains a challenge.

Methodology: We investigate how three scalable methods can be combined into a useful network inference pipeline. The first is a novel t-test-based method that relies on a comprehensive steady-state knock-out dataset to rank regulatory interactions. The remaining two are previously published methods, one based on mutual information (tlCLR) and one based on ordinary differential equations (Inferelator 1.0), that use both time-series and steady-state data to rank regulatory interactions; the latter has the added advantage of also inferring dynamic models of gene regulation, which can be used to predict the system's response to new perturbations.
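To make the idea of ranking interactions from steady-state knock-out data concrete, here is a minimal sketch. It is not the authors' t-test-based method: a simple z-score-style statistic is swapped in as an assumption, and the function name `rank_interactions`, the data layout, and the toy expression values are all hypothetical.

```python
from statistics import stdev


def rank_interactions(wild_type, knockouts):
    """Rank candidate edges (regulator -> target) by a z-score-like statistic.

    wild_type: dict mapping gene -> wild-type steady-state expression.
    knockouts: dict mapping knocked-out gene -> dict of gene -> expression.
    Assumption: a large shift of the target after removing the regulator,
    relative to the target's spread across knock-outs, suggests an edge.
    """
    scores = {}
    for target in sorted(wild_type):
        # Spread of the target over knock-outs that did not remove the target itself.
        values = [knockouts[ko][target] for ko in knockouts if ko != target]
        sigma = stdev(values) if len(values) > 1 else 0.0
        for regulator in knockouts:
            if regulator == target:
                continue
            shift = knockouts[regulator][target] - wild_type[target]
            scores[(regulator, target)] = abs(shift) / sigma if sigma > 0 else 0.0
    # Highest score first: the most confident candidate interactions.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented for illustration): knocking out A strongly lowers B.
wild_type = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
knockouts = {
    "A": {"A": 0.0, "B": 0.2, "C": 1.0, "D": 1.0},
    "B": {"A": 1.0, "B": 0.0, "C": 1.0, "D": 1.02},
    "C": {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.98},
    "D": {"A": 1.0, "B": 1.1, "C": 1.0, "D": 0.0},
}
ranked = rank_interactions(wild_type, knockouts)
```

In this toy example the edge from A to B receives the highest score, because B drops far more in the A knock-out than it varies across the other knock-outs.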

Conclusion/significance: Our t-test-based method proved powerful at ranking regulatory interactions, tying for first among methods in the DREAM4 100-gene in-silico network inference challenge. We demonstrate complementarity between this method and the two methods that take advantage of time-series data by combining the three into a pipeline whose ability to rank regulatory interactions is markedly improved compared to any one method alone. Moreover, the pipeline is able to accurately predict the response of the system to new conditions (in this case new double knock-out genetic perturbations). Our evaluation of the performance of multiple methods for network inference suggests avenues for future methods development and provides simple considerations for genomic experimental design. Our code is publicly available at http://err.bio.nyu.edu/inferelator/.


Figure 5 (pone-0013397-g005): Performance on double knock-out prediction. We assess the accuracy of predicting the system's response to the simultaneous removal (knock-out) of two genes. In total, one hundred pairs of genes were knocked out. We bin these pairs based on the average of their respective median expression in the single-gene knock-out data. We made two predictions, which differ only in the choice of initial conditions. In each case we compare the error of our prediction (evaluated as the mean squared error) to the error made by using the respective initial condition as the prediction. A) We use the wild-type expression as the set of initial conditions (green boxplots). Our predictions (gray and red boxplots) are more accurate than the initial conditions used as a prediction; this is more apparent for TFs with a larger median expression. B) We use a combination of the single-gene knock-outs to compute our initial conditions (eq. 25). We do this because the single-gene knock-out data represent a system state that is closer to the state we are trying to predict than the wild type is (as can be seen by comparing the green boxplots in panel A to those in panel B). The error distributions obtained using parameters calculated by either pipeline 3 (tlCLR-Inferelator+MCZ, gray boxplots) or pipeline 4 (Resampling+MCZ, red boxplots) are lower than the error distributions obtained by using the initial conditions as a prediction. Regardless of the choice of initial conditions, the error distributions using parameters calculated by pipeline 4 (red boxplots) are similar to those obtained by pipeline 3.

Mentions: In Figure 5 we bin regulators based on their median expression and show the corresponding error distributions for our predictions. We compare our error to the error made if we used the initial conditions as a prediction of the response of the system. In Figure 5A we use the wild-type expression as the set of initial conditions. Predictions made using either pipeline 3 (gray boxplots) or pipeline 4 (red boxplots) outperform the initial conditions (green boxplots). In Figure 5B we construct our initial conditions from the given single-gene knock-out values and our MCZ confidence scores (eq. 25). Here too, our predictions (gray and red boxplots) outperform the initial conditions (green boxplots). Furthermore, comparing the green boxplots in Figure 5B to those in Figure 5A shows that initial conditions derived from the single knock-out data have much lower error than initial conditions derived from the wild type. Regardless of which initial conditions are chosen, predictions using parameters derived from pipeline 4 perform almost identically to those made using the parameterization derived from pipeline 3.
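The comparison described for Figure 5 can be sketched as follows, under stated assumptions: the mean squared error of a model prediction is compared with the error obtained by using the initial conditions themselves as the prediction, and each gene pair is assigned to a bin by the average of the two genes' median expression. The helper names (`mse`, `pair_bin`), the bin edges, and all expression values are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def mse(predicted, observed):
    """Mean squared error between two equal-length expression vectors."""
    if len(predicted) != len(observed):
        raise ValueError("vectors must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def pair_bin(median_a, median_b, edges):
    """Bin index for a gene pair, from the average of the two median expressions.

    edges: ascending bin boundaries (hypothetical values, chosen by the caller).
    """
    return bisect_right(edges, (median_a + median_b) / 2)


# Toy double knock-out of genes 0 and 1 in a four-gene system (invented values).
observed = [0.0, 0.0, 0.5, 1.0]            # measured response
initial_condition = [1.0, 1.0, 1.0, 1.0]   # e.g. wild-type expression as baseline
prediction = [0.0, 0.0, 0.6, 0.9]          # output of some fitted dynamic model

model_error = mse(prediction, observed)
baseline_error = mse(initial_condition, observed)
bin_for_pair = pair_bin(0.4, 0.8, [0.25, 0.5, 0.75])
```

A prediction is only informative if `model_error` falls below `baseline_error`; collecting both errors per bin over all knocked-out pairs would give the kind of boxplot comparison the figure describes.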

