How difficult is inference of mammalian causal gene regulatory networks?

Djordjevic D, Yang A, Zadoorian A, Rungrugeecharoen K, Ho JW - PLoS ONE (2014)

Bottom Line: Our data have thorough annotation of tissue types and embryonic stages, as well as the type of regulation (activation, inhibition and no effect), which uniquely allows us to estimate both sensitivity and specificity of the inference of tissue-specific causal GRN edges. Using these unprecedented datasets, we found that gene co-expression does not reliably distinguish true positive from false positive interactions, making inference of GRNs in mammalian development very difficult. Our results support the importance of using perturbation experimental data in causal network reconstruction.

Affiliation: Victor Chang Cardiac Research Institute, Sydney, New South Wales, Australia; The University of New South Wales, Sydney, New South Wales, Australia.

ABSTRACT
Gene regulatory networks (GRNs) play a central role in systems biology, especially in the study of mammalian organ development. One key question remains largely unanswered: Is it possible to infer mammalian causal GRNs using observable gene co-expression patterns alone? We assembled two mouse GRN datasets (embryonic tooth and heart) and matching microarray gene expression profiles to systematically investigate the difficulties of mammalian causal GRN inference. The GRNs were assembled based on >2,000 pieces of experimental genetic perturbation evidence from manually reading >150 primary research articles. Each piece of perturbation evidence records the qualitative change of the expression of one gene following knock-down or over-expression of another gene. Our data have thorough annotation of tissue types and embryonic stages, as well as the type of regulation (activation, inhibition and no effect), which uniquely allows us to estimate both sensitivity and specificity of the inference of tissue-specific causal GRN edges. Using these unprecedented datasets, we found that gene co-expression does not reliably distinguish true positive from false positive interactions, making inference of GRNs in mammalian development very difficult. Nonetheless, if we have expression profiling data from genetic or molecular perturbation experiments, such as gene knock-out or signalling stimulation, it is possible to use the set of differentially expressed genes to recover causal regulatory relationships with good sensitivity and specificity. Our results support the importance of using perturbation experimental data in causal network reconstruction. Furthermore, we showed that causal gene regulatory relationships can be highly cell-type or developmental-stage specific, suggesting the importance of employing expression profiles from homogeneous cell populations.
This study provides essential datasets and empirical evidence to guide the development of new GRN inference methods for mammalian organ development.
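
The perturbation-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `call_edges`, the toy gene names and the log fold change values are all hypothetical. The only detail taken from the source is that a log fold change cut-off (0.5 or 1, per the Figure 6 caption) separates affected from unaffected genes.

```python
from typing import Dict, Tuple

def call_edges(
    log_fc: Dict[str, float],
    regulator: str,
    cutoff: float = 1.0,
    loss_of_function: bool = True,
) -> Dict[Tuple[str, str], str]:
    """Label each (regulator, target) pair from one perturbation experiment.

    log_fc maps gene -> log fold change (perturbed vs. control).
    loss_of_function=True models a knock-out or knock-down;
    False models over-expression or pathway stimulation.
    """
    edges = {}
    for gene, lfc in log_fc.items():
        if gene == regulator:
            continue  # the perturbed gene itself is not a target
        if abs(lfc) < cutoff:
            edges[(regulator, gene)] = "no effect"
            continue
        went_up = lfc > 0
        # A target that drops when the regulator is lost (or rises when
        # the regulator is boosted) is read as activated by it.
        activates = went_up != loss_of_function
        edges[(regulator, gene)] = "activation" if activates else "inhibition"
    return edges

# Hypothetical knock-out of "Reg" with made-up log fold changes.
toy_log_fc = {"Reg": -3.0, "GeneA": -1.4, "GeneB": 0.7, "GeneC": 0.1}
strict = call_edges(toy_log_fc, "Reg", cutoff=1.0)
lenient = call_edges(toy_log_fc, "Reg", cutoff=0.5)
```

Lowering the cut-off from 1 to 0.5 turns `GeneB` from "no effect" into an inhibited target in this toy example, which is the kind of sensitivity/specificity trade-off the two cut-offs probe.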


Figure 6 (pone-0111661-g006): Comparison of the true positive and false positive rates as determined by different network inference approaches on the tooth dataset: Pearson correlation, Pathway Commons database, protein-protein interactions (PPI), the union of the previous three methods, and direct effect on genetic perturbation (log fold change cut-off of 0.5 and 1). Note: the TP and FP rates for the first four methods were calculated based on the subset of 686 regulator-target pairs (RTPs) that were represented in the microarray, PPI and pathway data. The TP and FP rates for perturbation data were based on the subset of 39 RTPs with a regulator matching the pathway being perturbed.

Mentions: Using co-expression (as determined by Pearson correlation), we could achieve a true positive rate of 25%, but with almost a 30% false positive rate. We found that only 3–6% of the edges in Pathway Commons pathways or protein-protein interaction networks overlap with activating or inhibiting RTPs; however, in all cases a similar proportion of false positives was also retrieved (Figure 6, Figure S8). By explicitly taking into account the perturbation design (as in Figure 4), we can significantly increase the true positive rate while keeping the false positive rate low (Figure 6, Figure S8).
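
How such true positive and false positive rates could be computed for a co-expression predictor is sketched below. This is an illustration under stated assumptions, not the authors' code: the absolute-correlation threshold of 0.8, the helper names `pearson` and `tp_fp_rates`, and the toy expression profiles are all hypothetical. The sketch only reflects that the curated pairs carry one of three labels (activation, inhibition, no effect), so "no effect" pairs can serve as negatives.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def tp_fp_rates(expr, gold, threshold=0.8):
    """Return (TP rate, FP rate) for edges predicted by |Pearson r| >= threshold.

    expr: gene -> expression profile across samples.
    gold: (regulator, target) -> "activation" | "inhibition" | "no effect".
    """
    tp = fp = pos = neg = 0
    for (reg, tgt), label in gold.items():
        predicted = abs(pearson(expr[reg], expr[tgt])) >= threshold
        if label == "no effect":
            neg += 1
            fp += predicted  # predicted edge where perturbation showed none
        else:
            pos += 1
            tp += predicted  # predicted edge confirmed by perturbation
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)

# Made-up profiles over four samples and a made-up gold standard.
toy_expr = {
    "R": [1, 2, 3, 4],
    "T1": [2, 4, 6, 8],
    "T2": [8, 6, 4, 2],
    "T3": [1, 1, 2, 2],
    "T4": [3, 1, 4, 2],
    "T5": [2, 2, 3, 1],
}
toy_gold = {
    ("R", "T1"): "activation",
    ("R", "T2"): "inhibition",
    ("R", "T4"): "activation",
    ("R", "T3"): "no effect",
    ("R", "T5"): "no effect",
}
tpr, fpr = tp_fp_rates(toy_expr, toy_gold)
```

In the toy data, `T3` is strongly correlated with `R` despite being labelled "no effect", so it counts as a false positive, while `T4` is a true target that correlation misses. This mirrors the failure mode the excerpt describes.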

