Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations.

Zhu J, Wiener MC, Zhang C, Fridman A, Minch E, Lum PY, Sachs JR, Schadt EE - PLoS Comput. Biol. (2007)

Affiliation: Rosetta Inpharmatics, Seattle, Washington, United States of America.

ABSTRACT
To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.
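
The abstract treats network reconstruction as choosing Bayesian-network structures that fit expression data, optionally helped by genotypes. The Python below is a minimal sketch of that idea and is not the authors' scoring function. It computes a BIC local score for one gene given candidate parents under a linear-Gaussian model, then adds a genotype-derived log-prior on edges. The helper names `gaussian_bic_local`, `make_edge_log_prior`, and `local_score` are hypothetical. The prior's form is also an illustrative assumption: all edges into genes listed in `no_incoming` are forbidden, and a fixed `bonus` is added to `favored` edges. It is loosely modeled on the results paragraph's description of cis eQTLs excluding edges into certain nodes and trans eQTLs favoring some edges.

```python
import numpy as np


def gaussian_bic_local(data, child, parents):
    """BIC local score of gene `child` given `parents` (linear-Gaussian model).

    data: array of shape (n_samples, n_genes); columns are genes.
    Higher scores indicate a better penalized fit.
    """
    n = data.shape[0]
    y = data[:, child]
    # Design matrix: intercept column plus one column per parent gene
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(float(resid @ resid) / n, 1e-12)  # MLE residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = X.shape[1] + 1  # regression coefficients plus the variance
    return loglik - 0.5 * k * np.log(n)


def make_edge_log_prior(no_incoming=(), favored=(), bonus=1.0):
    """Hypothetical genotype-derived edge prior (illustrative only).

    no_incoming: genes that may not receive any edge (log prior = -inf),
                 a stand-in for cis-eQTL-based edge exclusion.
    favored:     (parent, child) pairs that get a fixed log-prior bonus,
                 a stand-in for trans-eQTL support.
    """
    no_incoming, favored = set(no_incoming), set(favored)

    def log_prior(parent, child):
        if child in no_incoming:
            return -np.inf
        return bonus if (parent, child) in favored else 0.0

    return log_prior


def local_score(data, child, parents, log_prior=None):
    """BIC local score plus an optional sum of per-edge log priors."""
    score = gaussian_bic_local(data, child, parents)
    if log_prior is not None:
        score += sum(log_prior(p, child) for p in parents)
    return score


# Toy data: gene 1 depends linearly on gene 0
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 0.8 * x + 0.6 * rng.standard_normal(200)
data = np.column_stack([x, y])
prior = make_edge_log_prior(no_incoming={0}, favored={(0, 1)})
print(local_score(data, 1, [0]) > local_score(data, 1, []))  # True
print(local_score(data, 0, [1], prior))                       # -inf
```

A search procedure (greedy hill-climbing, MCMC over structures, etc.) would compare sums of such local scores across candidate graphs. In this sketch, the genetic information acts only by reshaping which structures are allowed or preferred, while the data term stays the same.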

Figure 4 (pcbi-0030069-g004). Reconstruction accuracy with (dotted and solid lines) and without (dashed lines) genetic information, using varying numbers of samples and an overall genetic signal similar to that found in the BXD network, but with weaker interactions (see text for details). (A) Reconstruction accuracy for the entire network. (B) Reconstruction accuracy for the subnetwork comprising only the top layer of the network. Dotted lines reflect reconstructions that used cis QTL information as the only source of genetic information; solid lines reflect reconstructions that used all available genetic information.
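
This excerpt does not define how "reconstruction accuracy" is computed. The Python below shows one simple, hypothetical measure for the two panels: the fraction of true directed edges that are recovered with the correct orientation. It can be scored either over the whole network or over a node subset such as a top layer. The function name `edge_recall` and the rule for restricting to a subnetwork (keep edges whose two endpoints are both in the subset) are assumptions. They are not necessarily the measure plotted in the figure.

```python
def edge_recall(true_edges, inferred_edges, nodes=None):
    """Fraction of true directed edges recovered with the correct direction.

    Hypothetical accuracy measure. If `nodes` is given, only true edges whose
    parent and child both lie in that subset are scored (for example, an
    assumed 'top layer' of the network). Returns NaN if no edges are scored.
    """
    true = set(true_edges)
    if nodes is not None:
        keep = set(nodes)
        true = {(a, b) for a, b in true if a in keep and b in keep}
    if not true:
        return float("nan")
    # A reversed edge, e.g. (2, 1) when the truth is (1, 2), does not count
    return len(true & set(inferred_edges)) / len(true)


true_edges = [(0, 1), (1, 2), (2, 3)]
inferred = [(0, 1), (2, 1), (2, 3)]
print(edge_recall(true_edges, inferred))                 # 0.666...
print(edge_recall(true_edges, inferred, nodes={0, 1}))   # 1.0
```

Because orientation matters here, a reconstruction that links the right genes in the wrong causal direction gets no credit for that edge. That is the kind of error the genetic information is meant to reduce.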

If all interactions between genes were strong, it would be relatively easy to distinguish the direct interactions from all others. Most correlations in the BXD network were strong (Figure S3). This is partly because the BXD network was itself reconstructed from the observed data and is therefore biased: it lacks the weaker interactions that went undetected for lack of power at the modest sample size. To examine reconstruction accuracy in the presence of weaker interactions, we simulated a dataset with the same heritability (0.5) as the BXD data but with weaker correlations between nodes. The correlation coefficients for the gene–gene interactions were drawn from a normal distribution with mean 0.33 and standard deviation 0.11. Ten percent of the interactions were made nonlinear (i.e., they included a quadratic term). On average, the correlations for these nonlinear interactions were weaker than those for the linear interactions, although correlation is not an entirely appropriate measure for nonlinear interactions. For the network as a whole, incorporating genetic information yields only a small improvement (Figure 4A). If we look only at the top layer of the network, however, the improvement is again much larger (Figure 4B), consistent with the results obtained above for the synthetic network. Information on cis-acting eQTLs (which excludes edges into certain nodes) and information on trans-acting eQTLs (which makes some edges more likely than others) both improve the quality of reconstruction (Figure 4).
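
The simulation described above can be sketched in Python as follows. This is a minimal, hypothetical generative model written for illustration; the section does not give the authors' exact equations. The sketch makes several assumptions:

- The network is a tree, so each gene has at most one parent.
- Only root genes receive a direct genetic effect. Each root is driven by its own biallelic marker with two homozygous genotype classes (as in recombinant inbred BXD strains), at heritability `h2 = 0.5`.
- Each edge correlation `r` is drawn from N(0.33, 0.11) and clipped to [0.01, 0.99].
- A nonlinear edge uses the made-up driver x + 0.5x².

The names `simulate_tree` and `child_from_parent` are hypothetical.

```python
import numpy as np


def child_from_parent(x, r, nonlinear, rng):
    """Child expression with correlation ~r to a standardized driver f(x)."""
    # Hypothetical nonlinear driver: linear term plus a quadratic term
    f = x + 0.5 * x**2 if nonlinear else x
    f = (f - f.mean()) / f.std()
    noise = rng.standard_normal(x.shape[0])
    # Unit-variance mix: r * driver + sqrt(1 - r^2) * independent noise
    return r * f + np.sqrt(1.0 - r**2) * noise


def simulate_tree(parent_of, n_samples, h2=0.5, r_mean=0.33, r_sd=0.11,
                  frac_nonlinear=0.10, seed=0):
    """Sketch: genotypes and expression for a tree-shaped gene network.

    parent_of[i] is the parent of gene i, or -1 if gene i is a root.
    Parents must precede children (parent_of[i] < i).
    Returns (genotypes, expression, {(parent, child): (r, is_nonlinear)}).
    """
    rng = np.random.default_rng(seed)
    n_genes = len(parent_of)
    # One biallelic marker per gene, coded 0/1 (two homozygous classes,
    # as in recombinant inbred strains). Only root genes use theirs here.
    geno = rng.integers(0, 2, size=(n_samples, n_genes))
    expr = np.empty((n_samples, n_genes))
    edge_params = {}
    for i, p in enumerate(parent_of):
        if p >= i:
            raise ValueError("parent_of must list parents before children")
        if p < 0:
            g = 2.0 * geno[:, i] - 1.0  # recode to -1/+1 (variance ~1)
            e = rng.standard_normal(n_samples)
            # Marker explains a fraction h2 of the root gene's variance
            expr[:, i] = np.sqrt(h2) * g + np.sqrt(1.0 - h2) * e
        else:
            # Edge strength ~ N(r_mean, r_sd), clipped to a valid range
            r = float(np.clip(rng.normal(r_mean, r_sd), 0.01, 0.99))
            nonlinear = bool(rng.random() < frac_nonlinear)
            expr[:, i] = child_from_parent(expr[:, p], r, nonlinear, rng)
            edge_params[(p, i)] = (r, nonlinear)
    return geno, expr, edge_params


geno, expr, params = simulate_tree([-1, 0, 1, 1, -1, 4], n_samples=2000, seed=1)
```

In this sketch, the nonlinear driver is only partly correlated with x. As a result, the observed parent–child correlation of a nonlinear edge tends to fall below its drawn `r`. This mirrors the remark above that the nonlinear interactions had weaker correlations on average.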

