Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations.

Zhu J, Wiener MC, Zhang C, Fridman A, Minch E, Lum PY, Sachs JR, Schadt EE - PLoS Comput. Biol. (2007)

Bottom Line: Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.


Affiliation: Rosetta Inpharmatics, Seattle, Washington, United States of America.

ABSTRACT
To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.


Figure 1 (pcbi-0030069-g001): The Data Simulation Scheme with Genetic and Network Constraints. (A) A segregating population (an F2 intercross in this case) is simulated using the QTL Cartographer software suite (Rqtl, Rcross, and Zmapqtl). The QTL model for a trait is defined using the Rqtl program, and the heritability of the QTL is defined using the Rcross program. (B) The traits simulated by Rcross are used as the head nodes in the simulated network. The remaining traits are simulated based on the values of the head nodes according to the DAG structure and the set of conditional probability density functions associated with this structure. (C) After traits for all nodes in the network are simulated, they are scanned for QTLs using the Zmapqtl program. The traits and the associated QTL are then input into the network reconstruction program.
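
The simulation scheme in panels (A) and (B) can be illustrated with a toy sketch. The code below is not QTL Cartographer and does not reproduce the authors' pipeline. It assumes a single additive QTL per head node and linear Gaussian conditional densities, and the function names (`simulate_f2_genotypes`, `simulate_head_trait`, `simulate_downstream`) and all parameter values are hypothetical, chosen only for illustration. The QTL scanning step in panel (C) is omitted.

```python
import random

def simulate_f2_genotypes(n_subjects, rng):
    """Genotypes at one locus in an F2 intercross, coded as the B-allele count.

    AA, AB, and BB are expected in a 1:2:1 ratio, so 0, 1, 1, 2 are sampled uniformly.
    """
    return [rng.choice([0, 1, 1, 2]) for _ in range(n_subjects)]

def simulate_head_trait(genotypes, effect, noise_sd, rng):
    """Head-node trait: additive QTL effect plus Gaussian noise (assumed model)."""
    return [effect * g + rng.gauss(0.0, noise_sd) for g in genotypes]

def simulate_downstream(parents_of, traits, weight, noise_sd, rng):
    """Simulate non-head nodes from their parents along the DAG.

    parents_of maps each child node to its list of parents and must be given
    in topological order (parents appear before their children).
    Each child is a weighted sum of its parents plus Gaussian noise (assumed model).
    """
    n_subjects = len(next(iter(traits.values())))
    for child, parents in parents_of.items():
        traits[child] = [
            sum(weight * traits[p][i] for p in parents) + rng.gauss(0.0, noise_sd)
            for i in range(n_subjects)
        ]
    return traits

# Toy run: one head node (g1) driven by a QTL, then a chain g1 -> g2 -> g3.
rng = random.Random(0)
genotypes = simulate_f2_genotypes(200, rng)
traits = {"g1": simulate_head_trait(genotypes, effect=1.0, noise_sd=0.5, rng=rng)}
traits = simulate_downstream(
    {"g2": ["g1"], "g3": ["g2"]}, traits, weight=0.8, noise_sd=0.5, rng=rng
)
print(sorted(traits), len(traits["g3"]))
```

In this sketch the genetic signal enters only through the head node and propagates downstream with added noise at each step, which is the property that lets QTL information help orient edges during reconstruction.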

Mentions: Data were simulated following the scheme shown in Figure 1 (see Methods for details), using the Bayesian network structure derived from the BXD cross [5] as the true network, referred to here as the BXD network. The BXD network comprises 2,169 nodes (genes) and 1,676 directed connections, with 639 genes represented as singleton nodes (nodes with no connections to other nodes in the network). The general features of the BXD network, such as in-degree, out-degree, and connectivity distributions, are shown in Figure S1. The in-degree (out-degree) of a node is equal to the number of inward-directed (outward-directed) edges connected to the node, while the connectivity of a node (its degree) is equal to the number of edges connecting to the node. We also created a simple structure, referred to as the synthetic network, to allow comparison with previous results on network reconstruction accuracy obtained using networks with a small number of nodes. The synthetic network is an agglomeration of isolated three-node substructures (Figure S2). The synthetic structure has 2,160 nodes and 1,440 interactions, similar to the BXD network. More comprehensive examination of the effect of network structure on reconstruction accuracy and the use of genetic data is outside the scope of this study and will be explored in future work.
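
The node and edge counts quoted for the synthetic network, and the degree definitions given above, can be checked with a short sketch. The text does not state the topology of each three-node substructure (it is shown in Figure S2), so the code assumes a simple chain A -> B -> C, which is one arrangement that yields two edges per substructure. The function names `build_three_node_chains` and `degree_summary` are hypothetical.

```python
from collections import Counter

def build_three_node_chains(n_substructures):
    """Directed edges for isolated three-node chains A -> B -> C (assumed topology)."""
    edges = []
    for k in range(n_substructures):
        a, b, c = 3 * k, 3 * k + 1, 3 * k + 2
        edges.append((a, b))
        edges.append((b, c))
    return edges

def degree_summary(n_nodes, edges):
    """In-degree, out-degree, connectivity (total degree), and singleton count."""
    in_deg = Counter(child for _, child in edges)
    out_deg = Counter(parent for parent, _ in edges)
    connectivity = {v: in_deg[v] + out_deg[v] for v in range(n_nodes)}
    singletons = sum(1 for v in range(n_nodes) if connectivity[v] == 0)
    return in_deg, out_deg, connectivity, singletons

# 2,160 nodes / 3 nodes per substructure = 720 substructures.
n_substructures = 720
n_nodes = 3 * n_substructures
edges = build_three_node_chains(n_substructures)
in_deg, out_deg, connectivity, singletons = degree_summary(n_nodes, edges)
print(n_nodes, len(edges), singletons)  # prints: 2160 1440 0
```

Under this assumption the synthetic network has no singleton nodes, whereas the BXD network has 639, so the two structures match in overall size but not in degree distribution.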

