Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations.

Zhu J, Wiener MC, Zhang C, Fridman A, Minch E, Lum PY, Sachs JR, Schadt EE - PLoS Comput. Biol. (2007)

Bottom Line: Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

Affiliation: Rosetta Inpharmatics, Seattle, Washington, United States of America.

ABSTRACT
To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

Figure 2 (pcbi-0030069-g002): Reconstruction Accuracy Based on 100-Sample Datasets Generated Using Parameters Similar to BXD Data. All accuracies are based on directed graphs unless indicated otherwise. (A) Accuracy of reconstructions with and without genetic information used as prior information. (B) Accuracy of reconstructions for the top-layer subnetwork, as defined in the text.

Mentions: The average genetic heritability for the head nodes with cis-acting QTLs in the BXD data was 0.5. Correlation between gene expression levels for interacting genes tended to be high in this dataset (Figure S3A). The correlation distribution from this dataset was used to simulate a set of data comprising 100 samples. QTLs were then mapped for each node using a standard interval mapping method [11]. The distribution of QTL peaks is shown in Figure S3B. The QTL peaks for head nodes are evenly distributed along the chromosomes. The QTL peaks for all nodes are clustered into several hot spots, as was observed in the BXD data [9]. The ROC (receiver operating characteristic)-like plots shown in Figure 2 demonstrate that a Bayesian network reconstructed with genetic information is more accurate than one reconstructed without genetic information. Each curve represents results from varying the consensus threshold; that is, the minimum number of individual MCMC networks in which an edge must be present to be included in the final reconstruction. The improvement in accuracy is relatively small for the full network (Figure 2A), but quite pronounced for the top layer of the network, defined as the head nodes and their children (Figure 2B). For example, the network reconstructed with the genetic data achieved nearly 80% precision when recall was 50%, compared with 35% precision for the network reconstructed without genetic data.
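
The consensus-threshold sweep described above can be illustrated with a minimal sketch. This is not the authors' code: the function names (`consensus_edges`, `precision_recall`), the toy edge sets, and the convention of reporting precision as 1.0 when no edges are predicted are all assumptions made for illustration. Each sampled network is represented as a set of directed (parent, child) edges, and each threshold yields one point on a precision/recall curve.

```python
def consensus_edges(networks, threshold):
    """Return directed edges present in at least `threshold` sampled networks."""
    counts = {}
    for edges in networks:
        for edge in set(edges):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, n in counts.items() if n >= threshold}


def precision_recall(predicted, truth):
    """Precision and recall of predicted directed edges against the true edges."""
    true_positives = len(predicted & truth)
    # Assumed convention: an empty prediction or empty truth set scores 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical toy data: a "true" network and four sampled networks.
truth = {("A", "B"), ("B", "C"), ("A", "D")}
networks = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("D", "A")},  # contains a reversed edge, which counts as wrong
    {("A", "B"), ("B", "C"), ("A", "D")},
]

# One (threshold, precision, recall) point per consensus threshold.
curve = [
    (t, *precision_recall(consensus_edges(networks, t), truth))
    for t in range(1, len(networks) + 1)
]

for t, precision, recall in curve:
    print(f"threshold={t}  precision={precision:.2f}  recall={recall:.2f}")
```

In this toy case, raising the threshold trades recall for precision: rarely sampled edges (including the reversed one) drop out first, which is the same trade-off each curve in Figure 2 traces out.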

