Seeded Bayesian Networks: constructing genetic networks from microarray data.

Djebbari A, Quackenbush J - BMC Syst Biol (2008)

Bottom Line: Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.


Affiliation: Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA. amirad@gmail.com

ABSTRACT

Background: DNA microarrays and other genomics-inspired technologies provide large datasets that often include hidden patterns of correlation between genes reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing large-scale expression data has been to extract biologically meaningful inferences regarding these processes - often represented as networks - in an environment where the datasets are often imperfect and biological noise can obscure the actual signal. Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Our approach consists of using preliminary networks derived from the literature and/or protein-protein interaction data as seeds for a Bayesian network analysis of microarray results.

Results: Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify high-confidence gene-gene interactions which can then be validated by comparison to other sources of pathway data.

Conclusion: The use of network seeds greatly improves the ability of Bayesian Network analysis to learn gene interaction networks from gene expression data. We demonstrate that the use of seeds derived from the biomedical literature or high-throughput protein-protein interaction data, or the combination, provides improvement over a standard Bayesian Network analysis, allowing networks involving dynamic processes to be deduced from the static snapshots of biological systems that represent the most common source of microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.
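
The bootstrap analysis mentioned in the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the TM4 implementation: the function names `bootstrap_edge_confidence` and `toy_learn_edges` are hypothetical, and the learner is a simple correlation threshold standing in for Bayesian network structure learning (seeded or not). The only point it shows is how edge confidence can be taken as the fraction of resampled datasets in which an edge is recovered.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def toy_learn_edges(samples, threshold=0.8):
    """Stand-in learner (ASSUMPTION: not Bayesian network learning).

    Links every gene pair whose absolute correlation reaches the threshold.
    Each sample is a dict mapping gene name -> expression value.
    """
    genes = sorted(samples[0])
    edges = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            xs = [s[a] for s in samples]
            ys = [s[b] for s in samples]
            if abs(pearson(xs, ys)) >= threshold:
                edges.append((a, b))
    return edges


def bootstrap_edge_confidence(samples, learn_edges, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each edge is learned."""
    rng = random.Random(seed)
    n = len(samples)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the arrays (samples) with replacement.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        counts.update(set(learn_edges(resample)))
    return {edge: c / n_boot for edge, c in counts.items()}


if __name__ == "__main__":
    # Made-up expression values, for illustration only.
    data = [{"g1": float(i), "g2": 2.0 * i + 1.0, "g3": float((i * 7) % 5)}
            for i in range(8)]
    for edge, conf in sorted(bootstrap_edge_confidence(data, toy_learn_edges).items()):
        print(edge, conf)
```

Any structure learner with the same call shape (a list of samples in, a list of edges out) could be passed as `learn_edges`; the resampling loop does not depend on how the network is learned.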


Figure 4: ROC curve for Markov relations for networks deduced from the Ross et al. [18,19] data either with or without network seeds (literature plus PPI), based on 100 bootstrap iterations. The learned networks were compared to the corresponding subgraphs of the KEGG cell cycle pathway (KEGG ID: hsa04110); the comparison indicates much better overall performance for networks derived using network seeds.

Mentions: A Receiver Operating Characteristic (ROC) curve, which compares sensitivity and specificity directly (TP rate vs. FP rate [35]), suggests that the identification of Markov relations using Bayesian networks is conservative, as they are found with strong evidence only at low true positive rates. Figure 4 shows ROC curves for Markov relation detection using either microarray data alone (blue) or with seeds derived from combined literature and PPI priors (red); for both, bootstrap confidence decreases as sensitivity increases. The curves show that the use of prior network seeds greatly improves our ability to detect known interactions, particularly when considering those with strong bootstrap support.
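
To make the ROC construction concrete, the sketch below sweeps a threshold over bootstrap confidences and records the FP rate and TP rate at each step against a reference pathway. It is an illustration under assumptions rather than the authors' evaluation code: `roc_points` and `auc` are hypothetical names, relations are represented as gene-pair tuples, and the confidence values in the usage example are invented.

```python
def roc_points(confidence, reference, candidates):
    """Return ROC points (FP rate, TP rate) from bootstrap confidences.

    confidence: dict mapping relation -> bootstrap confidence in [0, 1];
                relations missing from the dict are treated as 0.0.
    reference:  set of relations present in the reference pathway.
    candidates: every relation that could have been called; must contain
                at least one reference and one non-reference relation.
    """
    positives = [e for e in candidates if e in reference]
    negatives = [e for e in candidates if e not in reference]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative relation")
    # Sweep from the strictest threshold to the most permissive one.
    thresholds = sorted({confidence.get(e, 0.0) for e in candidates}, reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tp = sum(confidence.get(e, 0.0) >= t for e in positives)
        fp = sum(confidence.get(e, 0.0) >= t for e in negatives)
        points.append((fp / len(negatives), tp / len(positives)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Invented values, for illustration only.
    candidates = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
    reference = {("a", "b"), ("b", "c")}
    confidence = {("a", "b"): 0.9, ("b", "c"): 0.6, ("a", "c"): 0.3}
    pts = roc_points(confidence, reference, candidates)
    print(pts, auc(pts))
```

High thresholds correspond to the lower-left part of the curve, which is why relations with strong bootstrap support appear only at low true positive rates.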

