A reproducible approach to high-throughput biological data acquisition and integration.

Börnigen D, Moon YS, Rahnavard G, Waldron L, McIver L, Shafquat A, Franzosa EA, Miropolsky L, Sweeney C, Morgan XC, Garrett WS, Huttenhower C - PeerJ (2015)

Bottom Line: Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.


Affiliation: Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

ABSTRACT
Modern biological research requires rapid, complex, and reproducible integration of multiple experimental results generated both internally and externally (e.g., from public repositories). Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. This allowed, first, novel biomolecular network reconstruction in human prostate cancer, which correctly recovered and extended the NFκB signaling pathway. Next, we investigated host-microbiome interactions. In less than an hour of analysis time, the system retrieved data and integrated six germ-free murine intestinal gene expression datasets to identify the genes most influenced by the gut microbiota, which comprised a set of immune-response and carbohydrate metabolism processes. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.


Figure 2: MEN1 and ACBD6 associated with the NFκB signaling pathway in human prostate cancer. High-confidence subgraph extracted from a functional network integrating ten prostate cancer specific gene expression data sets from GEO (Table S1). This subnetwork was generated using a seed gene set of ten genes from the NFκB signaling pathway in BioCarta (blue circles). Nine genes (black circles) that are also known to be involved in NFκB signaling were immediately recovered. Additional genes represent candidates implicated in NFκB involvement during prostate cancer, in particular MEN1 and ACBD6.

Mentions: We screened ARepA’s GEO metadata database for human prostate cancer and prostate tissue conditions matching “prostate cancer” in human gene expression profile studies with at least 6 samples each, identifying ten relevant datasets from six different platforms (Affymetrix, Agilent, and CNIO, Table S1). Identifying, processing, and standardizing these datasets in ARepA, together with computing co-expression networks, required <1 h of running time in total (on a 2.9 GHz Intel Core i7 machine with 16 GB RAM). These data were then meta-analyzed into a single prostate cancer specific functional network using unsupervised data integration, averaging across normalized co-expression values (z-scores) (Huttenhower et al., 2009) (Text S1). Next, we used this integrated network to predict genes highly functionally related to the NFκB gene family (NFκB1, NFκB2, RELA, RELB, REL) in prostate cancer by extracting a high-confidence subgraph with the probabilistic graph search algorithm described in Huttenhower et al. (2009) and Myers et al. (2005). This algorithm starts from a user-defined set of query genes and identifies k additional neighbours in the network that are connected with high confidence to the original query genes (Huttenhower et al., 2008). Here, we defined a query gene set of twelve genes from the NFκB signaling pathway in BioCarta (Table S4) and a neighbourhood size of k = 10, resulting in a high-confidence NFκB signaling network containing 22 genes in total, as illustrated in Fig. 2.
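
The two steps described above (averaging normalized co-expression values across datasets, then pulling k neighbours around a query set) can be illustrated with a minimal Python sketch. It is not ARepA's or the authors' implementation. All function names and the toy data are hypothetical, the Fisher transform followed by per-dataset standardization is an assumed normalization, and the neighbour ranking by mean edge weight is a deliberately simplified stand-in for the probabilistic graph search cited in the text.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_zscores(dataset):
    """Map each gene pair to a z-scored co-expression value within one dataset.

    dataset: dict mapping gene name -> list of expression values (one per sample).
    """
    raw = {}
    for g1, g2 in combinations(sorted(dataset), 2):
        # Clamp so the Fisher transform stays finite for perfect correlations.
        r = max(min(pearson(dataset[g1], dataset[g2]), 0.999999), -0.999999)
        raw[(g1, g2)] = math.atanh(r)  # assumed normalization, see lead-in
    if not raw:
        return {}
    mu, sd = mean(raw.values()), pstdev(raw.values())
    return {pair: (v - mu) / sd if sd else 0.0 for pair, v in raw.items()}


def integrate(datasets):
    """Average z-scores per gene pair over the datasets that contain the pair."""
    sums, counts = {}, {}
    for ds in datasets:
        for pair, z in coexpression_zscores(ds).items():
            sums[pair] = sums.get(pair, 0.0) + z
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def top_k_neighbours(network, query, k):
    """Return the k non-query genes with the highest mean edge weight to the query set.

    Simplified stand-in for the probabilistic graph search, not the cited algorithm.
    """
    query = set(query)
    scores = {}
    for (g1, g2), w in network.items():
        for gene, other in ((g1, g2), (g2, g1)):
            if gene not in query and other in query:
                scores.setdefault(gene, []).append(w)
    # Sort by descending mean weight, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda g: (-mean(scores[g]), g))
    return ranked[:k]


# Toy data (hypothetical): two small "datasets" with four genes and four samples each.
toy_a = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
toy_b = {"A": [5, 1, 3, 2], "B": [10, 2, 6, 5], "C": [1, 5, 3, 4], "D": [2, 2, 3, 1]}

network = integrate([toy_a, toy_b])
neighbours = top_k_neighbours(network, query=["A"], k=1)
print(neighbours)
```

In the study's setting, the query would be the twelve BioCarta NFκB pathway genes and k would be 10, which is how a 22-gene subnetwork arises. Standardizing within each dataset before averaging keeps any single platform's correlation scale from dominating the integrated edge weights.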

