A reproducible approach to high-throughput biological data acquisition and integration.

Börnigen D, Moon YS, Rahnavard G, Waldron L, McIver L, Shafquat A, Franzosa EA, Miropolsky L, Sweeney C, Morgan XC, Garrett WS, Huttenhower C - PeerJ (2015)

Bottom Line: Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.


Affiliation: Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

ABSTRACT
Modern biological research requires rapid, complex, and reproducible integration of multiple experimental results generated both internally and externally (e.g., from public repositories). Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. This allowed, first, novel biomolecular network reconstruction in human prostate cancer, which correctly recovered and extended the NFκB signaling pathway. Next, we investigated host-microbiome interactions. In less than an hour of analysis time, the system retrieved data and integrated six germ-free murine intestinal gene expression datasets to identify the genes most influenced by the gut microbiota, which comprised a set of immune-response and carbohydrate metabolism processes. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.


Figure 3: Differential expression meta-analysis of germ-free versus conventional mice. ARepA metadata allowed the identification of six murine gene expression datasets with intestinal tissue from paired germ-free and conventional mice (Table S1). The automatically generated R expression sets were meta-analyzed using R/limma (Smith, 2005) and R/metafor (Viechtbauer, 2010) through a random-effects model, revealing the Ppar-α signaling pathway as one of several differentially regulated gene sets. Panel (A) presents the fold changes for all significantly differentially expressed genes from this pathway in the individual datasets, and panels (B) and (C) show the corresponding forest plots for the Ppar-α and Rxr-α genes, which are consistently upregulated in these datasets.

Mentions: We conducted a differential expression meta-analysis of genes and pathways up- or down-regulated in the germ-free (without microbes) murine gut (Fig. S1 and Text S1). We used six case-control datasets containing intestinal tissue and germ-free versus wild type comparisons, identified through ARepA’s metadata screen by matching “germ-free,” “wild type,” and “intestinal tract” in mouse gene expression profiles with at least four samples each (Table S2). This again required <1 h total running time (on a 2.9 GHz Intel Core i7 16 GB machine). To integrate the resulting ARepA R data files, we first computed log fold changes and confidence intervals (2.5% and 97.5%) for all genes between germ-free gut and wild type gut within each dataset using the R/limma package (Smith, 2005), resulting in ∼3,600 differentially expressed genes. Next, we performed a meta-analysis using the R/metafor package (Viechtbauer, 2010) on the six datasets by applying a random-effects model on the fold changes with default options, fitting the model with the restricted maximum-likelihood estimator (REML). The false discovery rate was controlled by the Benjamini–Hochberg method (Benjamini & Hochberg, 1995). We finally tested all genes and their resulting meta-p-values for gene set enrichment (Subramanian et al., 2005) in KEGG and BioCarta pathways. This resulted in two pathways that were significantly enriched for upregulated genes under germ-free conditions (using 1,000 permutations, Fig. 3), while 15 pathways were enriched for downregulated genes (Table S3).
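The per-gene pooling step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used R/limma and R/metafor with the REML estimator. As a simpler stand-in, the sketch estimates the between-dataset variance with the closed-form DerSimonian-Laird estimator, recovers standard errors from 95% confidence intervals under a normal approximation, and applies the Benjamini-Hochberg adjustment to the resulting p-values. The function names (`se_from_ci`, `random_effects`, `benjamini_hochberg`) and all numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

# 97.5% quantile of the standard normal (about 1.96), used to convert a
# 95% confidence interval into a standard error.
Z_975 = NormalDist().inv_cdf(0.975)


def se_from_ci(lower, upper):
    """Standard error implied by a 2.5%/97.5% interval (normal approximation)."""
    return (upper - lower) / (2.0 * Z_975)


def random_effects(effects, variances):
    """Pool per-dataset log fold changes with a random-effects model.

    Assumption: tau^2 is estimated with DerSimonian-Laird, not REML.
    Returns (pooled effect, its standard error, tau^2, two-sided p-value).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight each dataset by its within- plus between-dataset variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    p = math.erfc(abs(mu / se) / math.sqrt(2.0))
    return mu, se, tau2, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjustment stays monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log fold changes and 95% intervals for one gene in six datasets.
lfc = [0.8, 1.1, 0.6, 0.9, 1.3, 0.7]
ci = [(0.2, 1.4), (0.5, 1.7), (-0.1, 1.3), (0.3, 1.5), (0.6, 2.0), (0.0, 1.4)]
var = [se_from_ci(lo, hi) ** 2 for lo, hi in ci]
mu, se, tau2, p = random_effects(lfc, var)
print(f"pooled log fold change {mu:.3f} (SE {se:.3f}), tau^2 {tau2:.3f}, p {p:.2e}")
```

In a genome-wide setting, `random_effects` would be called once per gene and the collected p-values passed to `benjamini_hochberg` together, since the adjustment depends on the total number of tests.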


