A reproducible approach to high-throughput biological data acquisition and integration.

Börnigen D, Moon YS, Rahnavard G, Waldron L, McIver L, Shafquat A, Franzosa EA, Miropolsky L, Sweeney C, Morgan XC, Garrett WS, Huttenhower C - PeerJ (2015)

Bottom Line: Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.


Affiliation: Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

ABSTRACT
Modern biological research requires rapid, complex, and reproducible integration of multiple experimental results generated both internally and externally (e.g., from public repositories). Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. This allowed, first, novel biomolecular network reconstruction in human prostate cancer, which correctly recovered and extended the NFκB signaling pathway. Next, we investigated host-microbiome interactions. In less than an hour of analysis time, the system retrieved data and integrated six germ-free murine intestinal gene expression datasets to identify the genes most influenced by the gut microbiota, which comprised a set of immune-response and carbohydrate metabolism processes. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.


Figure 4: Integrated molecular networks for comparative microbial functional genomics. ARepA allowed the retrieval of standardized gene expression and interaction data for three microbial species based on a shared gene identifier to assess functional differences in conserved and non-conserved secretion pathways. High-confidence subgraphs were extracted from species-specific integrated functional networks around genes from species-specific secretion pathways to identify highly functionally related gene clusters within each individual system. These subgraphs represent gene clusters of sec and tat genes in B. subtilis (A); sec, tat, and type II genes in E. coli (B); and sec, tat, type II, type III, and type VI genes in P. aeruginosa (C). From each of these species-specific molecular networks we recovered highly functionally related gene clusters and conserved and non-conserved components of the peptide secretion system.

We retrieved all gene expression data (GEO) and gene interaction data (IntAct, MPIDB, RegulonDB, Bacteriome, and STRING; Table 2A) from ARepA for Bacillus subtilis, Escherichia coli, and Pseudomonas aeruginosa. These were processed as above and standardized onto a shared gene identifier (UniRef90). Within each bacterium, these data were integrated into a species-specific functional network by unsupervised data integration, averaging normalized co-expression values (z-scores) (Huttenhower et al., 2009), and a high-confidence subgraph was then extracted using the probabilistic graph search algorithm (Huttenhower et al., 2009; Myers et al., 2005) (Text S1). This query (Huttenhower et al., 2008) started from a user-defined gene set comprising sec and tat genes and genes from the type I, type II, type III, type V, and type VI secretion systems (Table S4), and used a neighbourhood size of k = 5 for each subgraph extraction. As Gram-positive bacteria use only the general secretory (sec) pathway and the tat pathway, we used two tat genes and nine sec genes as the query gene set in B. subtilis. In contrast, E. coli, a Gram-negative bacterium, employs an additional secretion system (the type II secretion system) and has one type I gene (TolC); thus, we used four tat genes, twelve sec genes, one type I gene, and twelve type II genes as the query gene set in E. coli. P. aeruginosa, another Gram-negative bacterium, additionally utilizes the type I, type III, type V, and type VI secretion systems, so we defined a query gene set of three tat genes, nine sec genes, two type I genes, eleven type II genes, twelve type III genes, one type V gene, and eight type VI genes to retrieve a species-specific high-confidence network in P. aeruginosa. We recovered networks of 16 genes for B. subtilis (Fig. 4A), 34 genes for E. coli (Fig. 4B), and 49 genes for P. aeruginosa (Fig. 4C), which showed clustering of secretion types conserved across species.
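
The two steps described above (averaging z-scored co-expression across datasets, then extracting a neighbourhood of k genes around a query set) can be illustrated with a minimal Python sketch. This is not the ARepA implementation or the published probabilistic graph search: the function names, the Fisher transform used for normalization, the greedy "mean edge weight to the query" ranking, and the toy data are all assumptions made only for illustration.

```python
import numpy as np

def coexpression_z(expr):
    """Z-scored gene-by-gene co-expression for one dataset (genes x samples).

    Assumption: Pearson correlation, Fisher-transformed, then standardized
    over all gene pairs. The paper only states 'normalized co-expression
    values (z-scores)'.
    """
    r = np.corrcoef(expr)                      # rows are genes
    np.fill_diagonal(r, 0.0)
    f = np.arctanh(np.clip(r, -0.999999, 0.999999))
    pairs = f[np.triu_indices_from(f, k=1)]    # each gene pair once
    z = (f - pairs.mean()) / pairs.std()
    np.fill_diagonal(z, 0.0)                   # ignore self-edges
    return z

def integrate(datasets):
    """Unsupervised integration: average the z-score networks."""
    return np.mean([coexpression_z(d) for d in datasets], axis=0)

def extract_subgraph(network, genes, query, k=5):
    """Hypothetical stand-in for the graph search: keep the query genes
    plus the k non-query genes with the highest mean edge weight to them."""
    index = {g: i for i, g in enumerate(genes)}
    q = [index[g] for g in query if g in index]
    scores = network[:, q].mean(axis=1)
    scores[q] = -np.inf                        # never re-select query genes
    top = np.argsort(scores)[::-1][:k]
    return [genes[i] for i in q] + [genes[i] for i in top]

# Toy data: 8 genes, 2 datasets, 20 samples each; g0-g2 share one signal.
rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(8)]
datasets = []
for _ in range(2):
    expr = rng.normal(size=(8, 20))
    signal = rng.normal(size=20)
    expr[:3] = signal + 0.1 * rng.normal(size=(3, 20))
    datasets.append(expr)

net = integrate(datasets)
result = extract_subgraph(net, genes, query=["g0", "g1"], k=1)
print(result)
```

In this toy example the query genes g0 and g1 pull in g2, the only other gene sharing their expression signal. With k = 5, as in the text, a query set would be extended by up to five neighbouring genes.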

