Limits...
Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate.

Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA - PLoS ONE (2013)

Bottom Line: We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination.Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant.Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies.

View Article: PubMed Central - PubMed

Affiliation: The Jackson Laboratory, Bar Harbor, Maine, USA.

ABSTRACT
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.

Show MeSH

Related in: MedlinePlus

Cell-Type-Specific Data Integration and Machine-Learning Methodology.Our approach is designed to generate reliable and relevant predictive biological networks using high-throughput data limited to a specific cell type and a training gold standard focused on biological processes active in that cell type. This process can be distilled into four basic steps: 1. Collect and standardize cell-type specific data from studies using diverse high-throughput experimental techniques, including microarray gene expression, chromatin immunoprecipitation (ChIP) on chip (ChIP-Chip), ChIP followed by high-throughput-sequencing (ChIP-Seq), affinity purification followed by mass spectrometry (AP-MS), whole-genome small interfering RNA (siRNA) screens, and phylogenetic sequence similarity. For this case study, we focused on mouse embryonic stem cell (mESC) data. 2. Curate a process-specific gold standard training set to provide a baseline for assessing data reliability and significance for related biological processes known to be active in the cell type of interest. Our gold standard training set consists of experimentally validated pair-wise associations between genes and proteins known to be involved in mESC self-renewal, pluripotency, and cell fate determination. 3. Iteratively test and validate networks. A. Use a naïve Bayesian network classifier to perform inference and predict novel gene and protein relationships. Our network predicts pairwise functional associations that influence mESC self-renewal and early developmental processes. B. Validate the accuracy of predicted functional relationships using standard machine learning performance metrics, cross validation, and bootstrapping, followed by evaluation of biological content. Our protocol for assessing networks ensures our results are highly reliable and relevant to mESC self-renewal. 4. Provide community access to analyses and tools. Through StemSight.org, we provide access to network analyses and visualization tools to enable users to further explore networks centered on their genes of interest.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC3585227&req=5

pone-0056810-g002: Cell-Type-Specific Data Integration and Machine-Learning Methodology.Our approach is designed to generate reliable and relevant predictive biological networks using high-throughput data limited to a specific cell type and a training gold standard focused on biological processes active in that cell type. This process can be distilled into four basic steps: 1. Collect and standardize cell-type specific data from studies using diverse high-throughput experimental techniques, including microarray gene expression, chromatin immunoprecipitation (ChIP) on chip (ChIP-Chip), ChIP followed by high-throughput-sequencing (ChIP-Seq), affinity purification followed by mass spectrometry (AP-MS), whole-genome small interfering RNA (siRNA) screens, and phylogenetic sequence similarity. For this case study, we focused on mouse embryonic stem cell (mESC) data. 2. Curate a process-specific gold standard training set to provide a baseline for assessing data reliability and significance for related biological processes known to be active in the cell type of interest. Our gold standard training set consists of experimentally validated pair-wise associations between genes and proteins known to be involved in mESC self-renewal, pluripotency, and cell fate determination. 3. Iteratively test and validate networks. A. Use a naïve Bayesian network classifier to perform inference and predict novel gene and protein relationships. Our network predicts pairwise functional associations that influence mESC self-renewal and early developmental processes. B. Validate the accuracy of predicted functional relationships using standard machine learning performance metrics, cross validation, and bootstrapping, followed by evaluation of biological content. Our protocol for assessing networks ensures our results are highly reliable and relevant to mESC self-renewal. 4. Provide community access to analyses and tools. Through StemSight.org, we provide access to network analyses and visualization tools to enable users to further explore networks centered on their genes of interest.

Mentions: We used a naïve Bayesian network methodology (Figure 2) to create a cell-type-specific predictive biological network of protein-coding genes in the context of self-renewal and closely related processes (e.g. pluripotency and cell fate determination) in mESCs. For our training set, we manually curated a positive reference of 2056 pair-wise gene relationships (with a prior of 1) among 354 genes associated with mESC self-renewal or annotated to signaling pathways involved in early embryonic development (Table S1), based on information extracted from 98 recent journal articles (Table S2). We automatically generated a negative reference of 20,560 protein gene pairs (with a prior of 0) not documented to be associated mESC self-renewal. We joined these references together to produce a mESC self-renewal gold standard with a class distribution of 1∶10 (positive:negative) that was used to train the Bayes net. For evidential data, we assembled a compendium of high-throughput mESC data, representing 60 independent research studies, including all mouse data used in prior mESC-focused computational efforts (Table 1; Table S3). This mESC data compendium consisted of ∼2.2 million data points, collected under 992 conditions, using 6 different high-throughput experimental techniques, and encompassing more than 6 billion gene-pair measurements. We used the trained Bayes net to make posterior predictions of functional relationships among 21,291 protein-coding mouse genes based on patterns observed in the integrated evidential data.


Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate.

Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA - PLoS ONE (2013)

Cell-Type-Specific Data Integration and Machine-Learning Methodology.Our approach is designed to generate reliable and relevant predictive biological networks using high-throughput data limited to a specific cell type and a training gold standard focused on biological processes active in that cell type. This process can be distilled into four basic steps: 1. Collect and standardize cell-type specific data from studies using diverse high-throughput experimental techniques, including microarray gene expression, chromatin immunoprecipitation (ChIP) on chip (ChIP-Chip), ChIP followed by high-throughput-sequencing (ChIP-Seq), affinity purification followed by mass spectrometry (AP-MS), whole-genome small interfering RNA (siRNA) screens, and phylogenetic sequence similarity. For this case study, we focused on mouse embryonic stem cell (mESC) data. 2. Curate a process-specific gold standard training set to provide a baseline for assessing data reliability and significance for related biological processes known to be active in the cell type of interest. Our gold standard training set consists of experimentally validated pair-wise associations between genes and proteins known to be involved in mESC self-renewal, pluripotency, and cell fate determination. 3. Iteratively test and validate networks. A. Use a naïve Bayesian network classifier to perform inference and predict novel gene and protein relationships. Our network predicts pairwise functional associations that influence mESC self-renewal and early developmental processes. B. Validate the accuracy of predicted functional relationships using standard machine learning performance metrics, cross validation, and bootstrapping, followed by evaluation of biological content. Our protocol for assessing networks ensures our results are highly reliable and relevant to mESC self-renewal. 4. Provide community access to analyses and tools. Through StemSight.org, we provide access to network analyses and visualization tools to enable users to further explore networks centered on their genes of interest.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC3585227&req=5

pone-0056810-g002: Cell-Type-Specific Data Integration and Machine-Learning Methodology.Our approach is designed to generate reliable and relevant predictive biological networks using high-throughput data limited to a specific cell type and a training gold standard focused on biological processes active in that cell type. This process can be distilled into four basic steps: 1. Collect and standardize cell-type specific data from studies using diverse high-throughput experimental techniques, including microarray gene expression, chromatin immunoprecipitation (ChIP) on chip (ChIP-Chip), ChIP followed by high-throughput-sequencing (ChIP-Seq), affinity purification followed by mass spectrometry (AP-MS), whole-genome small interfering RNA (siRNA) screens, and phylogenetic sequence similarity. For this case study, we focused on mouse embryonic stem cell (mESC) data. 2. Curate a process-specific gold standard training set to provide a baseline for assessing data reliability and significance for related biological processes known to be active in the cell type of interest. Our gold standard training set consists of experimentally validated pair-wise associations between genes and proteins known to be involved in mESC self-renewal, pluripotency, and cell fate determination. 3. Iteratively test and validate networks. A. Use a naïve Bayesian network classifier to perform inference and predict novel gene and protein relationships. Our network predicts pairwise functional associations that influence mESC self-renewal and early developmental processes. B. Validate the accuracy of predicted functional relationships using standard machine learning performance metrics, cross validation, and bootstrapping, followed by evaluation of biological content. Our protocol for assessing networks ensures our results are highly reliable and relevant to mESC self-renewal. 4. Provide community access to analyses and tools. Through StemSight.org, we provide access to network analyses and visualization tools to enable users to further explore networks centered on their genes of interest.
Mentions: We used a naïve Bayesian network methodology (Figure 2) to create a cell-type-specific predictive biological network of protein-coding genes in the context of self-renewal and closely related processes (e.g. pluripotency and cell fate determination) in mESCs. For our training set, we manually curated a positive reference of 2056 pair-wise gene relationships (with a prior of 1) among 354 genes associated with mESC self-renewal or annotated to signaling pathways involved in early embryonic development (Table S1), based on information extracted from 98 recent journal articles (Table S2). We automatically generated a negative reference of 20,560 protein gene pairs (with a prior of 0) not documented to be associated mESC self-renewal. We joined these references together to produce a mESC self-renewal gold standard with a class distribution of 1∶10 (positive:negative) that was used to train the Bayes net. For evidential data, we assembled a compendium of high-throughput mESC data, representing 60 independent research studies, including all mouse data used in prior mESC-focused computational efforts (Table 1; Table S3). This mESC data compendium consisted of ∼2.2 million data points, collected under 992 conditions, using 6 different high-throughput experimental techniques, and encompassing more than 6 billion gene-pair measurements. We used the trained Bayes net to make posterior predictions of functional relationships among 21,291 protein-coding mouse genes based on patterns observed in the integrated evidential data.

Bottom Line: We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination.Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant.Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies.

View Article: PubMed Central - PubMed

Affiliation: The Jackson Laboratory, Bar Harbor, Maine, USA.

ABSTRACT
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.

Show MeSH
Related in: MedlinePlus