Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate.

Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA - PLoS ONE (2013)

Bottom Line: We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies.


Affiliation: The Jackson Laboratory, Bar Harbor, Maine, USA.

ABSTRACT
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue-specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.
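
To make the idea of Bayesian evidence integration concrete, the following is a minimal sketch of how binned evidence from several datasets could be combined into a posterior probability that a gene pair is functionally related. It assumes a naive Bayes structure (datasets conditionally independent given the relationship), which is a simplification chosen for illustration; the abstract does not specify the network structure. The function name `posterior_related`, the dataset names, the bins, the prior, and all probabilities are hypothetical values, not figures from the study.

```python
import math


def posterior_related(prior, likelihoods, observed_bins):
    """Posterior P(related | evidence) under a naive Bayes assumption.

    prior         -- assumed prior probability that a gene pair is related
    likelihoods   -- {dataset: {bin: (P(bin | related), P(bin | unrelated))}}
    observed_bins -- {dataset: bin} for the datasets that measured this pair;
                     datasets without a measurement are simply left out
    """
    # Work in log space so that many small probabilities do not underflow.
    log_rel = math.log(prior)
    log_unrel = math.log(1.0 - prior)
    for dataset, bin_id in observed_bins.items():
        p_rel, p_unrel = likelihoods[dataset][bin_id]
        log_rel += math.log(p_rel)
        log_unrel += math.log(p_unrel)
    # Normalize the two unnormalized posteriors.
    shift = max(log_rel, log_unrel)
    rel = math.exp(log_rel - shift)
    unrel = math.exp(log_unrel - shift)
    return rel / (rel + unrel)


# Hypothetical conditional probability tables for two evidence types.
toy_likelihoods = {
    "coexpression": {"high": (0.6, 0.2), "low": (0.4, 0.8)},
    "chip": {"bound": (0.3, 0.05), "unbound": (0.7, 0.95)},
}

# A pair with high co-expression and a shared binding event: about 0.667.
print(posterior_related(0.1, toy_likelihoods, {"coexpression": "high", "chip": "bound"}))
```

In this toy case the prior of 0.1 rises to roughly two thirds because both observations are more likely for related pairs than for unrelated ones; with no observations the function returns the prior unchanged.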


pone-0056810-g003: Network Performance Evaluations. A. Computational assessment of network performance using standard machine learning metrics showed that precision was 90% at 10% recall and 60% at 25% recall, both before and after regularization and out-of-bag averaging to correct for overfitting to noise. The area under the Receiver Operating Characteristic (ROC) curve (AUC) for the mESC network was 0.7479; after regularization and out-of-bag averaging, the AUC was 0.7165. B. We conducted 4-fold network cross-validations by removing 25% of edges in the gold standard (4-fold Gold Standard). ROC curves showed a small amount of overfitting, most apparent in cross-validations for which we removed 25% of genes (rather than edges) from the network training set (Figure S1). C. We conducted 20 bootstrap runs, using a 70–30 split (training to test) of the gold standard answer file, and performed out-of-bag averaging to produce a single network. The relatively flat trend of AUC over out-of-bag-averaging runs confirms that overfitting was minimal, and the averaging produced a single network with high-confidence inference scores.
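
The two metrics quoted in panel A can be illustrated with a short sketch. It assumes each predicted edge has an inferred weight and a gold-standard label (1 for a known related pair, 0 for a known unrelated pair). The function names and the toy scores and labels are hypothetical and are not taken from the study; AUC is computed here through its rank interpretation (the probability that a random positive outranks a random negative), which is one standard way to obtain it and not necessarily how the authors did.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_recall(scores, labels, target_recall):
    """Precision at the first rank cutoff whose recall reaches target_recall.

    Edges are ranked by descending score; tied scores are not treated specially.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = 0
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    return true_pos / len(ranked)


# Hypothetical edge weights and gold-standard labels.
toy_scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

print(roc_auc(toy_scores, toy_labels))                    # 0.875
print(precision_at_recall(toy_scores, toy_labels, 0.25))  # 1.0
print(precision_at_recall(toy_scores, toy_labels, 0.75))  # 0.75
```

The pairwise AUC loop is quadratic in the number of labeled edges, which is fine for a toy example but would need a rank-based implementation for a gold standard of realistic size.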

Mentions: Notes: A total of 77 high-throughput datasets were collected from various public sources to create a compendium of mESC-specific data that included 992 conditions (e.g., columns in a microarray matrix) and ∼2.2 million data points (Table S3). These data were standardized and integrated into ∼6 billion gene/protein pairs, and used as evidential data to generate a predictive mESC-specific network focused on mESC self-renewal and cell fate. Datasets were weighted based on the amount of mutual information each shared with all other evidential datasets used by the Bayes net. A low mean redundancy indicates that a dataset contributes largely unique information. As observed in other similar Bayesian network data integration efforts (including integration of human data), genetic and physical interaction data were the most reliable, but also the least common [11]. We strove to assemble a diverse and comprehensive set of mESC data that would provide the most coverage and be highly informative. Protein-DNA interaction data included chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-Chip) and ChIP followed by high-throughput sequencing (ChIP-Seq). Top ranked edges were the 639 edges with a rank order of 1 and an inferred edge weight ≥0.9999 (Figure 3A, Table S11); the top 0.01% of the network consists of the 22,664 edges with an inferred edge weight ≥0.9966 (Figure 3B, dataset contributions to top 0.01% edges available at StemSight.org/stemdata.html).
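
The redundancy measure described in these notes can be sketched as follows. The sketch assumes each dataset has already been discretized into bins over the same ordered list of gene pairs, and it defines a dataset's mean redundancy as its average mutual information with every other dataset. That definition, the function names, and the toy datasets are assumptions made for illustration; the notes do not give the exact formula the authors used.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length sequences of discrete bins."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mean_redundancy(datasets):
    """Average mutual information of each dataset with all the other datasets."""
    result = {}
    for name, bins in datasets.items():
        shared = [mutual_information(bins, other_bins)
                  for other, other_bins in datasets.items() if other != name]
        result[name] = sum(shared) / len(shared) if shared else 0.0
    return result


# Hypothetical binned evidence over the same eight gene pairs.
toy_datasets = {
    "array_A": [0, 0, 1, 1, 0, 0, 1, 1],
    "array_B": [0, 0, 1, 1, 0, 0, 1, 1],  # identical to array_A, fully redundant
    "chip_C": [0, 1, 0, 1, 0, 1, 0, 1],   # statistically independent of the arrays
}

print(mean_redundancy(toy_datasets))
```

Here the two identical arrays each receive a mean redundancy of 0.5 bits while the independent dataset receives 0, matching the reading that low mean redundancy marks a dataset whose information is not duplicated elsewhere in the compendium.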

