Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate.
Bottom Line: We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies.
Affiliation: The Jackson Laboratory, Bar Harbor, Maine, USA.
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue-specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. Stem cell researchers can use this network (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.
Mentions: We used a naïve Bayesian network methodology (Figure 2) to create a cell-type-specific predictive biological network of protein-coding genes in the context of self-renewal and closely related processes (e.g., pluripotency and cell fate determination) in mESCs. For our training set, we manually curated a positive reference of 2,056 pair-wise gene relationships (with a prior of 1) among 354 genes associated with mESC self-renewal or annotated to signaling pathways involved in early embryonic development (Table S1), based on information extracted from 98 recent journal articles (Table S2). We automatically generated a negative reference of 20,560 gene pairs (with a prior of 0) not documented to be associated with mESC self-renewal. We combined these references to produce an mESC self-renewal gold standard with a class distribution of 1∶10 (positive:negative), which we used to train the Bayes net. For evidential data, we assembled a compendium of high-throughput mESC data, representing 60 independent research studies, including all mouse data used in prior mESC-focused computational efforts (Table 1; Table S3). This mESC data compendium consisted of ∼2.2 million data points, collected under 992 conditions, using 6 different high-throughput experimental techniques, and encompassing more than 6 billion gene-pair measurements. We used the trained Bayes net to make posterior predictions of functional relationships among 21,291 protein-coding mouse genes based on patterns observed in the integrated evidential data.
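The training and prediction steps described above can be illustrated with a minimal naïve Bayes sketch. This is not the authors' implementation: the function names, the discretization of each dataset into bins, the pseudocount smoothing, and the toy observations are all assumptions made for illustration. Only the reference-set sizes and the gene count come from the text.

```python
from collections import Counter
from math import comb

# Figures quoted in the text.
N_POSITIVE = 2_056    # curated positive gene pairs
N_NEGATIVE = 20_560   # generated negative gene pairs
N_GENES = 21_291      # protein-coding mouse genes scored

def class_prior(n_pos, n_neg):
    """Prior probability of a functional relationship implied by the gold standard."""
    return n_pos / (n_pos + n_neg)

def fit_likelihoods(observations, labels, n_bins, pseudocount=1.0):
    """Estimate P(bin | class) for one dataset from gold-standard pairs.

    observations: bin index per gold-standard pair (hypothetical discretization).
    labels: 1 for a positive pair, 0 for a negative pair.
    """
    counts = {0: Counter(), 1: Counter()}
    for bin_index, label in zip(observations, labels):
        counts[label][bin_index] += 1
    table = {}
    for label in (0, 1):
        total = sum(counts[label].values()) + pseudocount * n_bins
        table[label] = [(counts[label][b] + pseudocount) / total
                        for b in range(n_bins)]
    return table

def posterior(evidence, tables, prior):
    """Posterior P(related | evidence), treating datasets as conditionally independent.

    evidence: one bin index per dataset, or None where a pair was not measured.
    """
    p_pos, p_neg = prior, 1.0 - prior
    for bin_index, table in zip(evidence, tables):
        if bin_index is None:
            continue  # missing data contributes no evidence
        p_pos *= table[1][bin_index]
        p_neg *= table[0][bin_index]
    return p_pos / (p_pos + p_neg)

# Prior implied by the 1:10 class distribution.
prior = class_prior(N_POSITIVE, N_NEGATIVE)

# Number of unordered gene pairs among the scored genes.
n_candidate_pairs = comb(N_GENES, 2)

# Toy, made-up gold-standard observations for a single two-bin dataset.
toy_labels = [1, 1, 0, 0, 0, 0]
toy_bins = [1, 1, 0, 0, 0, 1]
toy_table = fit_likelihoods(toy_bins, toy_labels, n_bins=2)

# Posterior for a hypothetical pair that falls in the high bin.
toy_posterior = posterior([1], [toy_table], prior)
```

In this sketch a pair with no measurements keeps the prior, and each dataset in which a pair falls into a bin enriched for positives raises its posterior. A real integration over 60 studies would typically work in log space to avoid numerical underflow.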