Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate.
Bottom Line: We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination.Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant.Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies.
Affiliation: The Jackson Laboratory, Bar Harbor, Maine, USA.
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.
Related in: MedlinePlus
Mentions: The supervised machine learning approach we used minimizes bias by statistically evaluating data relevance, including which experimental designs are most appropriate and which conditions are most informative. Bayes nets assigned a statistical level of confidence for each input dataset; regularization filtered redundant mutual information shared among datasets. Our method enabled us to report both the strength of the relationship between gene pairs (edge weight) and statistics that describe which evidential data contributed to each edge (Figure 6A, B). Using this information, we could “cross validate” predictions made in other studies, such as functional linkages between transcription factors that have been computationally validated as essential for mESC pluripotency ,  (Figure 7A). For example, our results predict a strong functional linkage between Suz12– Sox2 (edge weight: 0.9998), but a weak connection between Suz12– Myc (edge weight: 0.0007). Closer inspection of these edges reveals that although more than half of the top supporting datasets are the same for each edge, the contribution strength of evidence often differs significantly (Figure 6C) because our approach includes degrees of co-expression, binding affinity, etc., that are not considered when using a binary network construction approach. In this way, our weighted network provides a view that is closer to biological reality as few genes function in a binary fashion in any system context.