Extrapolating histone marks across developmental stages, tissues, and species: an enhancer prediction case study.

Capra JA - BMC Genomics (2015)

Bottom Line: Using a machine-learning approach to integrate the data from different contexts, I found that E11.5 heart enhancers can often be predicted accurately from data from other contexts, and I quantified the contribution of each data source to the predictions. The utility of each dataset correlated with nearness in developmental time and tissue to the target context: data from late developmental stages and adult heart tissues were most informative for predicting E11.5 enhancers, while marks from stem cells and early developmental stages were less informative. The ability of these algorithms to accurately predict developmental enhancers based on data from related, but distinct, cellular contexts suggests that combining computational models with epigenetic data sampled from relevant contexts may be sufficient to enable functional characterization of many cellular contexts of interest.


Affiliation: Departments of Biological Sciences and Biomedical Informatics, Vanderbilt University, VU Station B, Box 35-1634, Nashville, TN 37235-1634, USA. tony.capra@vanderbilt.edu

ABSTRACT

Background: Dynamic activation and inactivation of gene regulatory DNA produce the expression changes that drive the differentiation of cellular lineages. Identifying regulatory regions active during developmental transitions is necessary to understand how the genome specifies complex developmental programs and how these processes are disrupted in disease. Gene regulatory dynamics are mediated by many factors, including the binding of transcription factors (TFs) and the methylation and acetylation of DNA and histones. Genome-wide maps of TF binding and DNA and histone modifications have been generated for many cellular contexts; however, given the diversity and complexity of animal development, these data cover only a small fraction of the cellular and developmental contexts of interest. Thus, there is a need for methods that use existing epigenetic and functional genomics data to analyze the thousands of contexts that remain uncharacterized.

Results: To investigate the utility of histone modification data in the analysis of cellular contexts without such data, I evaluated how well genome-wide H3K27ac and H3K4me1 data collected in different developmental stages, tissues, and species were able to predict experimentally validated heart enhancers active at embryonic day 11.5 (E11.5) in mouse. Using a machine-learning approach to integrate the data from different contexts, I found that E11.5 heart enhancers can often be predicted accurately from data from other contexts, and I quantified the contribution of each data source to the predictions. The utility of each dataset correlated with nearness in developmental time and tissue to the target context: data from late developmental stages and adult heart tissues were most informative for predicting E11.5 enhancers, while marks from stem cells and early developmental stages were less informative. Predictions based on data collected in non-heart tissues and in human hearts were better than random, but worse than using data from mouse hearts.
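One simple way to compare how informative each context is on its own is to score every validated enhancer and control region by a single context's mark and compute the area under the ROC curve. The sketch below is only an illustration of that idea, not necessarily the procedure used in the paper; the function name `roc_auc`, the context labels, and the toy 0/1 values are all hypothetical and are not results from the study.

```python
from typing import Dict, List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for validated enhancers, 0 for control regions.
    scores: one score per region, e.g. 1 if a mark overlaps it, else 0.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative region")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy, made-up data: three enhancers followed by three control regions.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical presence/absence of H3K27ac in two contexts.
marks_by_context: Dict[str, List[float]] = {
    "adult_heart_H3K27ac": [1, 1, 0, 0, 0, 0],
    "ES_cell_H3K27ac": [1, 0, 0, 1, 0, 0],
}

auc_by_context = {
    name: roc_auc(labels, scores) for name, scores in marks_by_context.items()
}

for name, auc in auc_by_context.items():
    print(f"{name}: AUC = {auc:.3f}")
```

An AUC near 0.5 means the mark from that context separates enhancers from controls no better than chance, while values closer to 1 indicate a more informative context.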

Conclusions: The ability of these algorithms to accurately predict developmental enhancers based on data from related, but distinct, cellular contexts suggests that combining computational models with epigenetic data sampled from relevant contexts may be sufficient to enable functional characterization of many cellular contexts of interest.



Figure 1: Overview of the data and analyses. (A) I collected existing genome-wide maps of two histone marks, H3K4me1 and H3K27ac, from stages of a directed differentiation of mouse embryonic stem (ES) cells into cardiomyocytes, from heart tissues collected from several life stages, and from several other tissues. I evaluated how well these marks, which are associated with enhancer activity, could predict experimentally validated heart enhancers in E11.5 mice (“Target”). (B) I took a supervised machine learning approach to this problem by constructing feature vectors for validated enhancers and control regions based on the presence or absence of these histone modifications at their genomic locations. I created classifiers based on different subsets of the data from the cellular contexts given in (A) and evaluated them using cross validation.
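The feature construction described in panel (B) can be sketched roughly as follows: each region gets one binary feature per mark and context, set to 1 if any peak from that dataset overlaps the region. This is a minimal sketch under stated assumptions, not the paper's implementation. The interval representation, the function names (`overlaps`, `feature_vector`, `k_fold_indices`), the dataset labels, and the toy coordinates are all hypothetical, and the fold assignment shown is just one simple way to set up cross validation.

```python
from typing import Dict, List, Tuple

# Assumed representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def feature_vector(
    region: Interval,
    peaks_by_dataset: Dict[str, List[Interval]],
    dataset_order: List[str],
) -> List[int]:
    """One 0/1 feature per dataset: is any peak present at this region?"""
    return [
        int(any(overlaps(region, peak) for peak in peaks_by_dataset[name]))
        for name in dataset_order
    ]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs by index modulo k."""
    folds = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        folds.append((train, test))
    return folds


# Toy, made-up peak calls for three hypothetical datasets.
peaks = {
    "ES_H3K27ac": [("chr3", 0, 100)],
    "E14.5_heart_H3K27ac": [("chr1", 100, 200)],
    "adult_heart_H3K4me1": [("chr1", 150, 400), ("chr2", 10, 50)],
}
order = ["ES_H3K27ac", "E14.5_heart_H3K27ac", "adult_heart_H3K4me1"]

enhancer_features = feature_vector(("chr1", 180, 220), peaks, order)
control_features = feature_vector(("chr2", 60, 90), peaks, order)
folds = k_fold_indices(5, 2)

print(enhancer_features, control_features, folds)
```

Training a classifier on only some columns of these vectors corresponds to restricting the model to a subset of contexts, which is how different data sources could be compared.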

Mentions: My goal was to evaluate the ability of two enhancer-associated histone modifications, H3K4me1 and H3K27ac, collected from different cellular, developmental, and organismal contexts to identify known mouse developmental enhancers (Figure 1). I used H3K4me1 and H3K27ac sites identified via ChIP-Seq on four stages of a directed differentiation of ES cells (E0) to mesoderm (E4) to cardiac precursors (E5.8) to cardiomyocytes (E10) [9]. All other histone mark data I used, including marks from embryonic day 14.5 (E14.5) and eight-week-old (adult) hearts, were collected by the ENCODE project [6]. Note that the heart data from the first four contexts were collected from a single cell type, while the last two are from full hearts (see Discussion). Other histone modifications are likely informative about enhancer activity [16]; however, I consider only H3K4me1 and H3K27ac, because they have been associated with enhancer activity in many studies and have both been collected in a consistent manner across a range of cellular and developmental contexts.

