Limits...
Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data.

Gong W, Koyano-Nakagawa N, Li T, Garry DJ - BMC Bioinformatics (2015)

Bottom Line: Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes.Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

View Article: PubMed Central - PubMed

Affiliation: Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN, 55114, USA. gongx030@umn.edu.

ABSTRACT

Background: Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion.

Results: We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p <10(-100)), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.

Conclusion: We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

Show MeSH

Related in: MedlinePlus

Overview of our two-step strategy to infer dynamic regulatory networks during cardiac differentiation. (A) Training a general logistic regression model to predict the probability being bound by any transcription factor (LR score) in 40 kb cis-regions surrounding transcriptional start sites (TSS) of expressed genes. The response variables of the model indicate whether the hit of the PWM of a specific TF coincides with the peak region of the corresponding ChIP-seq data. 15 context-independent (e.g. conservation) and 54 context-dependent (e.g. mean intensity of H3K27ac in 1 kb surrounding any base) features were used to train the logistic regression model. (B) Context-dependent LR score, temporal expression profiles and perturbation networks were used to infer the dynamic regulatory networks based on a time-varying dynamic Bayesian network model.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4359553&req=5

Fig1: Overview of our two-step strategy to infer dynamic regulatory networks during cardiac differentiation. (A) Training a general logistic regression model to predict the probability being bound by any transcription factor (LR score) in 40 kb cis-regions surrounding transcriptional start sites (TSS) of expressed genes. The response variables of the model indicate whether the hit of the PWM of a specific TF coincides with the peak region of the corresponding ChIP-seq data. 15 context-independent (e.g. conservation) and 54 context-dependent (e.g. mean intensity of H3K27ac in 1 kb surrounding any base) features were used to train the logistic regression model. (B) Context-dependent LR score, temporal expression profiles and perturbation networks were used to infer the dynamic regulatory networks based on a time-varying dynamic Bayesian network model.

Mentions: We developed a two-step strategy to infer the dynamic GRN during cardiac differentiation (Figure 1). In the first step, based on 17 TFs whose ChIP-seq data are available for either mouse ESCs or cardiomyocyte HL-1 cells (Table 1), we trained a logistic regression model to predict the probability for any base being bound by any TFs with known PWMs, at a specific differentiation stage. The model included the context independent features that do not change during differentiation (e.g. base conservation) and context dependent features such as the expression levels of nearby genes, the intensity of histone modifications within defined distances, as well as histone modification changes between adjacent time points. This concept was modified from the work by Ernst et al. that infers a score quantifying the general binding preferences of TFBS [48]. However, it should be noted that, for any given sequence in the genome, the output of our model, the logistic regression (LR) score, was dependent upon the differentiation stage, and not specific TFs. Specifically, the stage-specific LR score was designed to capture the stage-specific TFBS. In the second step, we used the following information: (1) the temporal LR score defined in step 1, (2) the temporal expression profiles of cardiac differentiation at four developmental stages in the mouse [ESCs, mesoderm (MES), cardiac progenitors (CP), and cardiomyocytes (CM) [1]], and (3) the perturbed network we compiled from the perturbation experiments performed in mouse ESCs and HL-1 cells [2,4,20,50,51], which were combined under the framework of a time-varying dynamic Bayesian network model to infer the dynamic networks during cardiac differentiation.Figure 1


Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data.

Gong W, Koyano-Nakagawa N, Li T, Garry DJ - BMC Bioinformatics (2015)

Overview of our two-step strategy to infer dynamic regulatory networks during cardiac differentiation. (A) Training a general logistic regression model to predict the probability being bound by any transcription factor (LR score) in 40 kb cis-regions surrounding transcriptional start sites (TSS) of expressed genes. The response variables of the model indicate whether the hit of the PWM of a specific TF coincides with the peak region of the corresponding ChIP-seq data. 15 context-independent (e.g. conservation) and 54 context-dependent (e.g. mean intensity of H3K27ac in 1 kb surrounding any base) features were used to train the logistic regression model. (B) Context-dependent LR score, temporal expression profiles and perturbation networks were used to infer the dynamic regulatory networks based on a time-varying dynamic Bayesian network model.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4359553&req=5

Fig1: Overview of our two-step strategy to infer dynamic regulatory networks during cardiac differentiation. (A) Training a general logistic regression model to predict the probability being bound by any transcription factor (LR score) in 40 kb cis-regions surrounding transcriptional start sites (TSS) of expressed genes. The response variables of the model indicate whether the hit of the PWM of a specific TF coincides with the peak region of the corresponding ChIP-seq data. 15 context-independent (e.g. conservation) and 54 context-dependent (e.g. mean intensity of H3K27ac in 1 kb surrounding any base) features were used to train the logistic regression model. (B) Context-dependent LR score, temporal expression profiles and perturbation networks were used to infer the dynamic regulatory networks based on a time-varying dynamic Bayesian network model.
Mentions: We developed a two-step strategy to infer the dynamic GRN during cardiac differentiation (Figure 1). In the first step, based on 17 TFs whose ChIP-seq data are available for either mouse ESCs or cardiomyocyte HL-1 cells (Table 1), we trained a logistic regression model to predict the probability for any base being bound by any TFs with known PWMs, at a specific differentiation stage. The model included the context independent features that do not change during differentiation (e.g. base conservation) and context dependent features such as the expression levels of nearby genes, the intensity of histone modifications within defined distances, as well as histone modification changes between adjacent time points. This concept was modified from the work by Ernst et al. that infers a score quantifying the general binding preferences of TFBS [48]. However, it should be noted that, for any given sequence in the genome, the output of our model, the logistic regression (LR) score, was dependent upon the differentiation stage, and not specific TFs. Specifically, the stage-specific LR score was designed to capture the stage-specific TFBS. In the second step, we used the following information: (1) the temporal LR score defined in step 1, (2) the temporal expression profiles of cardiac differentiation at four developmental stages in the mouse [ESCs, mesoderm (MES), cardiac progenitors (CP), and cardiomyocytes (CM) [1]], and (3) the perturbed network we compiled from the perturbation experiments performed in mouse ESCs and HL-1 cells [2,4,20,50,51], which were combined under the framework of a time-varying dynamic Bayesian network model to infer the dynamic networks during cardiac differentiation.Figure 1

Bottom Line: Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes.Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

View Article: PubMed Central - PubMed

Affiliation: Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN, 55114, USA. gongx030@umn.edu.

ABSTRACT

Background: Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion.

Results: We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p <10(-100)), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.

Conclusion: We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

Show MeSH
Related in: MedlinePlus