Limits...
Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data.

Gong W, Koyano-Nakagawa N, Li T, Garry DJ - BMC Bioinformatics (2015)

Bottom Line: Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes.Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

View Article: PubMed Central - PubMed

Affiliation: Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN, 55114, USA. gongx030@umn.edu.

ABSTRACT

Background: Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion.

Results: We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p <10(-100)), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.

Conclusion: We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

Show MeSH

Related in: MedlinePlus

Stage specific transcription factor (TF) binding probability (LR score). (A) Performance of leave-one-TF-out cross-validation of predicting binding sites of 12 ESC TFs and 5 cardiac TFs, as measured by area under the Receiver Operating Characteristics curve (AUC). (B) Distribution of the mean LR score of ESC transcriptional regulatory modules identified by functional identification of regulatory elements within accessible chromatin (FIREWACh) and the LR score of one million randomly selected bases in the cis-region [35]. (C) Distribution of the mean LR score of heart enhancers and the LR score of one million randomly selected bases in the respective cis-regions [34]. (D) Number of significantly enriched TFs in high LR score regions (>0.1) in the three stage transition.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4359553&req=5

Fig2: Stage specific transcription factor (TF) binding probability (LR score). (A) Performance of leave-one-TF-out cross-validation of predicting binding sites of 12 ESC TFs and 5 cardiac TFs, as measured by area under the Receiver Operating Characteristics curve (AUC). (B) Distribution of the mean LR score of ESC transcriptional regulatory modules identified by functional identification of regulatory elements within accessible chromatin (FIREWACh) and the LR score of one million randomly selected bases in the cis-region [35]. (C) Distribution of the mean LR score of heart enhancers and the LR score of one million randomly selected bases in the respective cis-regions [34]. (D) Number of significantly enriched TFs in high LR score regions (>0.1) in the three stage transition.

Mentions: As our goal was to train a general stage-specific model to predict the binding probability of any TF with PWM, we used a leave-one-TF-out cross-validation (LOTFOCV) to evaluate the generalizability of the model. In short, at each stage (time point), we used the data from 16 of the 17 TFs to train a model, and tested its performance on the remaining TFs. The sensitivity and specificity of the predictions were determined by the overlap between the PWM hit and the ChIP-seq peaks. The performance was measured by Area Under Receiver Operating Characteristics Curve (AUC) (Figure 2A). The AUC ranged from 0.961 (Mycn) to 0.702 (Nkx2-5) with the 40 kb cis-region and PWM score cutoff at 90%, while the mean AUC of 17 TFs was 0.860 (Figure 2A). We also checked the AUC with distinct parameters (cis-region = 20 kb or 40 kb, PWM score cutoff = 90% or 95%), and noted that the performance was similar between these conditions (Additional file 1: Figure S1A-C). We compared the performance of the full model by using all features (features #1 - #69), models without features defined at t + 1 (features #1 - #15, #16, #20 - #24, #30 - #34, #40 - #44, #50 - #54, #60 - #64), and models with only sequence features (features #1 - #15) (Additional file 1: Figure S1D). The full model demonstrated the best performance for 14 out of 17 TFs, with the exception of Myc, Nkx2-5 and Tbx5. The results suggested that the model was able to predict the stage-specific TFBSs by using context independent sequence features combined with context dependent expression and histone modification features. Because the features per se were independent of the PWMs of any specific TF, this model can be used to predict the binding probability of other TFs, whose ChIP-seq data are not available during the cardiac differentiation process yet the PWMs of which have already been defined.Figure 2


Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data.

Gong W, Koyano-Nakagawa N, Li T, Garry DJ - BMC Bioinformatics (2015)

Stage specific transcription factor (TF) binding probability (LR score). (A) Performance of leave-one-TF-out cross-validation of predicting binding sites of 12 ESC TFs and 5 cardiac TFs, as measured by area under the Receiver Operating Characteristics curve (AUC). (B) Distribution of the mean LR score of ESC transcriptional regulatory modules identified by functional identification of regulatory elements within accessible chromatin (FIREWACh) and the LR score of one million randomly selected bases in the cis-region [35]. (C) Distribution of the mean LR score of heart enhancers and the LR score of one million randomly selected bases in the respective cis-regions [34]. (D) Number of significantly enriched TFs in high LR score regions (>0.1) in the three stage transition.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4359553&req=5

Fig2: Stage specific transcription factor (TF) binding probability (LR score). (A) Performance of leave-one-TF-out cross-validation of predicting binding sites of 12 ESC TFs and 5 cardiac TFs, as measured by area under the Receiver Operating Characteristics curve (AUC). (B) Distribution of the mean LR score of ESC transcriptional regulatory modules identified by functional identification of regulatory elements within accessible chromatin (FIREWACh) and the LR score of one million randomly selected bases in the cis-region [35]. (C) Distribution of the mean LR score of heart enhancers and the LR score of one million randomly selected bases in the respective cis-regions [34]. (D) Number of significantly enriched TFs in high LR score regions (>0.1) in the three stage transition.
Mentions: As our goal was to train a general stage-specific model to predict the binding probability of any TF with PWM, we used a leave-one-TF-out cross-validation (LOTFOCV) to evaluate the generalizability of the model. In short, at each stage (time point), we used the data from 16 of the 17 TFs to train a model, and tested its performance on the remaining TFs. The sensitivity and specificity of the predictions were determined by the overlap between the PWM hit and the ChIP-seq peaks. The performance was measured by Area Under Receiver Operating Characteristics Curve (AUC) (Figure 2A). The AUC ranged from 0.961 (Mycn) to 0.702 (Nkx2-5) with the 40 kb cis-region and PWM score cutoff at 90%, while the mean AUC of 17 TFs was 0.860 (Figure 2A). We also checked the AUC with distinct parameters (cis-region = 20 kb or 40 kb, PWM score cutoff = 90% or 95%), and noted that the performance was similar between these conditions (Additional file 1: Figure S1A-C). We compared the performance of the full model by using all features (features #1 - #69), models without features defined at t + 1 (features #1 - #15, #16, #20 - #24, #30 - #34, #40 - #44, #50 - #54, #60 - #64), and models with only sequence features (features #1 - #15) (Additional file 1: Figure S1D). The full model demonstrated the best performance for 14 out of 17 TFs, with the exception of Myc, Nkx2-5 and Tbx5. The results suggested that the model was able to predict the stage-specific TFBSs by using context independent sequence features combined with context dependent expression and histone modification features. Because the features per se were independent of the PWMs of any specific TF, this model can be used to predict the binding probability of other TFs, whose ChIP-seq data are not available during the cardiac differentiation process yet the PWMs of which have already been defined.Figure 2

Bottom Line: Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes.Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

View Article: PubMed Central - PubMed

Affiliation: Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN, 55114, USA. gongx030@umn.edu.

ABSTRACT

Background: Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion.

Results: We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p <10(-100)), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.

Conclusion: We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

Show MeSH
Related in: MedlinePlus