TherMos: Estimating protein-DNA binding energies from in vivo binding profiles.

Sun W, Hu X, Lim MH, Ng CK, Choo SH, Castro DS, Drechsel D, Guillemot F, Kolatkar PR, Jauch R, Prabhakar S - Nucleic Acids Res. (2013)

Bottom Line: We experimentally validated TherMos binding energy models for Klf4 and Esrrb, using a novel protocol to measure PSEMs in vitro. Strikingly, our measurements revealed strong non-additivity at multiple positions within the two PSEMs. Among the algorithms tested, only TherMos was able to model the entire binding energy landscape of Klf4 and Esrrb. Our study reveals new insights into the energetics of TF-DNA binding in vivo and provides an accurate first-principles approach to binding energy inference from ChIP-seq and ChIP-exo data.


Affiliation: Computational and Systems Biology, Genome Institute of Singapore, 60 Biopolis St, Singapore 138672, Singapore.

ABSTRACT
Accurately characterizing transcription factor (TF)-DNA affinity is a central goal of regulatory genomics. Although thermodynamics provides the most natural language for describing the continuous range of TF-DNA affinity, traditional motif discovery algorithms focus instead on classification paradigms that aim to discriminate 'bound' and 'unbound' sequences. Moreover, these algorithms do not directly model the distribution of tags in ChIP-seq data. Here, we present a new algorithm named Thermodynamic Modeling of ChIP-seq (TherMos), which directly estimates a position-specific binding energy matrix (PSEM) from ChIP-seq/exo tag profiles. In cross-validation tests on seven genome-wide TF-DNA binding profiles, one of which we generated via ChIP-seq on a complex developing tissue, TherMos predicted quantitative TF-DNA binding with greater accuracy than five well-known algorithms. We experimentally validated TherMos binding energy models for Klf4 and Esrrb, using a novel protocol to measure PSEMs in vitro. Strikingly, our measurements revealed strong non-additivity at multiple positions within the two PSEMs. Among the algorithms tested, only TherMos was able to model the entire binding energy landscape of Klf4 and Esrrb. Our study reveals new insights into the energetics of TF-DNA binding in vivo and provides an accurate first-principles approach to binding energy inference from ChIP-seq and ChIP-exo data.
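The abstract describes TherMos as estimating a position-specific binding energy matrix (PSEM) and relating binding energy to ChIP-seq/exo tag profiles. The sketch below illustrates the general idea of a PSEM under common textbook assumptions; it is not TherMos's actual model or code. It assumes energies are additive across positions and expressed in kT units, that a single site is bound with probability 1/(1 + exp(E - mu)) for a chemical potential mu, and that occupancies can be summed over both strands while ignoring overlapping sites. All function names and the toy matrix values are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def site_energy(psem, site):
    # Additive model: the site energy is the sum of one matrix entry per position (kT units).
    # Expects an uppercase ACGT site exactly as long as the matrix.
    return sum(psem[i][BASES.index(base)] for i, base in enumerate(site))


def occupancy(energy, mu):
    # Two-state (bound/unbound) model: p = 1 / (1 + exp(E - mu)).
    return 1.0 / (1.0 + math.exp(energy - mu))


def expected_occupancy(psem, seq, mu):
    # Sum single-site occupancies over every window on both strands.
    # Simplification: overlapping sites are treated as independent.
    width = len(psem)
    revcomp = seq.translate(COMPLEMENT)[::-1]
    total = 0.0
    for strand in (seq, revcomp):
        for start in range(len(strand) - width + 1):
            total += occupancy(site_energy(psem, strand[start:start + width]), mu)
    return total


def additivity_deviation(e_ref, e_mut1, e_mut2, e_double):
    # Energy change of the double mutant minus the sum of the single-mutant changes.
    # Always zero when all four energies come from an additive matrix.
    return (e_double - e_ref) - ((e_mut1 - e_ref) + (e_mut2 - e_ref))


# Hypothetical 3-bp PSEM: 0 kT for the preferred base, 2 kT penalty otherwise.
toy_psem = [
    [0.0, 2.0, 2.0, 2.0],  # position 1 prefers A
    [2.0, 0.0, 2.0, 2.0],  # position 2 prefers C
    [2.0, 2.0, 0.0, 2.0],  # position 3 prefers G
]

print(site_energy(toy_psem, "ACG"))  # 0.0
print(site_energy(toy_psem, "ATG"))  # 2.0
print(occupancy(0.0, mu=0.0))        # 0.5
energies = [site_energy(toy_psem, s) for s in ("ACG", "TCG", "ATG", "TTG")]
print(additivity_deviation(*energies))  # 0.0
```

Under a strictly additive matrix like this toy one, `additivity_deviation` is zero by construction; a nonzero value computed from measured energies signals the kind of non-additivity the authors report for Klf4 and Esrrb.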


Figure 2. (A) Mash1 in vivo ChIP-seq profile (E12.5 mouse spinal cord) shows strong peaks at known targets of Mash1. (B, C) Performance of TherMos and other algorithms in 10-fold cross-validation testing on the seven whole-genome TF binding profiles. For each algorithm and each TF, the bar height indicates the average SPE or rank correlation coefficient across the 10 test sets. The summary bars at the end indicate average performance across all seven TFs. (B) SPE is calculated between predicted (motif) and observed (experimental data) ChIP-seq binding profile. Smaller SPE indicates higher accuracy. (C) Rank correlation coefficient is calculated between predicted (motif) and observed (experimental data) ChIP-seq tag counts. Average rank correlation coefficients below zero for some of the algorithms are not shown.
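Panel C scores each algorithm by the rank correlation between predicted and observed tag counts, averaged over the 10 cross-validation test sets. A minimal sketch of that evaluation step follows, assuming Spearman's coefficient (Pearson correlation of ranks, with tied values sharing their mean rank) and assuming predictions for each held-out fold are already available. The caption does not state which rank coefficient or tie rule was used, and the function names and data are hypothetical.

```python
import math


def average_ranks(values):
    # 1-based ranks; tied values share the mean of the positions they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    # Plain Pearson correlation; raises ZeroDivisionError if either input is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(predicted, observed):
    # Spearman's rho = Pearson correlation of the two rank vectors.
    return pearson(average_ranks(predicted), average_ranks(observed))


def mean_fold_correlation(folds):
    # folds: one (predicted, observed) pair per held-out test set, e.g. 10 for 10-fold CV.
    scores = [spearman(pred, obs) for pred, obs in folds]
    return sum(scores) / len(scores)


# Hypothetical predicted occupancies vs observed tag counts for two test folds.
folds = [
    ([0.9, 0.4, 0.7, 0.1], [120, 35, 80, 12]),
    ([0.2, 0.8, 0.5, 0.6], [10, 95, 40, 22]),
]
print(round(mean_fold_correlation(folds), 3))  # 0.9
```

Because it compares only orderings, rank correlation rewards ranking regions correctly even when predicted occupancies and raw tag counts sit on very different scales.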

Mentions: We used TherMos to derive PSEMs for six TFs spanning a broad range of DNA-binding domains, based on ChIP-seq data from mES cells (Esrrb, Klf4, Stat3, Zfx and n-Myc) (14) and ChIP-exo data from S. cerevisiae (Reb1) (13). In addition, to evaluate TherMos on data from a heterogeneous tissue, we generated and analyzed Mash1 ChIP-seq data from mouse spinal cord at embryonic day 12.5 (‘Materials and Methods’ section). Only the dorsal region of the spinal cord was analyzed, as Mash1 expression is restricted to the dorsal domain at this time point (26). As seen in Figure 2A, the Mash1 ChIP-seq profile showed strong peaks at Fbxw7 and Dll1, two known targets of the TF (18).