Statistical modeling of transcription factor binding affinities predicts regulatory interactions.

Manke T, Roider HG, Vingron M - PLoS Comput. Biol. (2008)

Bottom Line: We demonstrate that the affinity distribution of almost all known transcription factors can be effectively parametrized by a class of generalized extreme value distributions. The combination of physical model and statistical normalization provides a quantitative measure which ranks transcription factors for a given sequence, and which can be compared directly with large-scale binding data. Its successful application to human promoter sequences serves as an encouraging example of how the method can be applied to other sequences.

Affiliation: Max Planck Institute for Molecular Genetics, Berlin, Germany. manke@molgen.mpg.de

ABSTRACT
Recent experimental and theoretical efforts have highlighted the fact that binding of transcription factors to DNA can be more accurately described by continuous measures of their binding affinities, rather than a discrete description in terms of binding sites. While the binding affinities can be predicted from a physical model, it is often desirable to know the distribution of binding affinities for specific sequence backgrounds. In this paper, we present a statistical approach to derive the exact distribution for sequence models with fixed GC content. We demonstrate that the affinity distribution of almost all known transcription factors can be effectively parametrized by a class of generalized extreme value distributions. Moreover, this parameterization also describes the affinity distribution for sequence backgrounds with variable GC content, such as human promoter sequences. Our approach is applicable to arbitrary sequences and all transcription factors with known binding preferences that can be described in terms of a motif matrix. The statistical treatment also provides a proper framework to directly compare transcription factors with very different affinity distributions. This is illustrated by our analysis of human promoters with known binding sites, for many of which we could identify the known regulators as those with the highest affinity. The combination of physical model and statistical normalization provides a quantitative measure which ranks transcription factors for a given sequence, and which can be compared directly with large-scale binding data. Its successful application to human promoter sequences serves as an encouraging example of how the method can be applied to other sequences.
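
The statistical normalization described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the generalized extreme value (GEV) distribution is fitted to the log-affinity of each factor, and the parameter names `mu`, `sigma`, `xi`, the function names, and the factor labels in the usage note are all hypothetical. Given fitted parameters, the upper-tail probability of an observed affinity puts factors with very different affinity scales on a common footing, so they can be ranked for one sequence.

```python
import math

def gev_sf(x, mu, sigma, xi):
    """Upper-tail probability P(X > x) for a GEV(mu, sigma, xi) variable."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        s = -z  # Gumbel limit: CDF = exp(-exp(-z))
    else:
        t = 1.0 + xi * z
        if t <= 0.0:
            # Outside the support: below the lower bound (xi > 0)
            # or above the upper bound (xi < 0).
            return 1.0 if xi > 0 else 0.0
        s = -math.log(t) / xi  # exp(s) equals t ** (-1 / xi)
    if s > 700.0:
        return 1.0  # CDF underflows to zero; avoids overflow in exp
    # 1 - exp(-exp(s)), computed with expm1 to keep small tails accurate
    return -math.expm1(-math.exp(s))

def affinity_p_value(affinity, mu, sigma, xi):
    """Tail probability of an affinity, assuming GEV-distributed log-affinity."""
    return gev_sf(math.log(affinity), mu, sigma, xi)

def rank_factors(affinities, gev_params):
    """Sort factors by ascending p-value (most significant first).

    affinities: {factor name: total affinity for one sequence}
    gev_params: {factor name: (mu, sigma, xi)} fitted on a background
    """
    scored = [
        (name, affinity_p_value(a, *gev_params[name]))
        for name, a in affinities.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])
```

For example, `rank_factors({"TF_A": 1.0, "TF_B": 20.0}, {"TF_A": (0.0, 1.0, 0.0), "TF_B": (0.0, 1.0, 0.0)})` places the hypothetical `TF_B` first, because its affinity lies further in the upper tail of the shared background. Note that some libraries use the opposite sign convention for the shape parameter, so fitted values should be checked before they are passed in.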

Figure 1. TRAP approach. The left-hand side illustrates how a given motif matrix (W = 5) is scanned against a longer DNA sequence region of length L. At each position the binding energy for the adjacent site is calculated as in Eq. (3), which assumes independence of all positions within a site. The binding energy is converted into a local affinity using Eq. (1) and the parametrization from [6]. This results in the schematized red curve of position-dependent binding affinities. Two selected sites are shown as red boxes, because they correspond to relatively high affinities in this toy example. In our framework we do not annotate them as “hits,” but rather sum the different contributions from all possible positions (and strand orientations) – see Eq. (4). This gives rise to a total affinity of the sequence region with length L. Initially this approach was developed to rationalize the ChIP-chip data, where L corresponds to the experimental fragment length [6]. On the other hand, the summation in Eq. (4) also amounts to a smoothing of the noisy binding signal over larger sequence regions. This is shown on the right-hand side, where the affinity of transcription factor SRF (W = 15) is calculated around its own promoter region. Here the red line denotes the local affinities which fluctuate strongly, and the black curve denotes the combined affinities for longer regions of length L = 500.
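
The scanning procedure in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the energy matrix `ENERGY` is a made-up toy motif with W = 5, the parameters `r0` and `lam` are hypothetical placeholders for the parametrization taken from [6], and the saturating form x / (1 + x) used for the local affinity is an assumed equilibrium expression, since Eqs. (1), (3) and (4) are not reproduced in this excerpt.

```python
import math

# Hypothetical toy energy matrix (W = 5): one dict per motif position,
# mismatch energies in arbitrary units; 0.0 marks the preferred base.
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 2.0, "T": 0.0},
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
]
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def site_energy(site, energy=ENERGY):
    """Binding energy of one site; positions are assumed independent,
    so the per-position contributions simply add up."""
    return sum(column[base] for column, base in zip(energy, site))

def local_affinity(site, r0=1.0, lam=1.0, energy=ENERGY):
    """Assumed saturating form x / (1 + x) with x = r0 * exp(-E / lam)."""
    x = r0 * math.exp(-site_energy(site, energy) / lam)
    return x / (1.0 + x)

def total_affinity(seq, r0=1.0, lam=1.0, energy=ENERGY):
    """Sum local affinities over all start positions and both strands."""
    w = len(energy)
    total = 0.0
    for start in range(len(seq) - w + 1):
        site = seq[start:start + w]
        total += local_affinity(site, r0, lam, energy)
        total += local_affinity(reverse_complement(site), r0, lam, energy)
    return total
```

With these toy settings the perfect match `"ACGTA"` has zero energy and a local affinity of 0.5. No site is called a "hit"; every position and both strand orientations contribute to the total, which is what smooths the strongly fluctuating local signal over a region of length L.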

Mentions: In our earlier work [6] we were primarily interested in comparing different promoter sequences with respect to their binding affinities for a fixed transcription factor. This has been successfully applied to account for much of the observed variation of binding strength in ChIP-chip experiments. Here we will briefly review the TRAP model and its biophysical background. The key concepts are also illustrated in Figure 1. First consider many copies of some DNA site, S_l, which extends from sequence position l to l+W−1. In the following we assume that the fraction of such sites that are bound to a given transcription factor, T, can be calculated using an equilibrium approach. We call this fraction the local affinity, a_l (Eq. (1)).
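
As a reading aid, the equilibrium assumption can be written out in its standard mass-action form. This is a textbook sketch rather than a reproduction of the paper's Eq. (1), which is not included in this excerpt; the symbols $K(S_l)$ for the site-specific equilibrium constant and $[\cdot]$ for concentrations are notation introduced here.

```latex
% Binding reaction between factor T and site S_l, with equilibrium constant K(S_l)
T + S_l \;\rightleftharpoons\; T\!\cdot\!S_l ,
\qquad
K(S_l) = \frac{[T\!\cdot\!S_l]}{[T]\,[S_l]}

% Fraction of copies of S_l that are bound (the local affinity)
a_l = \frac{[T\!\cdot\!S_l]}{[T\!\cdot\!S_l] + [S_l]}
    = \frac{K(S_l)\,[T]}{1 + K(S_l)\,[T]}
```

Under this form, a site with $K(S_l)[T] = 1$ is bound half of the time ($a_l = 0.5$), and $a_l$ saturates towards 1 as the factor concentration or the binding constant grows.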

