Coding limits on the number of transcription factors.

Itzkovitz S, Tlusty T, Alon U - BMC Genomics (2006)

Bottom Line: For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. This theory further predicts that factors with similar binding sequences should tend to have similar biological effect, so that errors based on mis-recognition are minimal.


Affiliation: Dept Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel. shalev.itzkovitz@weizmann.ac.il

ABSTRACT

Background: Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms.

Results: We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effect, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction.

Conclusion: The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function.


Figure 3: Conceptual coding schemes for the assignment of binding sequences to TFs. Binding sequences are displayed as points, TFs as colored spheres. Colors correspond to the biological function of each TF. a) A sphere-packing code: code-words are covered by non-overlapping spheres, so the TFs do not share binding sequences. b) A smooth code: code-words are covered by overlapping spheres with similar function, so TFs can share binding sequences with neighboring TFs. This type of code is predicted to be smooth; that is, TFs with shared binding sequences tend to have similar biological function, represented by spheres of similar color in the figure.

Mentions: In the theoretical case of perfectly non-overlapping sequences, in which each sequence is assigned to only one TF, Sengupta et al. [6] have noted that the coding problem is similar to the problem of packing non-overlapping spheres in the space of sequences, each sphere corresponding to the sequences belonging to one TF (Fig 3a). Let us make a simple estimate of the number of TFs according to this picture. As an example, suppose that a TF can on average recognize sequences that differ from the consensus by one letter (a Hamming distance of one from the consensus sequence). For winged helix proteins, there are six positions in the sequence, each of which can be changed to one of 3 other letters, resulting in 6 × 3 = 18 neighbours at a Hamming distance of one. Each TF therefore occupies 19 sequences (the consensus plus its 18 neighbours), so there can be at most 2048/19 ≈ 107.8, or roughly 100, distinct TFs with non-overlapping sequences [27]. This is of the same order of magnitude as, although lower than, the observed maximal number of about 300 (Table 1).
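
The estimate above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function names `hamming_ball_size` and `sphere_packing_bound` are hypothetical and not taken from the paper. The value 2048 is taken directly from the text. It equals 4^6/2, and reading the factor of two as treating a sequence and its reverse complement as equivalent is an assumption, since this excerpt does not state it.

```python
from math import comb


def hamming_ball_size(length: int, radius: int, alphabet_size: int = 4) -> int:
    """Count sequences within `radius` mismatches of a consensus sequence.

    For radius 1 over the DNA alphabet this is 1 + length * 3:
    the consensus itself plus every single-letter substitution.
    """
    return sum(
        comb(length, k) * (alphabet_size - 1) ** k
        for k in range(radius + 1)
    )


def sphere_packing_bound(num_sequences: int, length: int,
                         radius: int = 1, alphabet_size: int = 4) -> int:
    """Upper bound on the number of non-overlapping Hamming balls
    that fit into a space of `num_sequences` sequences."""
    return num_sequences // hamming_ball_size(length, radius, alphabet_size)


# Winged helix example from the text: six recognized positions,
# one tolerated mismatch, 2048 available sequences.
# ASSUMPTION: 2048 is used as given; it matches 4**6 // 2.
positions = 6
available_sequences = 2048

ball = hamming_ball_size(positions, radius=1)
bound = sphere_packing_bound(available_sequences, positions, radius=1)

print(f"sequences per TF: {ball}")
print(f"max non-overlapping TFs: {bound}")
```

The integer bound comes out at 107, consistent with the "~100" quoted in the text. Changing `radius` or `positions` shows how quickly the bound shifts with the number of effectively recognized bases, which is the dependence the abstract describes qualitatively.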

