Computational prediction of transcription-factor binding site locations.

Bulyk ML - Genome Biol. (2003)

Affiliation: Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, New Research Building, 77 Avenue Louis Pasteur, Boston, MA 02115, USA. mlbulyk@rascal.med.harvard.edu

ABSTRACT
Identifying genomic locations of transcription-factor binding sites, particularly in higher eukaryotic genomes, has been an enormous challenge. Various experimental and computational approaches have been used to detect these sites; methods involving computational comparisons of related genomes have been particularly successful.

Figure 1: Representation of transcription-factor binding sites. (a) An example of six sequences and the consensus sequence that can be derived from them. The consensus simply gives the nucleotide that is found most often in each position; the alternate (or degenerate) consensus sequence gives the possible nucleotides in each position; R represents A or G; N represents any nucleotide. (b) A position weight matrix for the -10 region of E. coli promoters, as an example of a well-studied regulatory element. The boxed elements correspond to the consensus sequence (TATAAT). The score for each nucleotide at each position is derived from the observed frequency of that nucleotide at the corresponding position in the input set of promoters. The score for any particular site is the sum of the individual matrix values for that site's sequence; for example, the score for TATAAT is 85. Note that the matrix values in (b) do not come from the example shown in (a) but rather are derived from a much larger collection of -10 promoter regions. Adapted, with permission, from [3].
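The derivations described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The six aligned sites are hypothetical and are not the sequences of Figure 1a. The per-position weights use a log2-odds transform of pseudocount-smoothed frequencies against a uniform background. That is one common way to turn observed frequencies into scores, but it is not necessarily the scheme behind Figure 1b. Only the additive rule, where a site's score is the sum of its per-position matrix values, comes from the caption, so the numbers produced here are on a different scale from the matrix in (b). The names `SITES`, `consensus`, `degenerate_consensus`, `build_pwm`, `score` and `pseudocount` are hypothetical.

```python
from collections import Counter
import math

# Hypothetical aligned binding sites (illustrative only; NOT the sequences in Figure 1a).
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT", "TATACT"]

# IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def columns(sites):
    """Transpose equal-length, aligned sites into per-position columns."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be aligned and of equal length")
    return list(zip(*sites))


def consensus(sites):
    """Most frequent base at each position (ties broken alphabetically)."""
    result = []
    for col in columns(sites):
        counts = Counter(col)
        result.append(max(sorted(counts), key=lambda b: counts[b]))
    return "".join(result)


def degenerate_consensus(sites):
    """IUPAC code for the set of bases observed at each position (R = A/G, N = any)."""
    return "".join(IUPAC[frozenset(col)] for col in columns(sites))


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Per-position log2-odds weights from smoothed base frequencies (assumed scheme)."""
    n = len(sites)
    pwm = []
    for col in columns(sites):
        counts = Counter(col)
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / (n + 4 * pseudocount) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Site score = sum of the matrix values for the site's bases (the rule in Figure 1b)."""
    if len(site) != len(pwm):
        raise ValueError("site length must match matrix width")
    return sum(row[base] for row, base in zip(pwm, site))


if __name__ == "__main__":
    print(consensus(SITES))             # TATAAT
    print(degenerate_consensus(SITES))  # KAYRMT
    pwm = build_pwm(SITES)
    for site in ("TATAAT", "GATAAT", "GCGCGC"):
        print(site, round(score(pwm, site), 2))
```

The consensus keeps only the most frequent base per column. The degenerate consensus keeps every observed base. The weight matrix keeps graded information, so a site such as GATAAT scores lower than TATAAT without being rejected outright.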

Mentions: The binding specificities of only a small number of transcription factors (TFs) are well characterized. Transcription-factor binding sites (TFBSs) are usually short, around 5-15 base pairs (bp), and are frequently degenerate sequence motifs (Figure 1a); potential binding sites can therefore occur very frequently in larger genomes such as the human genome. The sequence degeneracy of TFBSs has been selected through evolution and is beneficial, because it confers different levels of activity upon different promoters, causing some genes to be transcribed at higher levels than others, as may be required by the cell [3]. The function of TFBSs is often independent of their orientation. In yeast, their position within a promoter can vary, and in higher eukaryotes they can occur upstream, downstream, or in the introns of the genes that they regulate; in addition, they can be close to or far away from the regulated gene(s). Moreover, the human genome is about 200 times larger than the yeast genome, and approximately 95-99% of it does not encode proteins. For all these reasons, it can be very difficult to find TFBSs in noncoding sequences using relatively simple sequence-searching tools such as BLASTN or CLUSTALW [4].
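A back-of-the-envelope calculation makes the point about frequent chance occurrences concrete. The sketch below assumes independent, uniformly distributed bases. Real genomes are not uniform, so the counts are rough estimates only. It counts both strands, because TFBS function is often orientation-independent, and it writes degenerate motifs in IUPAC codes. The motifs and the approximate yeast genome length of 12 Mb are illustrative assumptions. The human length is simply taken as 200 times that, following the text. All function names are hypothetical.

```python
import re

# IUPAC codes as regex character classes (R = A or G, N = any base, etc.).
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Number of bases each code accepts.
IUPAC_SIZE = {code: sum(ch in "ACGT" for ch in cls) for code, cls in IUPAC_CLASS.items()}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def match_probability(motif, background=0.25):
    """Chance that one position matches, assuming i.i.d. uniform bases (an assumption)."""
    p = 1.0
    for code in motif:
        p *= IUPAC_SIZE[code] * background
    return p


def expected_matches(motif, genome_length):
    """Expected chance matches, counting both strands (function is often orientation-independent)."""
    positions = genome_length - len(motif) + 1
    return 2 * positions * match_probability(motif)


def find_sites(motif, seq):
    """(start, strand) of every motif match on either strand, overlaps included."""
    pattern = re.compile("(?=" + "".join(IUPAC_CLASS[c] for c in motif) + ")")
    n, k = len(seq), len(motif)
    forward = [(m.start(), "+") for m in pattern.finditer(seq)]
    # A hit at index i of the reverse complement covers seq[n - i - k : n - i].
    reverse = [(n - m.start() - k, "-") for m in pattern.finditer(reverse_complement(seq))]
    return sorted(forward + reverse)


if __name__ == "__main__":
    yeast_bp = 12_000_000      # approximate yeast genome size (illustrative)
    human_bp = 200 * yeast_bp  # "about 200 times larger", per the text
    for motif in ("TATAAT", "TATRNT"):  # hypothetical exact and degenerate motifs
        for name, length in (("yeast", yeast_bp), ("human", human_bp)):
            print(motif, name, round(expected_matches(motif, length)))
    print(find_sites("TATAAT", "GGTATAATCCATTATACC"))  # [(2, '+'), (10, '-')]
```

Under these assumptions, an exact 6-mer is expected by chance roughly 5,900 times in a yeast-sized genome and over a million times in a genome 200 times larger. Each degenerate position (R doubles the count, N quadruples it) inflates the expected count further. This is why simple string matching yields far more candidate sites than functional ones.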