WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences.

Pavesi G, Zambelli F, Pesole G - BMC Bioinformatics (2007)

Bottom Line: This work addresses the problem of detecting conserved transcription factor binding sites and, more generally, regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming increasingly widely used given the ever-increasing amount of genomic data available. Unlike the most commonly used methods, the approach we present neither requires nor computes an alignment of the sequences investigated, nor does it rely on descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, which assumes that true functional elements should be more conserved than the sequence surrounding them.


Affiliation: Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan, Milan, Italy. giulio.pavesi@unimi.it

ABSTRACT

Background: This work addresses the problem of detecting conserved transcription factor binding sites and, more generally, regulatory regions through the analysis of sequences from homologous genes, an approach that is becoming increasingly widely used given the ever-increasing amount of genomic data available.

Results: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, the approach we present neither requires nor computes an alignment of the sequences investigated, nor does it rely on descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, which assumes that true functional elements should be more conserved than the sequence surrounding them. We present tests in which we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions such as enhancers.

Conclusion: The tests show that the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performance than other available methods for the same task. We also show examples of how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far from the genes.
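The idea of an alignment-free, relative measure of conservation can be illustrated with a small sketch. This is not WeederH's actual scoring scheme, which the abstract does not spell out; it is a simplified stand-in that assumes Hamming distance as the mismatch measure and a z-score as the "relative" normalization. The function names (`hamming`, `best_match_distance`, `relative_conservation`) and the default `k` are hypothetical.

```python
from statistics import mean, pstdev


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match_distance(kmer, homolog):
    """Smallest Hamming distance between kmer and any window of the homolog.

    No alignment is computed: every window of the homolog is a candidate.
    """
    k = len(kmer)
    return min(hamming(kmer, homolog[i:i + k])
               for i in range(len(homolog) - k + 1))


def relative_conservation(reference, homolog, k=8):
    """Score each k-mer of the reference relative to the rest of the sequence.

    Assumption (not from the paper): the score is a z-score of the best-match
    distance, so a k-mer scores high only if it is better conserved than the
    sequence around it, not merely well conserved in absolute terms.
    """
    dists = [best_match_distance(reference[i:i + k], homolog)
             for i in range(len(reference) - k + 1)]
    mu, sigma = mean(dists), pstdev(dists)
    if sigma == 0:
        # Uniform conservation: nothing stands out from the background.
        return [0.0] * len(dists)
    return [(mu - d) / sigma for d in dists]


if __name__ == "__main__":
    # Toy sequences: one shared 8-mer embedded in unrelated flanks.
    ref = "AAAAAAAAAA" + "GATTACAG" + "AAAAAAAAAA"
    hom = "CCCCCCCCCC" + "GATTACAG" + "CCCCCCCCCC"
    scores = relative_conservation(ref, hom, k=8)
    print(scores.index(max(scores)))  # start of the best-conserved 8-mer
```

Because each score is normalized against the whole reference sequence, a uniformly well-conserved region yields no peaks, which is the behavior the relative measure is meant to capture.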


Figure 7: WeederH identifies the promoter and the three annotated enhancers upstream of the mouse actin, alpha cardiac gene. Highest-scoring motifs predicted by WeederH in the 10,000 bp region upstream of the ATG codon of the mouse actin alpha cardiac gene, displayed within the UCSC genome browser. The track "Weeder H motifs" shows the location of the motifs; the track "Weeder-H" 500 shows 500 bp regions in which the average 12-mer χ² score is greater than 1. The track "Your sequence from Blat Search" shows the location of the original promoter retrieved from the ABS database. The three regions other than the one just upstream of the TSS (the promoter) match three experimentally known enhancers of the gene.

Mentions: The actin cardiac alpha chain gene (ACTC) has its ATG codon located within the second exon, with a fully non-coding first exon. WeederH successfully identified all 7 sites contained in the 500 bp promoter region retrieved from the ABS database (ABS 3 in Additional file 1). We repeated the experiment, this time retrieving the 10,000 base pairs upstream of the ATG codon of the mouse and human genes. The results are shown in Figure 7, displayed within a UCSC genome browser window. The topmost track (WeederH motifs) shows the location of the highest-scoring motifs. They are clustered around the TSS of the gene, falling within the 500 bp promoter of the ABS database (indicated by the "Your sequence from BLAT search" track). Motifs shown in this area cover all the ABS-annotated sites. Interestingly, other clusters of motifs are visible at around -2000, -6000, and -8000 from the TSS. Indeed, three distal enhancers are annotated for the ACTC gene, driving developmental and cardiac-muscle-specific expression of the gene [46].
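The region track in Figure 7 reports 500 bp regions whose average 12-mer score exceeds 1. A minimal sketch of that kind of windowed thresholding is given below. It assumes one score per sequence position, a window that slides one position at a time, and merging of overlapping windows; the source does not say whether WeederH slides or tiles its windows, so these are illustrative choices, and the function name `high_scoring_windows` is hypothetical.

```python
def high_scoring_windows(scores, window=500, threshold=1.0):
    """Return merged (start, end) half-open intervals whose mean score
    over `window` consecutive positions is greater than `threshold`.

    Assumption: windows slide by one position and overlapping or adjacent
    qualifying windows are merged into a single region.
    """
    if len(scores) < window:
        return []
    regions = []
    total = sum(scores[:window])  # running sum of the current window
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Slide the window: add the new right edge, drop the old left edge.
            total += scores[start + window - 1] - scores[start - 1]
        if total / window > threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy per-position scores with a single high-scoring stretch.
    toy = [0.0] * 10 + [3.0] * 4 + [0.0] * 10
    print(high_scoring_windows(toy, window=4, threshold=1.0))
```

With real data, `scores` would hold one value per position of the upstream sequence, and the returned intervals would correspond to candidate regions such as the promoter and the distal clusters described above.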

