WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences.

Pavesi G, Zambelli F, Pesole G - BMC Bioinformatics (2007)

Bottom Line: This work addresses the problem of detecting conserved transcription factor binding sites, and regulatory regions in general, through the analysis of sequences from homologous genes, an approach that is increasingly widely used given the ever-increasing amount of genomic data available. Unlike the most commonly used methods, the approach we present neither requires nor computes an alignment of the sequences investigated, nor does it resort to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, based on the assumption that true functional elements should be more highly conserved than the sequence surrounding them.


Affiliation: Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan, Milan, Italy. giulio.pavesi@unimi.it

ABSTRACT

Background: This work addresses the problem of detecting conserved transcription factor binding sites, and regulatory regions in general, through the analysis of sequences from homologous genes, an approach that is increasingly widely used given the ever-increasing amount of genomic data available.

Results: We present an algorithm that identifies conserved transcription factor binding sites in a given sequence by comparing it to one or more homologs, adapting a framework we previously introduced for the discovery of sites in sequences from co-regulated genes. Unlike the most commonly used methods, the approach we present neither requires nor computes an alignment of the sequences investigated, nor does it resort to descriptors of the binding specificity of known transcription factors. The main novel idea we introduce is a relative measure of conservation, based on the assumption that true functional elements should be more highly conserved than the sequence surrounding them. We present tests in which we applied the algorithm to the identification of conserved annotated sites in homologous promoters, as well as in distal regions such as enhancers.

Conclusion: The test results show that the algorithm can provide fast and reliable predictions of conserved transcription factor binding sites regulating the transcription of a gene, with better performance than other available methods for the same task. We also show examples of how the algorithm can be successfully employed when promoter annotations of the genes investigated are missing, or when regulatory sites and regions are located far away from the genes.


Figure 8: WeederH identifies the promoter and the annotated enhancer upstream of the human skeletal actin gene. Highest-scoring motifs predicted by WeederH in the intergenic region upstream of the ATG codon of the human skeletal actin gene, displayed within the UCSC genome browser. Track "Weeder H motifs" shows the location of the motifs; the track "Weeder-H" 500 shows 500 bps regions in which the average 12-mer χ2 score is higher than 1. The two regions selected are the promoter and an annotated enhancer located about 1500 bps upstream of the TSS [47]. Track "Your sequence from Blat Search" shows the location of the original promoter retrieved from the ABS database.

Mentions: Another example is the skeletal muscle actin gene (ACTA1, ABS 4 in Additional file 1). Here, we retrieved the whole intergenic region (about 7000 bps) upstream of the gene for human, mouse, and rat. Two regions were selected as densely populated with significant motifs (see Figure 8): the core promoter, again, and another region at around -1,500 matching an experimentally validated enhancer [47].
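
The region selection described for Figure 8 (500 bps windows whose average 12-mer χ2 score is higher than 1) can be illustrated with a small sliding-window sketch. This is not the WeederH implementation: the function name dense_regions, the per-position score list, and the rule of merging overlapping windows are all assumptions made for illustration, and the code does not show how the χ2 scores themselves are computed.

```python
from typing import List, Tuple


def dense_regions(
    scores: List[float],
    window: int = 500,
    threshold: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return merged half-open [start, end) intervals covered by windows
    whose mean score is strictly greater than `threshold`.

    `scores` is a hypothetical list with one score per sequence position
    (for example, the score of the 12-mer starting at that position).
    """
    n = len(scores)
    if window <= 0 or n < window:
        return []

    regions: List[Tuple[int, int]] = []
    total = sum(scores[:window])  # running sum of the current window
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window by one position: add the new value, drop the old one.
            total += scores[start + window - 1] - scores[start - 1]
        if total / window > threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it (assumed policy).
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy profile: two high-scoring stretches separated by background.
    toy = [2.0] * 5 + [0.0] * 10 + [2.0] * 5
    print(dense_regions(toy, window=5, threshold=1.0))
```

With real data, the two intervals returned for a profile like the one in Figure 8 would correspond to the promoter and the upstream enhancer; the toy profile above only mimics that shape.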

