Evolutionary Conserved Motif Finder (ECMFinder) for genome-wide identification of clustered YY1- and CTCF-binding sites.

Kang K, Chung JH, Kim J - Nucleic Acids Res. (2009)

Bottom Line: This program successfully identified many clustered YY1- and CTCF-binding sites that are conserved among these species but were previously undetected. In particular, this program identified CTCF-binding sites that are located close to the Dlk1, Magel2 and Cdkn1c imprinted genes. Individual ChIP experiments confirmed the in vivo binding of the YY1 and CTCF proteins to most of these newly discovered binding sites, demonstrating the feasibility and usefulness of ECMFinder.


Affiliation: Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.

ABSTRACT
We have developed a new bioinformatics approach called ECMFinder (Evolutionary Conserved Motif Finder). This program searches for a given DNA motif within the entire genome of one species and uses the gene association information of a potential transcription factor-binding site (TFBS) to screen the homologous regions of a second and third species. If multiple species have this potential TFBS in homologous positions, this program recognizes the identified TFBS as an evolutionary conserved motif (ECM). This program outputs a list of ECMs, which can be uploaded as a Custom Track in the UCSC genome browser and can be visualized along with other available data. The feasibility of this approach was tested by searching the genomes of three mammals (human, mouse and cow) with the DNA-binding motifs of YY1 and CTCF. This program successfully identified many clustered YY1- and CTCF-binding sites that are conserved among these species but were previously undetected. In particular, this program identified CTCF-binding sites that are located close to the Dlk1, Magel2 and Cdkn1c imprinted genes. Individual ChIP experiments confirmed the in vivo binding of the YY1 and CTCF proteins to most of these newly discovered binding sites, demonstrating the feasibility and usefulness of ECMFinder.


Figure 1: Overall scheme of ECMFinder. ECMFinder uses the CGAT (Common Gene Annotation Table) database, which is the product of merging homologous gene annotations derived from HomoloGene (release 62) and the genome sequence of four species—human (hg18), mouse (mm9), cow (bosTau4) and chicken (galGal3). Users can define an input motif based on the ECMFinder syntax, which is briefly described in the Readme file of the program (Step 1). ECMFinder searches a user-defined homologous region (dark blue bar) around a gene's TSS for the motif (or motif cluster) in all species. If at least one motif (or motif cluster) exists in the homologous region of all species, they are identified as ECMs (Evolutionary Conserved Motifs, red oval) (Step 2). The output of ECMFinder is a GFF (General Feature Format) file that can be uploaded to the UCSC genome browser as a Custom Track and visualized along with other data sets (Step 3).
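Step 3 of the caption describes GFF output that can be loaded as a UCSC Custom Track. The following is a minimal sketch of that output step, written in Python for illustration only; it is not ECMFinder's code. The function name `hits_to_gff`, the `source` and `feature` labels, the track name, and the example coordinates are all hypothetical. It assumes hits arrive as 0-based, half-open intervals and converts them to the 1-based, inclusive coordinates that GFF uses.

```python
def hits_to_gff(hits, track_name="ECM_sketch"):
    """Format motif hits as GFF text with a UCSC track line.

    hits: iterable of (chrom, start0, end0, strand, group), where
    start0/end0 are 0-based, half-open coordinates (an assumption of
    this sketch, not something the source specifies).
    """
    # A leading "track" line names the Custom Track in the browser.
    lines = ['track name="%s" description="Illustrative conserved motif hits"'
             % track_name]
    for chrom, start0, end0, strand, group in hits:
        fields = [
            chrom,            # seqname
            "sketch",         # source (hypothetical label)
            "motif",          # feature (hypothetical label)
            str(start0 + 1),  # GFF start is 1-based
            str(end0),        # GFF end is inclusive
            ".",              # score: not computed here
            strand,           # "+" or "-"
            ".",              # frame: not applicable
            group,            # group: features sharing it are drawn together
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical hit: made-up coordinates on chr12 for a placeholder gene group.
gff_text = hits_to_gff([("chr12", 100, 108, "+", "geneA")])
print(gff_text, end="")
```

The nine tab-separated columns follow the GFF layout accepted by the UCSC genome browser; what ECMFinder actually writes in each column is not stated in the text above.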

Mentions: ECMFinder incorporates the following principles into its algorithm. First, the motif search of ECMFinder is based on a text pattern-matching method within the Perl programming language. Although PWM approaches provide more quantitative information about the motif search, they are not suitable for the many TFs that lack reliable PWM profiles. This has been one of the motivations for using text pattern-matching strategies in ECMFinder. Second, the homologous regions used in ECMFinder are defined as the regions surrounding the TSSs of homologous genes in the HomoloGene database. This database contains a list of homologous gene groups that have been derived using protein alignments of multiple species. In detail, ECMFinder first scans the genomic region surrounding the TSS of a given gene in one species with a user-defined motif. If at least one motif is identified in the region, ECMFinder then searches the homologous regions from the other species using the HomoloGene database (Figure 1). If all species have at least one motif in a given homologous region, this motif is regarded as an ECM.
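The two principles above (text pattern matching, then a conservation check across homologous regions) can be sketched as follows. This is a minimal illustration in Python, not ECMFinder's Perl code and not its motif syntax. The function names, the toy motif `GCCATNTT`, the short sequences, and the gene-group labels are all hypothetical. It assumes that motifs are written with IUPAC nucleotide codes, that both strands are scanned, and that the sequence around each TSS has already been extracted per species.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif string."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a motif to a regex; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, motif):
    """Return sorted (start, end, strand) hits, 0-based and half-open."""
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_to_regex(pattern).finditer(seq):
            hits.append((match.start(), match.start() + len(motif), strand))
    return sorted(hits)


def conserved_groups(regions, motif):
    """Keep gene groups in which every species has at least one hit.

    regions: {gene_group: {species: sequence_around_TSS}}
    """
    conserved = {}
    for group, by_species in regions.items():
        hits = {sp: find_motif(seq, motif) for sp, seq in by_species.items()}
        if all(hits.values()):  # an empty hit list fails the check
            conserved[group] = hits
    return conserved


# Toy data: made-up sequences, not real genomic regions.
regions = {
    "geneA": {"human": "TTGCCATATTGG", "mouse": "CCAAGATGGCTT",
              "cow": "GGGCCATCTTAA"},
    "geneB": {"human": "TTGCCATATTGG", "mouse": "CCAAGATGGCTT",
              "cow": "ACGTACGTACGT"},
}
result = conserved_groups(regions, "GCCATNTT")
print(sorted(result))
```

In this toy input, geneA has a hit in all three species (the mouse hit is on the minus strand), while geneB is dropped because the cow sequence has none. A palindromic motif would be reported once per strand at the same position; whether ECMFinder handles strands or overlaps this way is not stated in the text.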

