Evolutionary Conserved Motif Finder (ECMFinder) for genome-wide identification of clustered YY1- and CTCF-binding sites.

Kang K, Chung JH, Kim J - Nucleic Acids Res. (2009)

Bottom Line: This program successfully identified many clustered YY1- and CTCF-binding sites that are conserved among these species but were previously undetected. In particular, this program identified CTCF-binding sites that are located close to the Dlk1, Magel2 and Cdkn1c imprinted genes. Individual ChIP experiments confirmed the in vivo binding of the YY1 and CTCF proteins to most of these newly discovered binding sites, demonstrating the feasibility and usefulness of ECMFinder.

Affiliation: Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.

ABSTRACT
We have developed a new bioinformatics approach called ECMFinder (Evolutionary Conserved Motif Finder). This program searches for a given DNA motif within the entire genome of one species and uses the gene association information of a potential transcription factor-binding site (TFBS) to screen the homologous regions of a second and third species. If multiple species have this potential TFBS in homologous positions, this program recognizes the identified TFBS as an evolutionary conserved motif (ECM). This program outputs a list of ECMs, which can be uploaded as a Custom Track in the UCSC genome browser and can be visualized along with other available data. The feasibility of this approach was tested by searching the genomes of three mammals (human, mouse and cow) with the DNA-binding motifs of YY1 and CTCF. This program successfully identified many clustered YY1- and CTCF-binding sites that are conserved among these species but were previously undetected. In particular, this program identified CTCF-binding sites that are located close to the Dlk1, Magel2 and Cdkn1c imprinted genes. Individual ChIP experiments confirmed the in vivo binding of the YY1 and CTCF proteins to most of these newly discovered binding sites, demonstrating the feasibility and usefulness of ECMFinder.
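The search-and-screen idea described in the abstract can be illustrated with a minimal Python sketch. This is not the ECMFinder implementation: the function names, the `window` parameter, the rule of associating a hit with any gene whose start lies within `window` bp, and the example motif `CGCCATNTT` are all assumptions made for illustration. The sketch scans each species for an IUPAC motif on both strands, links hits to nearby genes, keeps only homolog groups in which every species has at least one hit, and formats a hit as a BED line suitable for a UCSC Custom Track.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def revcomp(motif):
    """Reverse complement of an IUPAC motif."""
    return motif.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand) hits of the motif on both strands (0-based)."""
    hits = set()
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        # A lookahead lets overlapping occurrences be reported as well.
        regex = "(?=(" + "".join(IUPAC[c] for c in pattern) + "))"
        hits.update((m.start(), strand) for m in re.finditer(regex, seq.upper()))
    return sorted(hits)


def genes_with_sites(seq, motif, genes, window):
    """Map gene name -> hit starts lying within `window` bp of the gene start.

    `genes` is {gene_name: gene_start}; this association rule is an assumption.
    """
    out = {}
    for start, _strand in find_sites(seq, motif):
        for name, gene_start in genes.items():
            if abs(start - gene_start) <= window:
                out.setdefault(name, []).append(start)
    return out


def find_ecms(species_data, homolog_groups, motif, window=10000):
    """Return homolog groups in which every species has a hit near its gene.

    species_data:   {species: (sequence, {gene_name: gene_start})}
    homolog_groups: list of {species: gene_name} dicts
    """
    per_species = {sp: genes_with_sites(seq, motif, genes, window)
                   for sp, (seq, genes) in species_data.items()}
    ecms = []
    for group in homolog_groups:
        hits = {sp: per_species[sp].get(gene, []) for sp, gene in group.items()}
        if all(hits.values()):  # every species must contribute a hit
            ecms.append(hits)
    return ecms


def bed_line(chrom, start, motif_length, name):
    """Format one hit as a BED line (0-based start, half-open end)."""
    return f"{chrom}\t{start}\t{start + motif_length}\t{name}"
```

A Custom Track file would begin with a `track name=... description=...` header line followed by one `bed_line` per ECM. A real run would read per-chromosome sequence files and gene tables instead of the in-memory dictionaries assumed here.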

Figure 4: CTCF ECMs in the DLK1-DIO3 domain. (A) Custom Track view of human chromosome 14. The first track shows the density of genome-wide CTCF ChIP-seq data (17). The second track shows the log2 value of genome-wide CTCF ChIP-chip data (13). The third and fourth tracks indicate the CTCF ECMs and the single CTCF-binding motifs detected by ECMFinder, respectively. The remaining tracks are derived from the UCSC genome browser. The CTCF motifs are conserved although their flanking sequences have degenerated. Each sequence includes a CTCF motif (red) with its immediate surrounding regions (bottom left). The middle section shows the Dlk1-Dio3 domain in mouse chromosome 12. Maternally and paternally expressed genes are indicated by red and blue arrows, respectively. Sequence alignments of individual CTCF-binding sites and their flanking regions are also shown in the bottom left section. (B) ChIP confirmation of the three CTCF sites using mouse liver tissues. The Gtl2-DMR and H19-ICR were used as positive controls, and the Dlk1-3′ DMR was used as a negative control. Individual ChIP results from the three CTCF sites are shown below with their site numbers. (C) The first CTCF-binding site (#1) is well conserved among seven mammalian species (mouse, rat, human, orangutan, dog, horse and opossum). (D) PvuII enzyme digestion of the CTCF ChIP–PCR product with an input control. PvuII digests only the paternal DNA (Mus spretus). The upper band is undigested DNA (300 bp) and the lower band is the 241-bp fragment produced by PvuII digestion. (E) Results of bisulfite sequencing of the 957-bp region surrounding CTCF site #1. The closed and open circles indicate methylated and unmethylated CpGs, respectively. The red triangle represents the position of CTCF site #1. The bisulfite sequencing results were further separated by parental origin, indicated by sex symbols and species names.

Mentions: Human (hg18), mouse (mm9), cow (bosTau4) and chicken (galGal3) genome sequences were obtained from the UCSC Genome Bioinformatics site (data set by chromosome, ftp://hgdownload.cse.ucsc.edu). Annotated table data for all genes were collected using the Table Browser (group: Genes and Gene Prediction Tracks, track: RefSeq Genes, region: genome). Annotations of homologs were obtained from the HomoloGene database (ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build62). The true binding site of E2F1 used in Table 1 was obtained from mouse embryonic stem cell ChIP-seq data (16) and converted to the mm9 version (NCBI Build 37) with the liftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). The mouse and human CTCF ChIP-seq data used in Table 2 were obtained from studies of mouse ES cells (16) and human CD4+ T cells (http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/hgTcell.html) (17). The human CTCF ChIP-chip data used in Figure 4 were obtained from a previous study (13) and converted to the hg18 version with the liftOver tool.
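As an illustration of how the homolog annotations might be prepared for a cross-species screen, the following Python sketch groups HomoloGene records by homology group. It assumes the tab-delimited layout of the `homologene.data` file (group ID, taxonomy ID, gene ID, gene symbol, followed by protein columns); the function names and the choice to keep only the first gene per species are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# NCBI taxonomy IDs for the four species named in the text.
TAXA = {9606: "human", 10090: "mouse", 9913: "cow", 9031: "chicken"}


def parse_homologene(lines, taxa=TAXA):
    """Group gene symbols by HomoloGene group ID.

    Assumed columns: group ID, taxonomy ID, gene ID, gene symbol, ...
    Returns {group_id: {species: gene_symbol}}.
    """
    groups = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        group_id, tax_id, _gene_id, symbol = fields[:4]
        species = taxa.get(int(tax_id))
        if species is not None:
            # Keep the first gene listed per species (a simplifying assumption).
            groups[group_id].setdefault(species, symbol)
    return dict(groups)


def shared_groups(groups, required):
    """Keep only the groups that contain a gene for every required species."""
    return {gid: g for gid, g in groups.items()
            if all(sp in g for sp in required)}
```

Calling `shared_groups(parse_homologene(open_file), ["human", "mouse", "cow"])` on an opened data file would yield the gene sets that can be compared across the three mammals; the gene symbols could then be joined to the RefSeq gene table to obtain genomic coordinates.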

