Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages.

Schmidt D, Schwalie PC, Wilson MD, Ballester B, Gonçalves A, Kutter C, Brown GD, Marshall A, Flicek P, Odom DT - Cell (2012)

Bottom Line: To gain insight into how these DNA elements are conserved and spread through the genome, we defined the full spectrum of CTCF-binding sites, including a 33/34-mer motif, and identified over five thousand highly conserved, robust, and tissue-independent CTCF-binding locations by comparing ChIP-seq data from six mammals. We discovered fossilized repeat elements flanking deeply conserved CTCF-binding regions, indicating that similar retrotransposon expansions occurred hundreds of millions of years ago. Repeat-driven dispersal of CTCF binding is a fundamental, ancient, and still highly active mechanism of genome evolution in mammalian lineages.

Affiliation: Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.

Figure 2: CTCF Binding Often Occurs at a Highly Conserved Motif, Consisting of a Two-Part Profile
(A) Motifs (M1 and M2) identified de novo from CTCF-binding events.
(B) Binding event counts and number of binding events with at least one motif (M1 and M1+M2) in all six species. M1+M2 20,21 represents the preferred spacing patterns of these two submotifs.
(C) The DNA sequence constraint around the CTCF motif in human was plotted by observed/expected genomic evolutionary rate profiling (red, GERP) scores (Cooper et al., 2005). The frequencies of unchanged bases in five-way shared CTCF-binding events are shown as position weight matrix (PWM) below the GERP profile.
(D) Peaks containing the M2 motif in preferred spacing are stronger in ChIP enrichment both by read count and peak width, are more highly shared among mammals, and are resistant to RNAi-mediated knockdown.
(E) A multiple mammalian sequence alignment of a CTCF peak at the APP gene is shown. The DNase I footprint (red box, Quitschke et al., 2000) encompasses a complete 34 bp M1 and M2 CTCF motif.
(F) DNA sequence of the human c-myc promoter (Human c-myc Fragment A) bound by CTCF in vivo and in vitro (Filippova et al., 1996). The sequence contains the canonical M1 CTCF motif (red) and the M2 motif (blue). A 3 bp mutation in the M2 motif that eliminates CTCF binding in vitro is indicated in green.
See also Figure S2.

Mentions: Our genome-wide data for CTCF binding in livers of five eutherian species allowed us to identify de novo DNA sequences associated with CTCF binding at hundreds of thousands of locations. In addition to the known 20 bp motif, we further discovered a second 9 bp motif present at high frequency and with consistent spacing in each species. Both halves of the motif are unchanged across 180 million years of evolution, consistent with the high conservation of CTCF's DNA-binding domain (Figure S2), and create together a two-part 33/34 bp binding motif, which occurs in a quarter to a third of CTCF-binding events (Figures 2A and 2B). The second motif is downstream by either 21 or 22 bp from the center of the previously identified motif in approximately equal proportions in all studied species, except mouse and rat (Figure 4). Henceforth, we will refer to the canonical 20 base motif as M1 and to the 9 base motif as M2. The M2 motif has previously been found in CTCF DNase footprints, but the role of this motif is unknown (Boyle et al., 2011). The variable presence of the shorter and less information-rich M2 agrees with earlier suggestions that CTCF may have multiple binding modalities (Burcin et al., 1997; Filippova et al., 1996).
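The two-part motif described above can be illustrated with a small sketch. It is not the authors' pipeline. It assumes that M2 lies downstream of M1 on the same strand and that the 33/34 bp figure is measured from the first base of M1 to the last base of M2, which would imply a gap of 4 or 5 bp between the 20 bp and 9 bp halves. The `Hit` record, the function names, and the coordinate convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

M1_LEN = 20                  # canonical motif length, from the text
M2_LEN = 9                   # second motif length, from the text
PREFERRED_SPANS = (33, 34)   # combined motif lengths, from the text


@dataclass(frozen=True)
class Hit:
    """Hypothetical motif hit on the forward strand (0-based start)."""
    motif: str   # "M1" or "M2"
    start: int


def combined_span(m1: Hit, m2: Hit) -> int:
    """Distance from the first base of M1 to the last base of M2 (assumed convention)."""
    return (m2.start + M2_LEN) - m1.start


def classify_peak(hits: Sequence[Hit]) -> str:
    """Label a peak 'M1+M2' if any M1/M2 pair has the preferred span,
    'M1' if only the canonical motif is present, otherwise 'none'."""
    m1_hits = [h for h in hits if h.motif == "M1"]
    m2_hits = [h for h in hits if h.motif == "M2"]
    for m1 in m1_hits:
        for m2 in m2_hits:
            if combined_span(m1, m2) in PREFERRED_SPANS:
                return "M1+M2"
    return "M1" if m1_hits else "none"


if __name__ == "__main__":
    # M1 at 100..119, M2 at 124..132: span 33, i.e. a 4 bp gap.
    print(classify_peak([Hit("M1", 100), Hit("M2", 124)]))
    # Same M1, M2 too far downstream: counted as M1 only.
    print(classify_peak([Hit("M1", 100), Hit("M2", 130)]))
```

Under these assumptions a peak is counted as M1+M2 only when the pair falls at one of the two preferred spans. A real analysis would also need to handle reverse-strand hits and the scoring thresholds used to call each motif.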

