Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages.

Schmidt D, Schwalie PC, Wilson MD, Ballester B, Gonçalves A, Kutter C, Brown GD, Marshall A, Flicek P, Odom DT - Cell (2012)

Bottom Line: To gain insight into how these DNA elements are conserved and spread through the genome, we defined the full spectrum of CTCF-binding sites, including a 33/34-mer motif, and identified over five thousand highly conserved, robust, and tissue-independent CTCF-binding locations by comparing ChIP-seq data from six mammals. We discovered fossilized repeat elements flanking deeply conserved CTCF-binding regions, indicating that similar retrotransposon expansions occurred hundreds of millions of years ago. Repeat-driven dispersal of CTCF binding is a fundamental, ancient, and still highly active mechanism of genome evolution in mammalian lineages.

Affiliation: Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.

Figure 3: CTCF Motif Usage Shows a Conserved Hierarchy among Placental Mammals. Heat map of the 2,492 CTCF motif-words found at least five times in any species, anchored to human; words are normalized by their background occurrences within each genome. This set of words is found in 27,543 human binding events. The data are sorted in the human column by decreasing frequency, and Spearman rank correlations after one-dimensional hierarchical clustering of the rows are shown. The average ChIP enrichment of the motif-words, separated into bins containing 100 words, is shown as a bar chart (left). Similarly, the fraction of five-way conserved CTCF-binding events within the same bins is shown as a bar chart (right). See also Figure S3.

Mentions: The position weight matrix of CTCF's binding motif is composed of thousands of specific sequences, or motif-words. We tested whether CTCF has a preferred set of motif-words by analyzing their frequency of occurrence. We clustered highly similar motif-words using the 14 most informative bases of the M1 motif, which together capture over 95% of the motif's information content. A set of 33,994 different 14-mer motif-words (out of a possible 69,865) are used by CTCF at least once in the five placental mammals. We found that a small subset of these tens of thousands of motif-words are disproportionately often bound by CTCF within and between different species (Figure 3). For example, the top 200 bound motif-words are responsible for 4,006 binding events in the human genome; in fact, just 2,492 words (3.6% of the possible words) account for over half of the binding events in the human genome. CTCF motif-word usage is strikingly conserved between the species (Spearman rank correlation > 0.76) and recapitulates both the evolutionary distances of the species as well as key characteristics of the CTCF-binding events (Figure 3). In particular, we observed that the frequency of a word's usage positively correlates with both the likelihood of a binding event being shared among all five species and the strength of the ChIP enrichment (Figure 3). A similar analysis for a typical tissue-specific TF (HNF4A) showed considerably lower correlation of motif-word usage (Figure S3A) and no correlation between word frequency and either conservation or ChIP enrichment (Figure S3B). Collectively, these results reveal a functional hierarchy of CTCF-bound motif-words maintained during evolution.
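
The comparison of motif-word usage between species can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes each species is given as a plain list of the motif-words found under its binding events, it uses toy 3-mers in place of 14-mers, and it omits the normalization by background occurrences described in the Figure 3 caption. The function names (`count_words`, `average_ranks`, `spearman`, `word_usage_correlation`) are hypothetical and chosen for illustration only.

```python
from collections import Counter


def count_words(bound_sites):
    """Count how often each motif-word occurs among the bound sites."""
    return Counter(bound_sites)


def average_ranks(values):
    """Rank values (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def word_usage_correlation(sites_a, sites_b):
    """Correlate motif-word usage counts between two species."""
    counts_a, counts_b = count_words(sites_a), count_words(sites_b)
    # Words unseen in one species get a count of 0 from the Counter.
    words = sorted(set(counts_a) | set(counts_b))
    return spearman([counts_a[w] for w in words],
                    [counts_b[w] for w in words])


# Toy data: two hypothetical species that prefer the same words.
species_a = ["AAA"] * 3 + ["CCC"] * 2 + ["GGG"]
species_b = ["AAA"] * 6 + ["CCC"] * 4 + ["GGG"] * 2
rho = word_usage_correlation(species_a, species_b)
```

Because the rank order of word usage is identical in the two toy species, `rho` comes out at the maximum value of 1; real data would additionally need the background normalization and a principled treatment of words absent from one genome.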