Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages.
Bottom Line: To gain insight into how these DNA elements are conserved and spread through the genome, we defined the full spectrum of CTCF-binding sites, including a 33/34-mer motif, and identified over five thousand highly conserved, robust, and tissue-independent CTCF-binding locations by comparing ChIP-seq data from six mammals. We discovered fossilized repeat elements flanking deeply conserved CTCF-binding regions, indicating that similar retrotransposon expansions occurred hundreds of millions of years ago. Repeat-driven dispersal of CTCF binding is a fundamental, ancient, and still highly active mechanism of genome evolution in mammalian lineages.
Affiliation: Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
Mentions: The position weight matrix of CTCF's binding motif is composed of thousands of specific sequences, or motif-words. We tested whether CTCF has a preferred set of motif-words by analyzing their frequency of occurrence. We clustered highly similar motif-words using the 14 most informative bases of the M1 motif, which together capture over 95% of the motif's information content. A set of 33,994 different 14-mer motif-words (out of a possible 69,865) is used by CTCF at least once in the five placental mammals. We found that a small subset of these tens of thousands of motif-words is bound by CTCF disproportionately often, both within and between species (Figure 3). For example, the top 200 bound motif-words are responsible for 4,006 binding events in the human genome; in fact, just 2,492 words (3.6% of the possible words) account for over half of the binding events in the human genome. CTCF motif-word usage is strikingly conserved between species (Spearman rank correlation > 0.76) and recapitulates both the evolutionary distances between the species and key characteristics of the CTCF-binding events (Figure 3). In particular, we observed that the frequency of a word's usage correlates positively with both the likelihood of a binding event being shared among all five species and the strength of the ChIP enrichment (Figure 3). A similar analysis for a typical tissue-specific TF (HNF4A) showed considerably lower correlation of motif-word usage (Figure S3A) and no correlation between word frequency and either conservation or ChIP enrichment (Figure S3B). Collectively, these results reveal a functional hierarchy of CTCF-bound motif-words that has been maintained during evolution.
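The motif-word analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`word_usage`, `words_covering`, `average_ranks`, `spearman`) and the toy 4-mer "words" are hypothetical, and the sketch assumes each binding event has already been reduced to a single motif-word string. It counts word usage per species, finds how many top-ranked words account for a given share of binding events, and computes a Spearman rank correlation of usage between two species.

```python
from collections import Counter


def word_usage(bound_words):
    """Count how many binding events use each motif-word."""
    return Counter(bound_words)


def words_covering(usage, fraction=0.5):
    """Smallest number of top-ranked words whose events reach `fraction` of the total."""
    total = sum(usage.values())
    running = 0
    for n, (_, count) in enumerate(usage.most_common(), start=1):
        running += count
        if running >= fraction * total:
            return n
    return 0  # only reached when there are no binding events


def average_ranks(values):
    """Rank values starting at 1, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(usage_a, usage_b):
    """Spearman rank correlation of usage over the union of words.

    Words absent from one species count as zero uses there (an assumption
    of this sketch). Undefined if either species has constant usage.
    """
    words = sorted(set(usage_a) | set(usage_b))
    ra = average_ranks([usage_a.get(w, 0) for w in words])
    rb = average_ranks([usage_b.get(w, 0) for w in words])
    n = len(words)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Toy data: invented words and counts, for illustration only.
human = word_usage(["AAAA"] * 5 + ["CCCC"] * 3 + ["GGGG"] + ["TTTT"])
mouse = word_usage(["AAAA"] * 4 + ["CCCC"] * 2 + ["GGGG"])

print(words_covering(human, 0.5))
print(round(spearman(human, mouse), 3))
```

On real data, `words_covering` corresponds to the kind of statement made above about 2,492 words accounting for over half of the human binding events, and `spearman` to the between-species comparison of usage; how the original study handled ties and words absent from a species is not stated in this excerpt.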