The majority of primate-specific regulatory sequences are derived from transposable elements.

Jacques PÉ, Jeyakani J, Bourque G - PLoS Genet. (2013)

Bottom Line: We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.

Affiliation: Computational and Systems Biology, Genome Institute of Singapore, Singapore, Singapore.

ABSTRACT
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed to the human genome nearly half of its active elements. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV-derived sequences were activated in a cell type-specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type-specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.

Figure 1 (pgen-1003504-g001): TEs have contributed a large fraction of accessible regions in human cells. (A) Proportion of human DHS regions overlapping different classes of repeats based on the age of the sequence in which they are embedded. (B) Specific repeat subfamilies, called DHS-associated repeats (DARs), are over-represented and their cumulative relative contribution (Observed-Expected) is shown as a percentage of all DHS data. (C–D) Proportion of all repeat instances in the genome (All repeats) and for DAR instances in three classes of cells (Normal, ESC and Cancer). (E–F) Fraction of repeat subfamily instances that is contributing to open chromatin in at least one data set. The estimated age is in millions of years (Myrs).

Mentions: Starting from 106 DHS data sets, we performed extensive quality control and retained 75 data sets defining a total of 11,848,530 regions of open chromatin in 41 distinct human cell types derived from normal, embryonic, and cancer tissues (Table 1 and Table S1, see Materials and Methods). These DHS data were further grouped across cell types into 1,643,643 distinct regions of open chromatin. By measuring the overlap with repeat elements, we found that 725,610 (44.1%) DHS regions overlapped instances of the 4 major classes of TEs (ERV, also known as LTR; DNA; LINE; and SINE). Notably, by partitioning the DHS regions based on the presence or absence of homologous sequences at orthologous loci in other species, we also found that this proportion reached 63.1% for elements embedded in primate-specific sequences (Figure 1A, see Materials and Methods). A large fraction of these primate-specific DHS regions were observed in repeat subfamilies that were themselves specific to the primate lineage, as estimated from the divergence of the repeat instances from their consensus (Figure S1, see Materials and Methods).
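
The grouping and overlap steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the single-chromosome half-open interval representation, the rule that any overlap of at least one base counts as a hit, and the toy coordinates are all assumptions made for illustration. The last line only re-checks the reported percentage from the two counts given in the text.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into disjoint, sorted intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps_any(region: Interval, merged_tes: List[Interval], te_starts: List[int]) -> bool:
    """True if region shares at least one base with any (disjoint, sorted) TE interval."""
    start, end = region
    i = bisect_left(te_starts, end)  # TEs at index < i start before the region ends
    return i > 0 and merged_tes[i - 1][1] > start


def te_overlap_fraction(
    dhs_by_cell_type: Dict[str, List[Interval]], tes: List[Interval]
) -> Tuple[int, int, float]:
    """Group DHS regions across cell types, then count distinct regions overlapping a TE."""
    pooled = [iv for regions in dhs_by_cell_type.values() for iv in regions]
    distinct = merge_intervals(pooled)
    merged_tes = merge_intervals(tes)
    te_starts = [s for s, _ in merged_tes]
    hits = sum(overlaps_any(r, merged_tes, te_starts) for r in distinct)
    fraction = hits / len(distinct) if distinct else 0.0
    return hits, len(distinct), fraction


# Toy data (hypothetical coordinates, not taken from the paper).
toy_dhs = {
    "normal": [(100, 200), (500, 600)],
    "esc": [(150, 250), (900, 1000)],
    "cancer": [(520, 580)],
}
toy_tes = [(180, 300), (950, 1200)]
hits, total, fraction = te_overlap_fraction(toy_dhs, toy_tes)

# Consistency check on the counts reported in the text.
reported_pct = round(100 * 725_610 / 1_643_643, 1)
```

On the toy data, the five per-cell-type regions collapse into three distinct regions, two of which overlap a TE. For genome-scale data, a per-chromosome version of the same logic or a dedicated interval tool would be the more practical choice.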

