Motif signatures in stretch enhancers are enriched for disease-associated genetic variants.

Quang DX, Erdos MR, Parker SC, Collins FS - Epigenetics Chromatin (2015)

Bottom Line: Interestingly, we show that single nucleotide polymorphisms associated with diseases or quantitative traits significantly overlap motif occurrences located in SEs, but outside of DHSs. These results reinforce the role of SEs in influencing risk for diseases and suggest an expanded regulatory functional role for motifs that occur outside highly accessible chromatin. Furthermore, the motif signatures generated here expand our understanding of the binding preference of well-characterized TFs.


Affiliation: Department of Computer Science, University of California, Irvine, Irvine, CA 92697 USA ; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697 USA.

ABSTRACT

Background: Stretch enhancers (SEs) are large chromatin-defined regulatory elements that are at least 3,000 base pairs (bps) long, in contrast to the median enhancer length of 800 bps. SEs tend to be cell-type specific, regulate cell-type specific gene expression, and are enriched in disease-associated genetic variants in disease-relevant cell types. Transcription factors (TFs) can bind to enhancers to modulate enhancer activity, and their sequence specificity can be represented by motifs. We hypothesize motifs can provide a biological context for how genetic variants contribute to disease.

Results: We integrated chromatin state, gene expression, and chromatin accessibility [measured as DNase I Hypersensitive Sites (DHSs)] maps across nine different cell types. Motif enrichment analyses of chromatin-defined enhancer sequences identify several known cell-type specific "master" factors. Furthermore, de novo motif discovery not only recovers many of these motifs, but also identifies novel non-canonical motifs, providing additional insight into TF binding preferences. Across the length of SEs, motifs are most enriched in DHSs, though relative enrichment is also observed outside of DHSs. Interestingly, we show that single nucleotide polymorphisms associated with diseases or quantitative traits significantly overlap motif occurrences located in SEs, but outside of DHSs.
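The selection described in the last sentence above (variants that fall in a motif occurrence inside an SE but outside any DHS) can be sketched as a simple interval filter. This is only an illustrative sketch, not the authors' pipeline: the function names, the single-chromosome assumption, and the 0-based half-open coordinate convention are all assumptions made here, and the sketch performs only the overlap step, not any test of statistical significance.

```python
def overlaps(pos, intervals):
    """True if 0-based position `pos` lies in any half-open (start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def snps_in_se_motifs_outside_dhs(snp_positions, motif_hits, stretch_enhancers, dhs_sites):
    """Keep SNPs that hit a motif occurrence within an SE but not within a DHS.

    All arguments are hypothetical inputs for one chromosome:
    snp_positions is a list of ints, the others are lists of (start, end) tuples.
    """
    selected = []
    for pos in snp_positions:
        in_motif = overlaps(pos, motif_hits)
        in_se = overlaps(pos, stretch_enhancers)
        in_dhs = overlaps(pos, dhs_sites)
        if in_motif and in_se and not in_dhs:
            selected.append(pos)
    return selected


# Toy coordinates, invented purely for illustration.
se = [(1000, 5000)]
dhs = [(2000, 2200)]
motifs = [(1500, 1510), (2100, 2110), (6000, 6010)]
snps = [1505, 2105, 6005, 3000]
kept = snps_in_se_motifs_outside_dhs(snps, motifs, se, dhs)
```

In the toy data only the SNP at 1505 is kept: 2105 sits in a DHS, 6005 is outside the SE, and 3000 does not hit a motif. For genome-scale data a linear scan like this would be too slow, and a sorted-interval or interval-tree approach would be the natural replacement.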

Conclusions: These results reinforce the role of SEs in influencing risk for diseases and suggest an expanded regulatory functional role for motifs that occur outside highly accessible chromatin. Furthermore, the motif signatures generated here expand our understanding of the binding preference of well-characterized TFs.


Figure 6: De novo motif discovery in enhancer footprint sequences reveals novel binding patterns of well-characterized TFs. Motifs of known activators in the HUVEC, K562, HSMM, and HepG2 cell lines can co-occur. For example, in the HUVEC enhancer footprint sequences, the ERG motif, a member of the ETS family that is characterized by a “GGAA” binding pattern, often co-occurs with the AP-1 motif. In the presence of the AP-1 motif, the degree of resemblance of a predicted site to the ERG motif is weaker. SPI1, another member of the ETS family, shares a similar relationship with the GATA1 motif in K562. In other examples, activator TFs appear to often homodimerize and form palindromic motifs. Sequence logos of examples of de novo motifs in the cell types are displayed alongside, if available, CentriMo E-values and number of matches in SE DHS sequences (“Methods”). For two of the HepG2 examples, the motifs are so infrequent that CentriMo failed to find a significant number of matches in SE sequences.

Mentions: The de novo motif discovery analysis does not yield any prominent examples of novel motif families, possibly because the systems we consider have already been studied extensively. However, we do find that de novo motif discovery can provide novel insight into the spatial arrangement of motif combinations at nucleotide resolution. Some of the motifs discovered in footprint sequences appear as combinations of two known motifs in close spatial proximity (Figure 6). In HUVEC enhancers, for example, we find a significant number of activator protein 1 (AP-1) and ERG motif matches. AP-1 is a heterodimeric protein that recognizes and binds to the enhancer heptamer motif 5′-TGA[CG]TCA-3′. ERG is a subfamily of the ETS family of TFs, which have a strong 5′-GGAA-3′ core binding sequence within their binding motifs. The ERG subfamily includes TFs such as ERG and FLI1, which are known to be functionally active in HUVEC. Through our de novo motif analysis, we find these two classes of motifs are significantly co-enriched, but the frequency of the combination depends on the relative orientation of these two motifs. Furthermore, sequence-specific constraints for the ERG binding motif are relaxed when an AP-1 motif is nearby. These results suggest a motif regulatory “grammar” governed by physical constraints that dictate the in vivo spatial arrangements and frequencies of combinations of motifs, which is consistent with a previous report [26], and may uncover some of the non-canonical sequence determinants that underlie disease-associated SNPs. Similarly, another previous study showed the sequence-specific constraints of some TFs can decrease as a function of the number of co-occupying factors [27]. Although these motifs contain binding preferences of well-characterized TFs, most of them are novel, lacking any database matches. As a resource to the community, we provide all de novo discovered motifs in MEME Minimal Motif Format (Additional file 4).
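To make the idea of orientation-dependent co-occurrence concrete, the following is a minimal sketch that scans a DNA string for the AP-1 heptamer 5′-TGA[CG]TCA-3′ and the ETS 5′-GGAA-3′ core quoted above, and reports nearby pairs together with the strand of the ETS core. It uses exact string matching, which is a deliberate simplification: the analysis in the paper relies on de novo discovered motifs (position weight matrices), not consensus strings. The function name `find_pairs` and the `max_gap` cutoff of 10 bp are assumptions introduced here for illustration only.

```python
import re

# Consensus strings taken from the text; lookaheads allow overlapping matches.
AP1 = re.compile(r"(?=(TGA[CG]TCA))")  # its reverse complement matches the same pattern
ETS_FWD = re.compile(r"(?=(GGAA))")    # ETS core on the + strand
ETS_REV = re.compile(r"(?=(TTCC))")    # reverse complement: ETS core on the - strand

AP1_LEN, ETS_LEN = 7, 4


def find_pairs(seq, max_gap=10):
    """Return sorted (ap1_start, ets_start, ets_strand, gap) tuples.

    `gap` is the number of bases between the two sites; overlapping
    sites (negative gap) and pairs farther apart than `max_gap` are skipped.
    """
    seq = seq.upper()
    ap1_sites = [m.start() for m in AP1.finditer(seq)]
    ets_sites = [(m.start(), "+") for m in ETS_FWD.finditer(seq)]
    ets_sites += [(m.start(), "-") for m in ETS_REV.finditer(seq)]
    pairs = []
    for a in ap1_sites:
        for e, strand in ets_sites:
            if e >= a:
                gap = e - (a + AP1_LEN)   # ETS core downstream of AP-1
            else:
                gap = a - (e + ETS_LEN)   # ETS core upstream of AP-1
            if 0 <= gap <= max_gap:
                pairs.append((a, e, strand, gap))
    return sorted(pairs)


# Invented toy sequences, not taken from the paper.
downstream_plus = find_pairs("AAATGACTCAGGGGAACC")
upstream_minus = find_pairs("TTCCAATGAGTCA")
```

Tallying the `(ets_strand, gap)` combinations over many footprint sequences would give a rough picture of which arrangements are frequent. Because the AP-1 consensus is its own reverse complement, only the ETS core carries strand information in this simplified view.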

