Multi-scale chromatin state annotation using a hierarchical hidden Markov model

ABSTRACT

Chromatin-state analysis is widely applied in studies of development and disease. However, existing methods operate at a single length scale and therefore cannot distinguish large domains from isolated elements of the same type. To overcome this limitation, we present a hierarchical hidden Markov model, diHMM, that systematically annotates chromatin states at multiple length scales. We apply diHMM to analyse a public ChIP-seq data set. diHMM not only accurately captures nucleosome-level information but also identifies domain-level states that vary in nucleosome-level state composition, spatial distribution and functionality. The domain-level states recapitulate known patterns such as super-enhancers, bivalent promoters and Polycomb repressed regions, and they also reveal additional patterns whose biological functions are not yet characterized. By integrating chromatin-state information with gene expression and Hi-C data, we identify context-dependent functions of nucleosome-level states. Thus, diHMM provides a powerful tool for investigating the role of higher-order chromatin structure in gene regulation.
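To make the two-length-scale idea concrete, here is a toy model in Python. It is a minimal sketch under our own assumptions, not the published diHMM formulation: every genomic bin carries a hidden (domain-level, nucleosome-level) state pair, the domain-level state follows its own transition matrix, the nucleosome-level transitions depend on the current domain-level state, and binary ChIP-seq mark calls are emitted by the nucleosome-level state alone. Flattening the pair into one joint state lets the ordinary scaled forward algorithm compute the log-likelihood of a sequence of observations. All function names and parameter values are hypothetical.

```python
import itertools
import math


def joint_transitions(dom_T, nuc_T_by_dom):
    """Flatten a toy two-level model into one transition matrix over (domain, nucleosome) pairs.

    dom_T[d][d2]            : probability of moving from domain-level state d to d2
    nuc_T_by_dom[d2][n][n2] : nucleosome-level transition n -> n2 inside domain-level state d2
    Each row of the result sums to 1 because both factors are row-stochastic.
    """
    n_dom, n_nuc = len(dom_T), len(nuc_T_by_dom[0])
    pairs = list(itertools.product(range(n_dom), range(n_nuc)))
    T = [[dom_T[d][d2] * nuc_T_by_dom[d2][n][n2] for (d2, n2) in pairs]
         for (d, n) in pairs]
    return pairs, T


def emission_prob(emit_row, obs):
    """P(binary mark vector | nucleosome-level state), assuming independent Bernoulli marks."""
    return math.prod(p if x else 1.0 - p for p, x in zip(emit_row, obs))


def forward_log_likelihood(obs_seq, start, dom_T, nuc_T_by_dom, emit):
    """Scaled forward algorithm; returns log P(obs_seq) under the toy two-level model.

    start : initial probabilities over the flattened (domain, nucleosome) pairs
    emit  : emit[n][m] = P(mark m present | nucleosome-level state n); the domain-level
            state affects the observations only through the nucleosome-level state
    """
    pairs, T = joint_transitions(dom_T, nuc_T_by_dom)
    k = len(pairs)
    alpha = [start[j] * emission_prob(emit[n], obs_seq[0]) for j, (_, n) in enumerate(pairs)]
    log_lik = 0.0
    for t in range(len(obs_seq)):
        if t > 0:
            alpha = [sum(alpha[i] * T[i][j] for i in range(k)) * emission_prob(emit[n], obs_seq[t])
                     for j, (_, n) in enumerate(pairs)]
        scale = sum(alpha)                   # P(obs_t | earlier observations)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]   # renormalise to avoid numerical underflow
    return log_lik


if __name__ == "__main__":
    dom_T = [[0.95, 0.05], [0.05, 0.95]]          # domain-level states switch rarely
    nuc_T_by_dom = [[[0.8, 0.2], [0.3, 0.7]],     # domain 0: mostly "marked" nucleosomes
                    [[0.1, 0.9], [0.05, 0.95]]]   # domain 1: mostly "low signal"
    emit = [[0.9, 0.6], [0.05, 0.02]]             # two marks; state 0 marked, state 1 low signal
    start = [0.25] * 4
    obs = [(1, 1), (1, 0), (0, 0), (0, 0)]
    print(forward_log_likelihood(obs, start, dom_T, nuc_T_by_dom, emit))
```

Flattening keeps the sketch short but costs O(L·(DN)²) time for L bins, D domain-level and N nucleosome-level states; with 30 states at each level that is 900 joint states, so a practical implementation would likely want to exploit the block structure of the transition matrix instead.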


Figure 2. Annotation of the chromatin states identified by diHMM. (a) Emission probability matrix for our diHMM model that contains 30 domain-level and 30 nucleosome-level states. The scale varies linearly between 0 (white) and 1 (dark purple). Colour legend on the left shows our nucleosome-level state annotations. (b) Genomic annotation enrichment for our 30 nucleosome-level states in all cell types combined. Each column shows relative enrichment on a linear scale between 0 (white) and 1 (dark orange). (c) Fraction of genomic coverage in each cell type for each nucleosome-level state. The scale varies logarithmically between 10⁻⁴ (white) and 1 (dark blue). (d) Significant fold enrichments for nucleosome- and domain-level combinations. Only combinations with a false discovery rate (FDR) < 0.01 (Fisher's exact test) are displayed above background level. The scale varies logarithmically between 1 (white) and 50 (dark green). Colour legend on the left shows our domain-level annotations. (e) Fraction of genomic coverage in each cell type for each domain-level state. The scale varies logarithmically between 10⁻⁴ (white) and 1 (dark blue).
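Panels (c), (d) and (e) rest on simple counts over genomic bins. The Python sketch below shows one way such quantities could be computed: per-state coverage fractions, and the fold enrichment of each nucleosome-/domain-level combination, keeping only combinations above background (fold > 1) that pass FDR < 0.01. It assumes each bin carries exactly one nucleosome-level and one domain-level label, and it uses a one-sided Fisher's exact test with Benjamini–Hochberg correction; the one-sided test, the correction method and all function names are our assumptions, not details taken from the authors' code.

```python
import math
from collections import Counter


def coverage_fraction(labels):
    """Fraction of bins assigned to each state (the quantity plotted in panels c and e)."""
    counts = Counter(labels)
    return {s: c / len(labels) for s, c in counts.items()}


def _log_comb(n, k):
    """log C(n, k) via lgamma, stable for genome-sized counts."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment of cell a in the table [[a, b], [c, d]].

    Sums the hypergeometric upper tail P(X >= a), computing each term in log space.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denom = _log_comb(n, col1)
    p = sum(math.exp(_log_comb(row1, k) + _log_comb(n - row1, col1 - k) - log_denom)
            for k in range(a, min(row1, col1) + 1))
    return min(p, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):             # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def significant_fold_enrichments(nuc_labels, dom_labels, fdr=0.01):
    """Fold enrichment of every (nucleosome state, domain state) pair that is
    above background (fold > 1) and passes the FDR cut-off."""
    n = len(nuc_labels)
    nuc_counts, dom_counts = Counter(nuc_labels), Counter(dom_labels)
    pair_counts = Counter(zip(nuc_labels, dom_labels))
    pairs, folds, pvals = [], [], []
    for s, n_s in nuc_counts.items():
        for t, n_t in dom_counts.items():
            a = pair_counts[(s, t)]              # bins in both states
            b, c = n_s - a, n_t - a              # bins in only one of the two
            d = n - a - b - c                    # bins in neither
            pairs.append((s, t))
            folds.append(a / (n_s * n_t / n))    # observed / expected under independence
            pvals.append(fisher_greater(a, b, c, d))
    qvals = benjamini_hochberg(pvals)
    return {pair: f for pair, f, q in zip(pairs, folds, qvals) if f > 1 and q < fdr}
```

For example, with 100 bins labelled N1 inside domain D1 and 100 bins labelled N27 inside domain D2, only the two co-occurring pairs survive, each with a fold enrichment of 2.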

Using a strategy similar to that of ChromHMM⁷, we functionally annotated the nucleosome-level states based on the combinatorial pattern of ChIP-seq signals (Fig. 2a), their spatial distribution (Supplementary Fig. 5c) and their enrichment in various functionally relevant elements (Fig. 2b). In total, the 30 nucleosome-level states were grouped into 14 distinct functional categories (Fig. 2a). States N1 and N2 were characterized by high H3K4me2 and H3K4me3 signal and were therefore annotated as active promoters. Promoter flanking states (N3–N6) carried predominantly H3K4me2 and were enriched around transcription start sites (TSSs) (Supplementary Fig. 5c). diHMM identified two nucleosome-level states (N7–N8) enriched in both a repressive mark, H3K27me3, and an active mark, H3K4me2 or H3K4me1. Because their spatial distributions differ, these states were annotated as bivalent promoter (N7) and poised enhancer (N8), respectively. Strong enhancer states (N9–N11) were associated with high H3K27ac and H3K4me1 signals, whereas weak enhancer states (N12–N13) were enriched in H3K4me1. We also found a category of transcribed enhancer states (N14–N19) that were enriched in gene bodies (Supplementary Fig. 5c) and were often associated with H3K36me3 and H3K4me1, sometimes together with H3K4me2. Transcriptional elongation states (N20–N21) were enriched in H3K36me3 but depleted of the enhancer marks. diHMM also found three CTCF-enriched states (N22–N24), which, based on their spatial distributions, were further divided into two subcategories: CTCF promoter (N22) and CTCF (N23–N24) (Supplementary Fig. 5c). Another state (N25) was enriched only in H4K20me1 and located downstream of TSSs (Supplementary Fig. 5c). The Polycomb repressed state (N26) was characterized by H3K27me3 enrichment and no other marks. The vast majority of the genome fell into the heterochromatin/low-signal states (N27–N28). Finally, two infrequent states (N29–N30) were characterized by high levels of nearly all marks. These states typically fell in repetitive regions and were therefore annotated as repetitive/copy number variation (CNV) states.
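The assignment of the 30 nucleosome-level states to 14 functional categories described above can be transcribed as a lookup table, which makes downstream summaries (for example, genomic coverage per category) straightforward. The ranges below follow the text; the exact category strings (including "H4K20me1" for N25, which the text describes only by its mark and location) and the helper names are our own illustrative choices. The `marks_above` helper with its 0.5 cut-off is a hypothetical aid for reading an emission matrix such as Fig. 2a, not the authors' criterion: the annotation also relied on spatial distribution and element enrichment, which a threshold alone cannot capture.

```python
from collections import Counter

# Nucleosome-level state ranges (inclusive) and their functional categories, as listed in the text.
CATEGORY_RANGES = [
    ((1, 2), "Active promoter"),
    ((3, 6), "Promoter flanking"),
    ((7, 7), "Bivalent promoter"),
    ((8, 8), "Poised enhancer"),
    ((9, 11), "Strong enhancer"),
    ((12, 13), "Weak enhancer"),
    ((14, 19), "Transcribed enhancer"),
    ((20, 21), "Transcriptional elongation"),
    ((22, 22), "CTCF promoter"),
    ((23, 24), "CTCF"),
    ((25, 25), "H4K20me1"),
    ((26, 26), "Polycomb repressed"),
    ((27, 28), "Heterochromatin/low signal"),
    ((29, 30), "Repetitive/CNV"),
]

# Expand the ranges into a per-state dictionary: "N1" -> "Active promoter", ...
STATE_TO_CATEGORY = {
    f"N{i}": name
    for (first, last), name in CATEGORY_RANGES
    for i in range(first, last + 1)
}


def marks_above(emission_row, mark_names, threshold=0.5):
    """Marks whose emission probability in one state exceeds a (hypothetical) threshold."""
    return [m for m, p in zip(mark_names, emission_row) if p > threshold]


def category_coverage(state_labels):
    """Fraction of bins falling into each functional category."""
    counts = Counter(STATE_TO_CATEGORY[s] for s in state_labels)
    total = sum(counts.values())
    return {cat: c / total for cat, c in counts.items()}
```

Keeping the mapping as explicit ranges mirrors how the states are grouped in the text and makes it easy to check that all 30 states land in exactly 14 categories.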

