Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle.

Zacher B, Lidschreiber M, Cramer P, Gagneur J, Tresch A - Mol. Syst. Biol. (2014)

Bottom Line: To overcome these limitations, we introduce bidirectional HMMs which infer directed genomic states from occupancy profiles de novo. Application to RNA polymerase II-associated factors in yeast and chromatin modifications in human T cells recovers the majority of transcribed loci, reveals gene-specific variations in the yeast transcription cycle and indicates the existence of directed chromatin state patterns at transcribed, but not at repressed, regions in the human genome. We anticipate bidirectional HMMs to significantly improve the analyses of genome-associated directed processes.


Affiliation: Gene Center and Department of Biochemistry, Center for Integrated Protein Science CIPSM, Ludwig-Maximilians-Universität München, Munich, Germany; Institute for Genetics, University of Cologne, Cologne, Germany.


Clustering of state paths reveals gene-specific variations in the transcription cycle. Genomic state sequences of 4,632 genes were clustered into 55 groups (left, only clusters containing at least 20 genes are labeled). Each line corresponds to the state sequence of a single gene. States are colored as shown in the legend. Clusters exhibit distinct state frequency distributions and transition patterns (shown as schematic flux diagrams on top of panels). Cluster 14 shows a transcription cycle closest to the canonical one proposed by Mayer et al (2010). Genomic state sequences of clusters 32 and 38 differ from the canonical one, indicating variations in the transcription cycle. Clusters 14 and 32 exhibit distinct recruitment of factors to genes. Binding of the Pol II subunit Rpb3, Nrd1, Spt5 and Spt16 is very similar at the beginning of genes, but decreases much more strongly in cluster 32 throughout the transcripts. Ctk1 and Paf1 are depleted at cluster 32, but not at cluster 14 genes. Cluster 14 shows the canonical Pol II (Rpb3) peak in the 5' region of genes, but Pol II reaches a stable, high level downstream of the TSS in cluster 38. This may suggest a lack of the mechanism for Pol II peaking observed in cluster 14. The steep increase of serine 2 phosphorylation in cluster 38 might indicate that productive elongation is reached earlier at those genes.

Mentions: For a systematic investigation of gene-specific variation during the transcription cycle, we clustered genes based on their annotated state path. To that end, the state paths of 4,263 genes were rescaled to a common length and clustered into 55 groups according to their Hamming distance (Fig 5A and 5B, Materials and Methods). The obtained gene clusters show distinct patterns of protein occupancies, suggesting mechanistic differences in transcription (Fig 5, Supplementary Fig S2 and below). Moreover, the gene clusters differed by gene length, expression level and genomic context (e.g., termination overlaps with neighboring downstream promoters or bidirectionality of promoters). Gene set enrichment analysis showed that clusters also corresponded to distinct functional gene groups (Supplementary Table S1). The functional categories range from housekeeping (e.g., clusters 14 and 38) and cell cycle (e.g., cluster 17) to stress response (e.g., cluster 39). For instance, the high expression of clusters 38 and 14 is consistent with their associated functions, including ribosome biogenesis, positive regulation of transcription, translation or nucleosome assembly. More strikingly, we found the DNA binding motif of SFP1, a regulator of ribosomal protein and ribosome biogenesis genes, to be enriched in promoter state P/T1 (which is a frequent promoter state of cluster 14 and 38 genes, Supplementary Fig S4). In contrast, stress- and autophagy-related genes in cluster 39 show very low expression and protein binding (Supplementary Fig S2B). Altogether, this suggests that different transcription cycles, as modeled by the bdHMM, correspond to different co-regulated gene sets.
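The clustering step described above (rescale each gene's state path to a common length, then group paths by Hamming distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the nearest-index resampling, the average-linkage agglomerative scheme, the function names and the single-letter state labels are all assumptions made for illustration, since the text only states that paths were rescaled and clustered by Hamming distance.

```python
from itertools import combinations


def rescale_path(path, length):
    """Resample a state path to `length` positions by nearest-index lookup.

    Assumption: the source only says paths were "rescaled to a common
    length"; this particular resampling rule is illustrative.
    """
    n = len(path)
    return [path[(i * n) // length] for i in range(length)]


def hamming(a, b):
    """Count the positions at which two equal-length state paths differ."""
    if len(a) != len(b):
        raise ValueError("paths must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_paths(paths, k, length=20):
    """Group state paths into k clusters (lists of indices into `paths`).

    Assumption: naive average-linkage agglomerative clustering; the
    linkage actually used in the paper is not given in this excerpt.
    """
    scaled = [rescale_path(p, length) for p in paths]
    clusters = [[i] for i in range(len(scaled))]

    def linkage(c1, c2):
        # Mean pairwise Hamming distance between members of two clusters.
        total = sum(hamming(scaled[i], scaled[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy paths: P = promoter, E = elongation, T = termination.
toy_paths = ["PPEETT", "PET", "TTTTTT", "TTT"]
groups = cluster_paths(toy_paths, k=2, length=6)
```

In this toy input, the short paths are stretched to six positions before comparison, so genes of different length become comparable position by position. The naive merge loop is slow for large inputs; for thousands of genes, as in the study, an optimized hierarchical clustering routine would be the more practical choice.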

