A signal processing approach for enriched region detection in RNA polymerase II ChIP-seq data.

Han Z, Tian L, Pécot T, Huang T, Machiraju R, Huang K - BMC Bioinformatics (2012)

Bottom Line: Then, we apply our proposed method on PolII ChIP-seq data generated in our own study on the effects of hormone on the breast cancer cell line MCF7. The results demonstrate that our method can effectively identify long enriched regions in ChIP-seq datasets. We demonstrated the effectiveness of this method in studying binding patterns of PolII in cancer cells, which enables deeper analysis of transcription regulation and epigenetics.


Affiliation: College of Software, Nankai University, Tianjin, China.

ABSTRACT

Background: RNA polymerase II (PolII) is essential in gene transcription and ChIP-seq experiments have been used to study PolII binding patterns over the entire genome. However, since PolII enriched regions in the genome can be very long, existing peak finding algorithms for ChIP-seq data are not adequate for identifying such long regions.

Methods: Here we propose an enriched region detection method for ChIP-seq data that identifies long enriched regions by combining a signal denoising algorithm with a false discovery rate (FDR) approach. The binned ChIP-seq data for PolII are first denoised using a non-local means (NL-means) algorithm. Then, an FDR approach is developed to determine the threshold for marking enriched regions in the binned histogram.
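The two steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters (`patch_radius`, `search_radius`, `h`, `alpha`), and the use of a separate background track to estimate the FDR empirically are all assumptions made for illustration. The abstract does not specify how the FDR is computed.

```python
import numpy as np

def nl_means_1d(signal, patch_radius=2, search_radius=10, h=1.0):
    """Basic 1-D non-local means filter for a binned read-count histogram.

    All parameter names and defaults are hypothetical, not taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    # Reflect-pad so every bin has a full patch around it.
    padded = np.pad(x, patch_radius, mode="reflect")
    # patches[i] is the window of length 2*patch_radius+1 centred on bin i.
    patches = np.lib.stride_tricks.sliding_window_view(
        padded, 2 * patch_radius + 1
    )
    out = np.empty(n)
    for i in range(n):
        lo = max(0, i - search_radius)
        hi = min(n, i + search_radius + 1)
        # Mean squared distance between the patch at bin i and each candidate.
        d2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        # Similar patches get weights near 1, dissimilar ones near 0.
        w = np.exp(-d2 / (h * h))
        out[i] = np.sum(w * x[lo:hi]) / np.sum(w)
    return out

def fdr_threshold(sample, background, alpha=0.05):
    """Smallest threshold whose empirical FDR is at most alpha.

    Assumption: FDR(t) is estimated as the number of background bins >= t
    (rescaled to the sample length) divided by the number of sample bins >= t.
    Returns None if no threshold qualifies.
    """
    sample = np.asarray(sample, dtype=float)
    background = np.asarray(background, dtype=float)
    scale = sample.size / background.size
    for t in np.unique(sample):  # candidate thresholds in ascending order
        n_sample = np.count_nonzero(sample >= t)
        n_background = np.count_nonzero(background >= t) * scale
        if n_background / n_sample <= alpha:
            return float(t)
    return None
```

In this sketch the denoised histogram would be passed as `sample`, and bins at or above the returned threshold would be marked as enriched. The smoothing strength `h` would need tuning to the noise level of the data.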

Results: We first test our method using a public PolII ChIP-seq dataset and compare our results with published results obtained using the HPeak algorithm. Our results show high consistency with the published results (80-100%). Then, we apply our proposed method on PolII ChIP-seq data generated in our own study on the effects of hormone on the breast cancer cell line MCF7. The results demonstrate that our method can effectively identify long enriched regions in ChIP-seq datasets. Specifically, in MCF7 control samples we identified 5,911 segments with a length of at least 4 kbp (maximum 233,000 bp), and in E2-treated MCF7 samples we identified 6,200 such segments (maximum 325,000 bp).
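As an illustration of how segments of at least 4 kbp could be reported once a threshold is fixed, the following sketch collects contiguous runs of enriched bins and filters them by length. The function name, the `bin_size` parameter, and the half-open coordinate convention are assumptions; the abstract does not state the bin width used.

```python
import numpy as np

def enriched_segments(signal, threshold, bin_size, min_length_bp=4000):
    """Return (start_bp, end_bp) for runs of bins at or above the threshold.

    Coordinates are half-open and relative to the first bin; bin_size is a
    hypothetical parameter, since the source does not give the bin width.
    """
    above = np.asarray(signal, dtype=float) >= threshold
    # Pad with False so runs touching either end still have two edges.
    padded = np.concatenate(([False], above, [False]))
    # Indices where the enriched/not-enriched state changes.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    return [
        (int(s) * bin_size, int(e) * bin_size)
        for s, e in zip(starts, ends)
        if (e - s) * bin_size >= min_length_bp
    ]
```

With 1,000 bp bins, for example, a run of four consecutive enriched bins is kept, while an isolated enriched bin is discarded.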

Conclusions: We demonstrated the effectiveness of this method in studying binding patterns of PolII in cancer cells, which enables deeper analysis of transcription regulation and epigenetics. Our method complements existing peak detection algorithms for ChIP-seq experiments.



Figure 1: Examples of PolII ChIP-seq data for the MCF7 cell line. ChIP-seq data for the PolII binding pattern on SEMA3C in MCF7 control samples. The top lane shows the histogram of PolII binding densities over a genomic range. The gene covered by this range is shown in the bottom lane, where the thick bars below the gene symbol indicate exons and the blue arrow indicates the gene's orientation. The tail and head of the arrow correspond to the transcription start site (TSS) and transcription end site (TES) of the gene, respectively. The same arrangement applies to the other figures. PolII evidently not only binds to the TSS region of the gene but also forms long enriched regions over the entire transcript.

Mentions: PolII plays an essential role in gene transcription. During transcription, it is responsible for the synthesis of nascent messenger RNA molecules (mRNA) for protein-coding genes and microRNAs [4]. The nascent mRNAs then go through a series of processing steps, including splicing, to form mature mRNAs. To transcribe a gene, PolII undergoes several steps including recruitment, initiation, elongation, and dissociation [4,5]. In addition, PolII pausing and premature dissociation cause stalling of the transcription process [4,5]. Thus, accurate characterization of PolII binding patterns over the entire genome is of great importance in studying the dynamics of transcription and in characterizing nascent mRNA, which cannot be directly inferred from gene expression microarray or regular RNA-seq technologies since they focus on mature mRNA. However, because PolII elongates along the entire gene during transcription, the PolII binding pattern over a gene is usually not a single peak but an elongated region in ChIP-seq data. PolII enriched regions can stretch over several thousand base pairs (Figure 1). Traditionally, ChIP-seq data analysis methods rely on peak detection algorithms to delineate genomic regions with enriched protein binding. The binding pattern of PolII, however, poses a very different computational problem and, in turn, significant challenges. Several peak detection algorithms were developed for delineating transcription factor binding sites, where the anticipated regions are short (e.g., 200-1500 bp) [6-12], which renders such algorithms inadequate for studying proteins with prevalent binding over the entire genome, such as PolII.

