A signal processing approach for enriched region detection in RNA polymerase II ChIP-seq data.

Han Z, Tian L, Pécot T, Huang T, Machiraju R, Huang K - BMC Bioinformatics (2012)

Bottom Line: Then, we apply our proposed method on PolII ChIP-seq data generated in our own study on the effects of hormone on the breast cancer cell line MCF7. The results demonstrate that our method can effectively identify long enriched regions in ChIP-seq datasets. We demonstrated the effectiveness of this method in studying binding patterns of PolII in cancer cells which enables further deep analysis in transcription regulation and epigenetics.


Affiliation: College of Software, Nankai University, Tianjin, China.

ABSTRACT

Background: RNA polymerase II (PolII) is essential in gene transcription and ChIP-seq experiments have been used to study PolII binding patterns over the entire genome. However, since PolII enriched regions in the genome can be very long, existing peak finding algorithms for ChIP-seq data are not adequate for identifying such long regions.

Methods: Here we propose an enriched region detection method for ChIP-seq data that identifies long enriched regions by combining a signal denoising algorithm with a false discovery rate (FDR) approach. The binned ChIP-seq data for PolII are first denoised using a non-local means (NL-means) algorithm. Then, an FDR approach is developed to determine the threshold for marking enriched regions in the binned histogram.
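The two steps described in Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the NL-means parameters or the details of the FDR procedure, so the function names, the parameters `patch_radius`, `search_radius`, `h` and `alpha`, and the use of a separate null (background) histogram for the empirical FDR estimate are all assumptions made for illustration.

```python
import math


def nl_means_1d(signal, patch_radius=2, search_radius=10, h=1.0):
    """Denoise a 1D binned histogram with a basic non-local means filter.

    Each bin is replaced by a weighted average of bins in a search window,
    where the weight depends on how similar the surrounding patches are.
    All parameter defaults are illustrative assumptions.
    """
    n = len(signal)

    def at(i):
        # Clamp indices so patches near the ends reuse the edge bins.
        return signal[min(max(i, 0), n - 1)]

    patch_size = 2 * patch_radius + 1
    denoised = []
    for i in range(n):
        weight_sum = 0.0
        value_sum = 0.0
        lo = max(0, i - search_radius)
        hi = min(n, i + search_radius + 1)
        for j in range(lo, hi):
            # Mean squared difference between the patches centred on i and j.
            d2 = sum(
                (at(i + k) - at(j + k)) ** 2
                for k in range(-patch_radius, patch_radius + 1)
            ) / patch_size
            w = math.exp(-d2 / (h * h))
            weight_sum += w
            value_sum += w * signal[j]
        # weight_sum >= 1 because the patch at j == i matches itself exactly.
        denoised.append(value_sum / weight_sum)
    return denoised


def fdr_threshold(observed, null, alpha=0.05):
    """Return the smallest threshold whose estimated FDR is <= alpha.

    The FDR at threshold t is estimated as the (size-scaled) number of null
    bins above t divided by the number of observed bins above t. This is a
    generic empirical FDR, assumed here for illustration only.
    """
    scale = len(observed) / len(null)
    for t in sorted(set(observed)):
        n_obs = sum(1 for v in observed if v > t)
        if n_obs == 0:
            break
        n_null = sum(1 for v in null if v > t)
        if (n_null * scale) / n_obs <= alpha:
            return t
    return None
```

In this sketch, NL-means is attractive for long enriched regions because it averages bins with similar local context rather than simply neighbouring bins, so flat plateaus are smoothed while sharp boundaries are largely preserved.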

Results: We first test our method using a public PolII ChIP-seq dataset and compare our results with published results obtained using the HPeak algorithm. Our results show high consistency with the published results (80-100%). Then, we apply our proposed method on PolII ChIP-seq data generated in our own study on the effects of hormone on the breast cancer cell line MCF7. The results demonstrate that our method can effectively identify long enriched regions in ChIP-seq datasets. Specifically, in MCF7 control samples we identified 5,911 segments with a length of at least 4 Kbp (maximum 233,000 bp), and in MCF7 samples treated with E2 we identified 6,200 such segments (maximum 325,000 bp).
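As a rough illustration of how segments of at least 4 Kbp could be extracted once a threshold has been chosen, the sketch below merges consecutive above-threshold bins into segments and filters them by length. The bin size of 1,000 bp and the function name `call_segments` are assumptions; the abstract does not state the bin size or the exact merging rule used.

```python
def call_segments(values, threshold, bin_size=1000, min_length=4000):
    """Merge consecutive bins above `threshold` into (start, end) segments.

    Coordinates are in bp relative to the first bin; `end` is exclusive.
    Only segments at least `min_length` bp long are returned.
    `bin_size` is an assumed value, not taken from the paper.
    """
    segments = []
    start = None
    for i, v in enumerate(values):
        if v > threshold:
            if start is None:
                start = i  # a new run of enriched bins begins here
        elif start is not None:
            segments.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:
        # The last run extends to the end of the histogram.
        segments.append((start * bin_size, len(values) * bin_size))
    return [(s, e) for s, e in segments if e - s >= min_length]


bins = [0, 5, 5, 5, 5, 0, 5, 5, 0, 5, 5, 5, 5, 5]
print(call_segments(bins, threshold=1))
# [(1000, 5000), (9000, 14000)]
```

The middle run of two bins (2,000 bp) is dropped by the length filter, which mirrors the 4 Kbp cut-off reported in the Results.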

Conclusions: We demonstrated the effectiveness of this method in studying binding patterns of PolII in cancer cells, which enables deeper analysis of transcription regulation and epigenetics. Our method complements existing peak detection algorithms for ChIP-seq experiments.



Figure 5: The PolII binding patterns and expression levels for the PLK2 gene. Top: the PolII binding patterns for the PLK2 gene in control (first lane) and E2-treated samples (second lane). PolII shows a higher peak for the E2-treated sample but a lower total amount of binding over the transcript. The red bar indicates the 16 Kbp region detected using our method in MCF7 and the blue bar indicates the 14 Kbp region detected in MCF7+E2. Bottom: the expression levels of the PLK2 gene in the two conditions (n = 4). Error bars indicate standard deviation.

Mentions: Identification of long segments of PolII binding is important for further investigation of gene transcription regulation, as well as for potentially discovering novel transcripts and alternative promoters. For gene transcription, while PolII binding density at the promoter around the TSSs was considered to determine gene transcription levels, recent studies show that the density of PolII binding on the gene body is also critical [5,22]. We observed the same phenomenon using the segments identified above. For instance, as shown in Figure 5, a segment of 16,000 bp was identified over the transcript of the gene PLK2 on human chromosome 5. The MCF7 control sample has more sequencing reads over this region than the MCF7 sample treated with E2 (958 vs 454 reads, with a similar number of total reads on chromosome 5 in the two samples). Although the height of the "peak" at the TSS region in the MCF7 control sample is lower than that in the E2-treated MCF7 sample, the total transcription level (measured by Affymetrix gene expression array) is still higher in the MCF7 control, by a factor of 3.95 (Student's t-test, p = 3.872 × 10⁻⁶).
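The read comparison over the PLK2 segment can be worked through as simple arithmetic. The sketch below computes the ratio of read counts between the two samples; the helper `read_ratio` and its optional per-sample totals are hypothetical, since the text only says the chromosome 5 totals are similar and does not report them.

```python
def read_ratio(reads_a, reads_b, total_a=None, total_b=None):
    """Ratio of reads in sample A to sample B over the same region.

    If per-sample total read counts are supplied, each count is first
    divided by its total so that sequencing depth does not bias the ratio.
    """
    if total_a is not None and total_b is not None:
        return (reads_a / total_a) / (reads_b / total_b)
    return reads_a / reads_b


# Read counts over the 16,000 bp PLK2 segment reported in the text.
ratio = read_ratio(958, 454)
print(f"{ratio:.2f}")  # 2.11
```

Under the stated condition of similar total reads, the control sample therefore carries roughly twice as many PolII reads over the gene body as the E2-treated sample, in the same direction as the 3.95-fold difference in expression.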

