A non-parametric peak calling algorithm for DamID-Seq.

Li R, Hempel LU, Jiang T - PLoS ONE (2015)

Bottom Line: Protein-DNA interactions play a significant role in gene regulation and expression. Statistical evidence from IDR analysis indicated that the peaks called by the non-parametric method (NPPC) are reproducible across biological replicates. In addition, these peaks are comparable to those identified by ChIP-Seq on S2 cells in terms of peak number, location, and peak width.


Affiliation: State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China; Laboratory of Cellular and Developmental Biology, NIDDK/NIH, Bethesda, MD, United States of America.

ABSTRACT
Protein-DNA interactions play a significant role in gene regulation and expression. To identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied DNA adenine methylation identification (DamID) to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not guaranteed to be symmetrically distributed around a TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This prompted us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the average behavior of background signals. To address it, we applied a bootstrap resampling method to the short sequence reads in the control (Dam only). After a data quality check and mapping of reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computation of signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to assess the peaks called by NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by ChIP-Seq on S2 cells in terms of peak number, location, and peak width.
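
The four steps listed in the abstract can be sketched roughly as follows. This is an illustrative outline in Python with NumPy, not the authors' implementation. The function names (`window_counts`, `nppc_sketch`), the fixed-window read counting, the pseudocount, the minimum-read filter, and the use of an upper quantile of the fold changes as the cutoff are all assumptions, because the abstract does not say how filtering or the significance threshold is computed. The 90% resampling fraction and the 100-bp window size are taken from the figure legend.

```python
import numpy as np

def window_counts(read_starts, genome_len, window=100):
    """Count reads per fixed-width window (a simple stand-in for smoothing)."""
    n_win = -(-genome_len // window)  # ceiling division
    idx = np.asarray(read_starts, dtype=np.int64) // window
    return np.bincount(idx, minlength=n_win).astype(float)

def nppc_sketch(treat_starts, ctrl_starts, genome_len, window=100,
                n_resamples=50, frac=0.9, min_reads=5, quantile=0.99,
                pseudo=1.0, seed=0):
    """Hypothetical outline of the resample / scale / filter / call steps."""
    rng = np.random.default_rng(seed)
    ctrl_starts = np.asarray(ctrl_starts, dtype=np.int64)
    treat = window_counts(treat_starts, genome_len, window)
    k = int(frac * ctrl_starts.size)
    log_fc = np.zeros_like(treat)
    for _ in range(n_resamples):
        # 1) resample a fraction of the control (Dam only) reads
        sub = rng.choice(ctrl_starts, size=k, replace=False)
        ctrl = window_counts(sub, genome_len, window)
        # 2) scale the control to the treatment depth, then take log2 fold change
        ctrl *= treat.sum() / ctrl.sum()
        log_fc += np.log2((treat + pseudo) / (ctrl + pseudo))
    log_fc /= n_resamples  # average background behavior over resamples
    # 3) filter out windows with too few treatment reads (assumed criterion)
    keep = treat >= min_reads
    # 4) call windows above an empirical cutoff (assumed: upper quantile)
    cutoff = np.quantile(log_fc[keep], quantile)
    peak_windows = np.flatnonzero(keep & (log_fc >= cutoff))
    return log_fc, peak_windows * window  # peak start coordinates in bp
```

Averaging the fold change over many resampled controls is what makes the background estimate less sensitive to any single set of control reads. A real implementation would also need to merge adjacent windows into peaks and handle multiple chromosomes.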



Figure 2 (pone.0117415.g002): Distributions of local signal enrichment (log2 fold changes) and DpnI cutting sites in the Drosophila genome. A: We resampled 90% of the control (Dam only) reads each time, followed by read scaling and local kernel smoothing in 100-bp running windows for the treatment and control, respectively. We then calculated log2 fold changes in each running window. B: Distribution of the distances in base pairs between adjacent DpnI cutting sites in the Drosophila melanogaster genome.
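
The distances summarized in panel B could be computed along the following lines. This is a minimal sketch, assuming that cutting sites are located by scanning for the DpnI recognition sequence GATC (the legend itself does not spell out the motif) and that the sequence of one chromosome is already loaded as a string. The function name `gatc_distances` is hypothetical.

```python
def gatc_distances(seq, site="GATC"):
    """Distances (bp) between the starts of adjacent occurrences of `site`."""
    seq = seq.upper()
    positions = []
    pos = seq.find(site)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(site, pos + 1)  # continue searching past this hit
    # Pair each site with the next one and take the difference
    return [b - a for a, b in zip(positions, positions[1:])]

# Sites at positions 0, 6 and 10 give distances 6 and 4
print(gatc_distances("GATCAAGATCGATC"))  # [6, 4]
```

Collecting these distances over all chromosomes and plotting a histogram would give a distribution of the kind described for panel B.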

Mentions: The genome-wide distribution of local signal enrichment is illustrated by a Dam-DsxF sample contrasted with its corresponding Dam-only control (Fig. 2A). A similar distribution was observed for Dam-DsxM (data not shown). The distribution reveals an interesting pattern: on the left side, the log2 fold changes are markedly variable, in sharp contrast to the right side. We suspect that when the DamID signals are weaker than the background signals, the latter vary markedly, whereas when the DamID signals are stronger than the background signals, the latter vary little across windows.

