damidseq_pipeline: an automated pipeline for processing DamID sequencing datasets.

Marshall OJ, Brand AH - Bioinformatics (2015)

Bottom Line: DamID is a powerful technique for identifying regions of the genome bound by a DNA-binding (or DNA-associated) protein. DamID-seq thus presents novel challenges in terms of normalization and background minimization. We describe here damidseq_pipeline, a software pipeline that performs automatic normalization and background reduction on multiple DamID-seq FASTQ datasets.

Affiliation: Wellcome Trust/Cancer Research UK Gurdon Institute, Cambridge, CB2 1QN, UK.

Results of the damidseq_pipeline. (A) The gene eyeless (ey) (highlighted) is expressed in D. melanogaster larval neural stem cells (Southall et al., 2013) and previously published microarray DamID in these cells (i) shows RNA polymerase II occupancy (Southall et al., 2013). (B) Performing DamID-seq in the same cell type illustrates the high correlation between Dam-Pol II (i) and Dam alone (ii) in terms of RPM (read counts/million mapped reads). Taking the ratio of the two RPM-normalized datasets fails to show significant RNA pol II occupancy at ey (iii); however, processing via the damidseq_pipeline software successfully recovers the RNA pol II occupancy profile while minimizing background (iv). See Supplementary Methods for experimental details.

Mentions: Although DamID-seq data can be aligned and binned as per all NGS data, two issues arise that are specific to DamID. The first major consideration is the correct normalization of the Dam-fusion and Dam-control samples. The greatest contribution to many Dam-fusion protein datasets is the non-specific methylation of accessible genomic regions (e.g. Fig. 1B), with a mean correlation between Dam alone and Dam-fusion datasets of 0.70 (n = 4, Spearman’s correlation). Representing the data as a (Dam-fusion/Dam) ratio in theory negates such non-specific methylation. However, strong methylation signals at highly bound regions in the Dam-fusion dataset will reduce the relative numbers of reads present at accessible genomic regions in this dataset (see, for example, the occupancy of Dam-RNA Pol II over the eyeless gene in Fig. 1), and normalizing the data based on read counts alone can therefore produce a strong negative bias to the ratio file [Fig. 1B (iii), Supplementary Fig. S5A]. Depending on the characteristics of the fusion protein, this negative bias can lead to real signal being lost (Fig. 1). Although microarray data inadvertently overcame this bias through the manual adjustment of laser intensities during microarray scanning, until now no method has existed for correctly normalizing DamID-seq datasets.
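The negative bias described above can be illustrated with a small worked example. The sketch below is not the damidseq_pipeline algorithm; it only reproduces the naive approach of RPM-normalizing both samples and taking their ratio. The bin counts, the function names `rpm` and `log2_ratio`, and the pseudocount are all hypothetical choices made for illustration.

```python
import math

def rpm(counts):
    """Scale raw per-bin read counts to reads per million mapped reads."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

def log2_ratio(fusion, control, pseudocount=1.0):
    """Per-bin log2(fusion/control); the pseudocount (an assumption) avoids division by zero."""
    return [math.log2((f + pseudocount) / (c + pseudocount))
            for f, c in zip(fusion, control)]

# Hypothetical toy data: six genomic bins.
# Dam-only control: uniform non-specific methylation of accessible chromatin.
dam_control = [100, 100, 100, 100, 100, 100]
# Dam-fusion: the same background in the first five bins, plus one strongly
# bound region in the last bin (15x the background count).
dam_fusion = [100, 100, 100, 100, 100, 1500]

control_rpm = rpm(dam_control)
fusion_rpm = rpm(dam_fusion)
ratios = log2_ratio(fusion_rpm, control_rpm)

for i, r in enumerate(ratios):
    print(f"bin {i}: log2(Dam-fusion/Dam) = {r:.2f}")
```

In this toy case the five background bins have identical raw counts in both samples, so an ideal ratio would be zero there. After RPM scaling, however, the reads concentrated in the bound bin shrink the fusion sample's share at every other bin, so the background bins come out at about -1.7 and the bound bin at about 2.2, well below the log2(15) ≈ 3.9 implied by the raw counts.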

