Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.

Micsinai M, Parisi F, Strino F, Asp P, Dynlacht BD, Kluger Y - Nucleic Acids Res. (2012)

Bottom Line: In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.


Affiliation: Yale University School of Medicine, Department of Pathology, New Haven, CT 06520, USA.

ABSTRACT
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads, to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.


Figure 3: Comparison of unbiased AUCROC performance estimates in existing and novel qPCR-validated histone modification datasets. Algorithms are in order of publication. LOESS estimators have been added to facilitate visualization. (A) AUCROC statistics of the 15 ChIP-Seq algorithms, sorted according to their time of publication, show incremental improvements in existing qPCR-validated histone modification datasets (MYO.H3K27me3.GM, MYO.H3K27me3.MT, ES.H3K4me3 and ES.H3K27me3). The lines indicate that over time algorithms achieved better performance. (B) AUCROC statistics of the 15 algorithms, sorted according to their time of publication, show incremental improvements in novel qPCR-validated histone modification datasets (MYO.H3K27me3.GM.VDA and MYO.H3K36me3.GM.VDA). The lines indicate that over time algorithms achieved better performance.
© Copyright Policy - creative-commons

Mentions: We initially assessed the performance of each algorithm using publicly available qPCR validation datasets (Table 2). Our unbiased AUCROC measure [Equation (4)] indicated differences between algorithms, suggesting that on average the most recent algorithms had improved performance (Figure 3A and Tables 3 and 4). Using the existing histone qPCR validation datasets, ChIPDiff, FindPeaks, FSeq, Qeseq, RSEG, SWEMBL and TPIC have AUCROCs > 0.8 in the majority of cases (Figure 3A). On average, in the four existing datasets the top-performing algorithms had nearly identical AUCROCs: RSEG (average AUCROC = 0.87), Qeseq and FindPeaks (both with average AUCROC = 0.85).
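
To make the AUCROC comparison concrete, the following is a minimal Python sketch of how an ordinary AUCROC could be computed for one peak finder against qPCR-validated sites. It is not the unbiased estimator of Equation (4), which is not reproduced in this excerpt. The function name `auc_roc`, the variable names and the toy scores are all hypothetical, and scoring a site as 0.0 when no peak overlaps it is an assumption made for illustration.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Standard AUCROC via the Mann-Whitney pair-counting formulation.

    scores: per-site score from a peak finder (assumption: 0.0 if no peak
            overlaps the validation site).
    labels: 1 if qPCR found the site enriched, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one enriched and one non-enriched site")

    # Count (enriched, non-enriched) pairs ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up validation set: six sites, three of them qPCR-enriched.
site_scores = [9.1, 7.4, 0.0, 3.2, 0.0, 5.5]
qpcr_enriched = [1, 1, 0, 1, 0, 0]
print(f"AUCROC = {auc_roc(site_scores, qpcr_enriched):.3f}")
```

The pairwise formulation is quadratic in the number of sites, which is acceptable for qPCR validation sets of modest size. Averaging such values across datasets would give per-algorithm summaries analogous to, but not identical with, the averages quoted above.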

