COHCAP: an integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis.

Warden CD, Lee H, Tompkins JD, Li X, Wang C, Riggs AD, Yu H, Jove R, Yuan YC - Nucleic Acids Res. (2013)

Affiliation: Bioinformatics Core, City of Hope National Medical Center, Duarte, CA 91010, USA. cwarden@coh.org

ABSTRACT
COHCAP (City of Hope CpG Island Analysis Pipeline) is an algorithm to analyze single-nucleotide resolution DNA methylation data produced by either an Illumina methylation array or targeted bisulfite sequencing. The goal of the COHCAP algorithm is to identify CpG islands that show a consistent pattern of methylation among CpG sites. COHCAP is currently the only DNA methylation package that provides integration with gene expression data to identify a subset of CpG islands that are most likely to regulate downstream gene expression, and it can generate lists of differentially methylated CpG islands with ∼50% concordance with gene expression from both cell line data and heterogeneous patient data. For example, this article describes known breast cancer biomarkers (such as estrogen receptor) with a negative correlation between DNA methylation and gene expression. COHCAP also provides visualization for quality control metrics, regions of differential methylation and correlation between methylation and gene expression. This software is freely available at https://sourceforge.net/projects/cohcap/.

Figure 4. Overlapping signal from HCT116 Illumina array and BS-Seq data. (A) Illumina array overlap: two independently produced HCT116 datasets (17; this study) were used to compare the hyper- and hypo-methylated CpG islands identified in the two studies. Each study had triplicate samples, which were used for the COHCAP analysis. The matched overlap is much greater than the inverse overlap, indicating non-random overlap that reflects the concordance of the COHCAP results. (B) Illumina array versus BS-Seq: independently produced HCT116 methylation data from the Illumina array (this study) and targeted BS-Seq (36) were used to compare COHCAP hyper- and hypo-methylated regions between the two technologies. Again, the matched overlap is much greater than the inverse overlap, indicating non-random overlap and good concordance and reproducibility of the COHCAP results.
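
The matched-versus-inverse comparison in the caption can be sketched in Python as below, assuming each study's COHCAP output has been reduced to a dictionary mapping a CpG island ID to 'hyper' or 'hypo'. The caption does not say which statistical test supports "non-random"; the hypergeometric upper-tail probability included here is one common choice, shown as an assumption rather than the method used in the paper. All function names are hypothetical.

```python
from math import comb


def overlap_counts(calls_a, calls_b):
    """Count matched and inverse calls for islands present in both studies.

    calls_a, calls_b: dicts of island ID -> 'hyper' or 'hypo'.
    Matched = same designation in both; inverse = opposite designations."""
    shared = calls_a.keys() & calls_b.keys()
    matched = sum(1 for island in shared if calls_a[island] == calls_b[island])
    inverse = len(shared) - matched
    return matched, inverse


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N islands tested, K called in study A,
    n called in study B): the chance of an overlap at least this large if
    the two call sets were unrelated."""
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

A matched count far above the inverse count, together with a small upper-tail probability for the matched overlap, is the pattern the caption describes for both panels.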

The high accuracy of the one-group COHCAP workflow, estimated using the MIRA data, is further validated by comparing HCT116 BS-Seq and 450k array data. Targeted BS-Seq and 450k array data for HCT116 are publicly available (12,36), so we compared the HCT116 results from this study with these datasets. Specifically, the one-group workflow was first used to determine the overlap of methylated and unmethylated regions in HCT116 cells between the two 450k datasets (Figure 4A). Regions with corresponding methylation designations (e.g. methylated versus methylated, unmethylated versus unmethylated) clearly show much greater overlap than regions with opposite designations (e.g. methylated versus unmethylated). As a technical note, the dataset from this study was used as a benchmark to measure run-time (Supplementary Table S12); users should be aware that one-group analysis typically takes longer than two-group analysis for datasets of comparable size, because the CpG site filter usually removes a much smaller proportion of CpG sites, leaving more sites for the CpG island analysis.

This 450k comparison indicates that non-random overlap can be seen with the same technology even when the samples were produced at different times by different laboratories. This is not trivial: methylation patterns can diverge over time (52,53), so even an accurate method of analysis would be expected to show some inconsistency between samples. Similarly, the data from this study also show non-random overlap with COHCAP regions detected using BS-Seq (Figure 4B). Concordance between BS-Seq and Illumina methylation arrays has been reported previously (21,51,54), but it was not clear whether the specific regions identified by COHCAP would show good concordance for samples produced in different laboratories. Furthermore, the BS-Seq data lacked replicates, and there are clear differences in signal distribution between BS-Seq and Illumina array data (Supplementary Figure S20). Given these differences, we consider the overlap between the COHCAP results for the 450k array and BS-Seq data to be very strong. Most importantly, the concordance of the independent HCT116 datasets in the 450k/BS-Seq comparison further supports the accuracy of the one-group COHCAP workflow for both hyper- and hypo-methylated regions.
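
To illustrate the run-time note, the Python sketch below contrasts a one-group CpG site filter (keep sites that are clearly methylated or clearly unmethylated) with a two-group filter (keep sites whose mean beta differs between groups). The cutoffs and function names are hypothetical, and the two-group version omits the p-value and FDR filtering a real analysis would apply. Because many CpG sites in typical data sit near either end of the beta scale, a one-group filter of this kind tends to keep far more sites, which means more work in the downstream CpG island step.

```python
from statistics import mean

# Hypothetical cutoffs for illustration only.
METHYL_CUTOFF = 0.7
UNMETHYL_CUTOFF = 0.3
DELTA_BETA_CUTOFF = 0.2


def one_group_filter(sites):
    """sites: dict of site ID -> list of beta values for one group.
    Keep sites whose mean beta is clearly methylated or clearly unmethylated."""
    kept = {}
    for site, betas in sites.items():
        avg = mean(betas)
        if avg > METHYL_CUTOFF or avg < UNMETHYL_CUTOFF:
            kept[site] = avg
    return kept


def two_group_filter(group1, group2):
    """Keep sites (present in both groups) whose mean beta differs by at
    least DELTA_BETA_CUTOFF. A real filter would also test significance."""
    kept = {}
    for site in group1.keys() & group2.keys():
        delta = mean(group1[site]) - mean(group2[site])
        if abs(delta) >= DELTA_BETA_CUTOFF:
            kept[site] = delta
    return kept
```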