methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data.

Kishore K, de Pretis S, Lister R, Morelli MJ, Bianchi V, Amati B, Ecker JR, Pelizzola M - BMC Bioinformatics (2015)

Bottom Line: The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. These packages are instrumental in providing biologists with minimal R skills with a complete toolkit facilitating the analysis of their own data, or in accelerating the analyses performed by more experienced bioinformaticians.


Affiliation: Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Milano, 20139, Italy. kamal.kishore@iit.it.

ABSTRACT

Background: Numerous methods are available to profile several epigenetic marks, providing data with different genome coverage and resolution. Large epigenomic datasets are then generated, and often combined with other high-throughput data, including RNA-seq, ChIP-seq for transcription factor (TF) binding and DNase-seq experiments. Despite the numerous computational tools covering specific steps in the analysis of large-scale epigenomics data, comprehensive software solutions for their integrative analysis are still missing. Multiple tools must be identified and combined to jointly analyze histone marks, TF binding and other -omics data together with DNA methylation data, complicating the analysis of these data and their integration with publicly available datasets.

Results: To overcome the burden of integrating various data types with multiple tools, we developed two companion R/Bioconductor packages. The former, methylPipe, is tailored to the analysis of high- or low-resolution DNA methylomes in several species, accommodating (hydroxy-)methyl-cytosines in both CpG and non-CpG sequence contexts. The analysis of multiple whole-genome bisulfite sequencing experiments is supported, while maintaining the ability to integrate targeted genomic data. The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. It provides a number of methods to score these data in regions of interest, leading to the identification of enhancers, lncRNAs, and RNAPII stalling/elongation dynamics. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. Finally, the package includes a flexible method based on heatmaps for the integration of various data types, combining annotation tracks with continuous or categorical data tracks.

Conclusions: methylPipe and compEpiTools provide a comprehensive Bioconductor-compliant solution for the integrative analysis of heterogeneous epigenomics data. These packages are instrumental in providing biologists with minimal R skills with a complete toolkit facilitating the analysis of their own data, or in accelerating the analyses performed by more experienced bioinformaticians.


Fig. 2: The integrative heatmap generated by the compEpiTools heatmapData and heatmapPlot functions. Heatmaps can easily be obtained incorporating any mixture of data and annotation tracks. Heatmap rows represent ROIs, while columns represent tracks profiled over those ROIs (or bins thereof). Data and annotation tracks might contain either quantitative (e.g. normalized read counts) or categorical (e.g. presence/absence of a ChIP-seq peak) data. If available, the significance of the associated data can be incorporated by modulating colour brightness. In this example, generated as described in detail in the supplemental material, NIH Roadmap DNA methylation data were visualized together with ENCODE histone marks for a set of differentially methylated regions. ROIs were clustered based on the data available in all the displayed tracks, including gene model annotations. The schema at the top of the figure depicts the workflow leading to the heatmap. A set of standard Bioconductor objects, listed in red, is the input for the heatmapData and heatmapPlot compEpiTools functions. The underlined text points to the key analysis steps automatically performed within the functions generating the heatmap, calling routines available in the same packages.

Mentions: The integration of heterogeneous data types remains a challenging task, and explorative analyses based on heatmaps are frequently used to highlight patterns in composite datasets. In our experience, creating these heatmaps requires a large number of processing steps, especially for datasets composed of heterogeneous data types and annotation tracks, which discourages the repeated use of these tools. Moreover, heatmaps are typically generated iteratively until a satisfactory combination of data tracks, clustering and normalization settings is identified. compEpiTools provides a powerful and efficient heatmap-based visualization system through the heatmapData and heatmapPlot functions. Heatmap rows represent ROIs and columns represent data tracks. Every track can be assigned to any of the supported data types: GRanges, GRanges metadata, BAM files, and GElist and GEcollection objects generated by methylPipe. Thus, any combination of base-resolution or low-resolution DNA methylation data, histone marks, TF binding, RNA-seq expression and genomic annotations, including gene models, is accommodated. Quantile or thresholding-based normalization methods can be independently activated for each track to emphasize patterns in the combined dataset and adjust the signal range of the track (for example, to exclude outliers or to down-weight data tracks that score poorly overall in the ROIs). Clustering of rows can be activated, including data from all or selected tracks. The resolution of the displayed data can be controlled by dividing each ROI into a user-defined number of uniformly sized bins. Importantly, each track can be supplied with significance scores, which can be used to progressively dim the colour of low-scoring (less significant) hits, while maintaining full brightness for the significant ones. The data matrix underlying the heatmap is returned together with the dendrogram structure, allowing further analysis of the clusters of interest (Fig. 2).
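The per-track processing described above (binning each ROI, capping the signal range, and dimming less significant hits) can be illustrated with a minimal, language-neutral sketch. The Python code below is not the compEpiTools implementation and does not use its API: all function names, the linear-interpolation quantile, the non-negative signal assumption, the log-scaled dimming rule and the default cutoff and minimum brightness are hypothetical choices made only for illustration. Row clustering is not covered.

```python
import math


def bin_roi(signal, n_bins):
    """Average a per-base signal over n_bins near-uniformly sized bins."""
    n = len(signal)
    if n_bins < 1 or n_bins > n:
        raise ValueError("n_bins must be between 1 and len(signal)")
    # Integer bin edges; every bin holds at least one position.
    edges = [(i * n) // n_bins for i in range(n_bins + 1)]
    return [sum(signal[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


def quantile(values, q):
    """Quantile (0 <= q <= 1) of a non-empty list, by linear interpolation."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def normalize_track(matrix, cap):
    """Cap a track at `cap` and rescale it to [0, 1].

    Assumes a non-negative signal (illustrative assumption). Passing a
    quantile of the track as `cap` gives a quantile-based normalization;
    passing a fixed value gives a threshold-based one.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [[min(v, cap) / cap for v in row] for row in matrix]


def dim_by_pvalue(value, pvalue, cutoff=0.05, min_brightness=0.2):
    """Keep full brightness for significant hits, dim the others.

    Hypothetical rule: brightness falls from 1 at the cutoff to
    min_brightness at p = 1, on a log10 scale.
    """
    if pvalue <= cutoff:
        return value
    frac = math.log10(pvalue) / math.log10(cutoff)  # 0 at p = 1, near 1 at cutoff
    return value * (min_brightness + (1 - min_brightness) * frac)


if __name__ == "__main__":
    # Two toy ROIs with per-base signal, reduced to 3 bins each.
    rois = [[0, 0, 2, 4, 9, 9], [1, 1, 1, 1, 50, 50]]
    binned = [bin_roi(r, 3) for r in rois]
    # Cap the track at its 90th percentile to limit the effect of outliers.
    cap = quantile([v for row in binned for v in row], 0.9)
    print(normalize_track(binned, cap))
```

In this sketch, the same `normalize_track` helper serves both normalization modes, so switching a track from quantile-based to threshold-based scaling only changes how `cap` is chosen.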

