methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data.

Kishore K, de Pretis S, Lister R, Morelli MJ, Bianchi V, Amati B, Ecker JR, Pelizzola M - BMC Bioinformatics (2015)

Bottom Line: The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. These packages are instrumental in providing biologists with minimal R skills a complete toolkit facilitating the analysis of their own data, or in accelerating the analyses performed by more experienced bioinformaticians.


Affiliation: Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Milano, 20139, Italy. kamal.kishore@iit.it.

ABSTRACT

Background: Numerous methods are available to profile several epigenetic marks, providing data with different genome coverage and resolution. The resulting large epigenomic datasets are often combined with other high-throughput data, including RNA-seq, ChIP-seq for transcription factor (TF) binding, and DNase-seq experiments. Despite the numerous computational tools covering specific steps in the analysis of large-scale epigenomics data, comprehensive software solutions for their integrative analysis are still missing. Multiple tools must be identified and combined to jointly analyze histone marks, TF binding and other -omics data together with DNA methylation data, which complicates the analysis of these data and their integration with publicly available datasets.

Results: To overcome the burden of integrating various data types with multiple tools, we developed two companion R/Bioconductor packages. The former, methylPipe, is tailored to the analysis of high- or low-resolution DNA methylomes in several species, accommodating (hydroxy-)methyl-cytosines in both CpG and non-CpG sequence contexts. The analysis of multiple whole-genome bisulfite sequencing experiments is supported, while maintaining the ability to integrate targeted genomic data. The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. It provides a number of methods to score these data in regions of interest, leading to the identification of enhancers, lncRNAs, and RNAPII stalling/elongation dynamics. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. Finally, the package includes a flexible method based on heatmaps for the integration of various data types, combining annotation tracks with continuous or categorical data tracks.

Conclusions: methylPipe and compEpiTools provide a comprehensive Bioconductor-compliant solution for the integrative analysis of heterogeneous epigenomics data. These packages are instrumental in providing biologists who have minimal R skills with a complete toolkit for the analysis of their own data, and in accelerating the analyses performed by more experienced bioinformaticians.


Fig. 1: Diagram describing input and output for the methylPipe and compEpiTools R packages. The most typical input data and outputs are listed for both packages. Regions of interest (ROIs) can be both input and output for these tools. For example, input ROIs can be generated in R based on the UCSC table browser, or can be based on Bioconductor gene models or reference genome-sequence packages. Output ROIs are generated by methylPipe and compEpiTools and can typically be fed back into the same tools as a new set of genomic regions to be investigated, often associated with scores or more complex data. Abbreviations: differentially methylated regions (DMRs); methyl-cytosine (mC); CpG Islands (CGIs); GeneOntology (GO); long non-coding RNAs (lncRNAs); transcription factors (TFs). The dashed arrow identifies a computational step that can be covered with additional tools (see the text for details).

Mentions: Some of these datasets can be particularly large: for example, data resulting from whole-genome bisulfite sequencing (WGBS) experiments in human cells. To accommodate studies that include multiple WGBS experiments without affecting performance (in terms of speed and required memory), the packages we developed keep the data on disk as indexed and compressed flat files [6]. The code is parallelized to minimize the computational time for the most demanding tasks, such as the identification of differentially methylated regions. Figure 1 illustrates the overall design along with the main input and output of the methylPipe and compEpiTools packages.
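
The idea of keeping data as indexed and compressed flat files can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the implementation used by methylPipe or compEpiTools, which are R packages. The function names `build_blocks` and `query`, the block size, and the three-column record layout (chromosome, position, methylation level) are all assumptions made for this example. Records are sorted by position and compressed in small blocks, and a small index of each block's first position lets a region query decompress only the blocks that can overlap the region.

```python
import bisect
import zlib


def build_blocks(records, block_size=3):
    """Compress position-sorted (chrom, pos, value) records in fixed-size blocks.

    Returns (index, blocks): index[i] is the (chrom, pos) key of the first
    record in blocks[i], so blocks can be located without decompressing them.
    """
    records = sorted(records)
    index, blocks = [], []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        text = "\n".join(f"{c}\t{p}\t{v}" for c, p, v in chunk)
        index.append((chunk[0][0], chunk[0][1]))
        blocks.append(zlib.compress(text.encode()))
    return index, blocks


def query(index, blocks, chrom, start, end):
    """Return records on `chrom` with start <= pos <= end."""
    # Begin one block before the first block whose key is >= (chrom, start),
    # because that earlier block may straddle the start of the region.
    first = max(bisect.bisect_left(index, (chrom, start)) - 1, 0)
    hits = []
    for i in range(first, len(blocks)):
        if index[i] > (chrom, end):
            break  # this block and all later ones start past the region
        for line in zlib.decompress(blocks[i]).decode().split("\n"):
            c, p, v = line.split("\t")
            if c == chrom and start <= int(p) <= end:
                hits.append((c, int(p), float(v)))
    return hits


# Hypothetical toy data: (chromosome, position, methylation level).
toy = [
    ("chr1", 100, 0.9), ("chr1", 150, 0.1), ("chr1", 200, 0.5),
    ("chr1", 300, 0.7), ("chr2", 50, 0.2), ("chr2", 80, 1.0),
    ("chr2", 400, 0.0),
]
toy_index, toy_blocks = build_blocks(toy)
region_hits = query(toy_index, toy_blocks, "chr1", 150, 300)
```

In this sketch only the index has to stay in memory; the compressed blocks could equally sit in a file and be read by offset, which is the property that makes region-based access to very large methylomes feasible.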

