Pgltools: a genomic arithmetic tool suite for manipulation of Hi-C peak and other chromatin interaction data


ABSTRACT

Background: Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single-locus NGS data, there is currently no file standard or analysis tool suite for storing and manipulating paired-genomic-loci, the data type resulting from “genomic interaction” studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed.

Results: This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite “pgltools”: a cross-platform, PyPy-compatible Python package available both as an easy-to-use UNIX package and as a Python module, for integration into pipelines of paired-genomic-loci analyses.

Conclusions: Pgltools is a freely available, open source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are publicly available at www.github.com/billgreenwald/pgltools, and a Python module of the operations can be installed from PyPI via the PyGLtools module.



Fig. 2: The operations of pgltools. PGL entries from file one are shown in various shades of blue, PGL entries from file two are shown in orange, and windows are shown in yellow (see legend at bottom right). All resulting outputs are shown below dashed lines, with novel entries shown in green and original entries shown in their original color. (a) The intersect operation finds overlapping paired-genomic-loci between two PGL files and returns the overlapping regions. (b) The merge operation combines overlapping paired-genomic-loci within a single PGL file. (c) The subtract operation returns the PGL entries from file one with the PGL entries from file two removed. (d) The window operation returns the PGL entries that fall completely within a specified genomic region. (e) The coverage operation returns the number of PGL entries from file two that overlap each PGL entry in file one. (f) The closest operation returns the closest PGL entry from file two for each PGL entry in file one. (g) The intersect1D operation returns PGL entries from file one that overlap regions in a BED file. (h) The closest1D operation returns the closest region from a BED file for each PGL entry in file one. (i) The subtract1D operation returns the PGL entries from file one with the regions from a BED file removed.

Table 1 includes a full list of pgltools operations and their default behavior, and Fig. 2 visualizes these operations. The intersect operation identifies the overlap, union, or uniqueness of PGL entries between two PGL files while preserving or combining annotations; for example, it can report the number of overlapping bases at each locus for each pair of PGL entries from the two files. The merge operation merges PGL entries within a single PGL file that overlap or lie within a specified distance of each other; summary statistics, such as the number of merged entries, can be obtained through command line arguments. To determine differential PGL entries between two PGL files, the subtract operation removes the parts of PGL entries present in one PGL file from those present in another. Once a set of PGL entries has been determined, it is common to filter them to a desired genomic region; the window operation filters based on either or both ends of the PGL entries in a PGL file. To address questions of differential coverage depth of genomic interactions, such as genetic association with interaction intensity, we provide the samTopgl operation, which, when used with the coverage operation, finds the number of reads from a SAM file that overlap each PGL entry in a PGL file (though the operation is generalizable to any two PGL files). The closest operation finds the closest PGL entries between two PGL files, and the expand operation expands both loci by a given value. In addition, because single-locus genomic metadata, such as the presence of a coding region, epigenetic annotation, or motif locations, is often analyzed together with interaction data, we provide the intersect1D, closest1D, and subtract1D operations for joint analysis of traditional BED files and PGL files.

Finally, we include helper operations for converting files to and from the PGL format. The formatbedpe operation converts a BEDPE file, and formatTripSparse converts triple sparse matrix files, to the PGL format. For visualization or further analysis, the conveRt operation produces a file readable by the GenomicInteractions R package [10], browser produces output for the UCSC Genome Browser [11], juiceBox produces output for JuiceBox [3, 12], and condense and findLoops create a BED file of either the discrete loci or the interior regions of each PGL.
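To make the paired-loci arithmetic concrete, the following is a minimal Python sketch of three of the operations described above (intersect, window, and coverage). It is not the pgltools implementation and does not use the pgltools API: the `Locus` and `PairedLoci` types, the function names, and the half-open BED-style coordinates are all assumptions made for illustration. The sketch also assumes that two entries overlap only when locus A overlaps locus A and locus B overlaps locus B, and that the window filter requires both loci to lie inside the window.

```python
from typing import List, NamedTuple, Optional


class Locus(NamedTuple):
    """One genomic interval; half-open [start, end) coordinates are an assumption."""
    chrom: str
    start: int
    end: int


class PairedLoci(NamedTuple):
    """Hypothetical stand-in for one PGL entry: two loci that interact."""
    a: Locus
    b: Locus


def overlap_1d(x: Locus, y: Locus) -> Optional[Locus]:
    """Return the shared interval of two loci, or None if they do not overlap."""
    if x.chrom != y.chrom:
        return None
    start, end = max(x.start, y.start), min(x.end, y.end)
    return Locus(x.chrom, start, end) if start < end else None


def intersect(p: PairedLoci, q: PairedLoci) -> Optional[PairedLoci]:
    """Overlapping region of two entries; both loci must overlap (assumed rule)."""
    a = overlap_1d(p.a, q.a)
    b = overlap_1d(p.b, q.b)
    if a is None or b is None:
        return None
    return PairedLoci(a, b)


def in_window(p: PairedLoci, window: Locus) -> bool:
    """True if both loci of the entry fall completely within the window."""
    def inside(locus: Locus) -> bool:
        return (locus.chrom == window.chrom
                and window.start <= locus.start
                and locus.end <= window.end)
    return inside(p.a) and inside(p.b)


def coverage(file_one: List[PairedLoci], file_two: List[PairedLoci]) -> List[int]:
    """For each entry in file_one, count the overlapping entries in file_two."""
    return [sum(intersect(p, q) is not None for q in file_two) for p in file_one]


# Made-up example entries, not data from the article.
peaks = [
    PairedLoci(Locus("chr1", 100, 200), Locus("chr1", 1000, 1100)),
    PairedLoci(Locus("chr1", 500, 600), Locus("chr2", 50, 150)),
]
other = [
    PairedLoci(Locus("chr1", 150, 250), Locus("chr1", 1050, 1200)),
    PairedLoci(Locus("chr1", 120, 180), Locus("chr1", 5000, 5100)),
]

print(intersect(peaks[0], other[0]))
print(coverage(peaks, other))
print([in_window(p, Locus("chr1", 0, 2000)) for p in peaks])
```

The all-pairs loop in `coverage` is quadratic and is used here only for readability; a sorted sweep or an interval index would be the natural choice for real data sets.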

