Limits...
Analysis methods for studying the 3D architecture of the genome.

Ay F, Noble WS - Genome Biol. (2015)

Bottom Line: The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome.In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitates rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets.Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.

View Article: PubMed Central - PubMed

Affiliation: Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA. ferhatay@uw.edu.

ABSTRACT
The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome. In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitates rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets. Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.

No MeSH data available.


Related in: MedlinePlus

Visualization of Hi-C data. An Epigenome Browser snapshot of a 4 Mb region of human chromosome 10. Top track shows Refseq genes. All other tracks display data from the human lymphoblastoid cell line GM12878. From top to bottom these tracks are: smoothed CTCF signal from ENCODE [130]; significant contact calls by Fit-Hi-C using 1 kb resolution Hi-C data (only the contacts >50 kb distance and − log(p-value) ≤25 are shown) [20]; arrowhead domain calls at 5 kb resolution [18]; Armatus multiscale domain calls for three different values of the domain-length scaling factor γ [87]; DI HMM TAD calls at 50 kb resolution [15]; and the heatmap of 10 kb resolution normalized contact counts for GM12878 Hi-C data [18]. The color scale of the heatmap is truncated to the range 20 to 400, with higher contact counts corresponding to a darker color
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4556012&req=5

Fig3: Visualization of Hi-C data. An Epigenome Browser snapshot of a 4 Mb region of human chromosome 10. Top track shows Refseq genes. All other tracks display data from the human lymphoblastoid cell line GM12878. From top to bottom these tracks are: smoothed CTCF signal from ENCODE [130]; significant contact calls by Fit-Hi-C using 1 kb resolution Hi-C data (only the contacts >50 kb distance and − log(p-value) ≤25 are shown) [20]; arrowhead domain calls at 5 kb resolution [18]; Armatus multiscale domain calls for three different values of the domain-length scaling factor γ [87]; DI HMM TAD calls at 50 kb resolution [15]; and the heatmap of 10 kb resolution normalized contact counts for GM12878 Hi-C data [18]. The color scale of the heatmap is truncated to the range 20 to 400, with higher contact counts corresponding to a darker color

Mentions: Instead of assuming a specific distribution, one can infer the distance-dependence relationship using nonparametric methods, such as splines, directly from the observed contact counts. Compared with parametric fits, nonparametric fits are more general in capturing the distance dependence, which changes substantially with varying resolution, genomic distance range, and sequencing depth [20]. A recent method, Fit-Hi-C, uses smoothing splines to find an initial fit, refines the initial fit to account for bona fide (non-random) contacts, and computes confidence estimates using the refined fit while incorporating biases computed by the matrix balancing-based normalization methods [20]. The resulting p-values are subsequently subjected to multiple testing correction. Figure 3 displays examples of long-range chromatin loops identified by Fit-Hi-C.Fig. 3


Analysis methods for studying the 3D architecture of the genome.

Ay F, Noble WS - Genome Biol. (2015)

Visualization of Hi-C data. An Epigenome Browser snapshot of a 4 Mb region of human chromosome 10. Top track shows Refseq genes. All other tracks display data from the human lymphoblastoid cell line GM12878. From top to bottom these tracks are: smoothed CTCF signal from ENCODE [130]; significant contact calls by Fit-Hi-C using 1 kb resolution Hi-C data (only the contacts >50 kb distance and − log(p-value) ≤25 are shown) [20]; arrowhead domain calls at 5 kb resolution [18]; Armatus multiscale domain calls for three different values of the domain-length scaling factor γ [87]; DI HMM TAD calls at 50 kb resolution [15]; and the heatmap of 10 kb resolution normalized contact counts for GM12878 Hi-C data [18]. The color scale of the heatmap is truncated to the range 20 to 400, with higher contact counts corresponding to a darker color
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4556012&req=5

Fig3: Visualization of Hi-C data. An Epigenome Browser snapshot of a 4 Mb region of human chromosome 10. Top track shows Refseq genes. All other tracks display data from the human lymphoblastoid cell line GM12878. From top to bottom these tracks are: smoothed CTCF signal from ENCODE [130]; significant contact calls by Fit-Hi-C using 1 kb resolution Hi-C data (only the contacts >50 kb distance and − log(p-value) ≤25 are shown) [20]; arrowhead domain calls at 5 kb resolution [18]; Armatus multiscale domain calls for three different values of the domain-length scaling factor γ [87]; DI HMM TAD calls at 50 kb resolution [15]; and the heatmap of 10 kb resolution normalized contact counts for GM12878 Hi-C data [18]. The color scale of the heatmap is truncated to the range 20 to 400, with higher contact counts corresponding to a darker color
Mentions: Instead of assuming a specific distribution, one can infer the distance-dependence relationship using nonparametric methods, such as splines, directly from the observed contact counts. Compared with parametric fits, nonparametric fits are more general in capturing the distance dependence, which changes substantially with varying resolution, genomic distance range, and sequencing depth [20]. A recent method, Fit-Hi-C, uses smoothing splines to find an initial fit, refines the initial fit to account for bona fide (non-random) contacts, and computes confidence estimates using the refined fit while incorporating biases computed by the matrix balancing-based normalization methods [20]. The resulting p-values are subsequently subjected to multiple testing correction. Figure 3 displays examples of long-range chromatin loops identified by Fit-Hi-C.Fig. 3

Bottom Line: The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome.In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitates rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets.Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.

View Article: PubMed Central - PubMed

Affiliation: Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA. ferhatay@uw.edu.

ABSTRACT
The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome. In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitates rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets. Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.

No MeSH data available.


Related in: MedlinePlus