Probing long-range interactions by extracting free energies from genome-wide chromosome conformation capture data.

Saberi S, Farré P, Cuvier O, Emberly E - BMC Bioinformatics (2015)

Bottom Line: PCA identifies systematic effects as well as high-frequency spatial noise in the Hi-C data, which can be filtered out. The result of fitting is a set of predictions for the coupling energies between the various chromatin factors and their effect on the energetics of looping. PCA filtering can improve the fit, and the predicted coupling energies lead to biologically meaningful insights into how various chromatin-bound factors influence the stability of DNA loops in chromatin.

Affiliation: Physics Department, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, BC, Canada. saied.sabery.m@gmail.com.

ABSTRACT

Background: A variety of DNA-binding proteins are involved in regulating and shaping the packing of chromatin. They aid the formation of DNA loops that function to isolate different structural domains. A recent experimental technique, Hi-C, measures the frequency of such looping between all distant parts of the genome. Because the binding locations of many chromatin-associated proteins have also been measured, it has been possible to estimate their influence on the long-range interactions measured by Hi-C. However, a challenge in this analysis is the predominance of non-specific contacts, which mask the specific interactions of interest.

Results: We show that transforming the Hi-C contact frequencies into free energies gives a natural method for separating out the distance-dependent non-specific interactions. In particular, we apply Principal Component Analysis (PCA) to the transformed free energy matrix to identify the dominant modes of interaction. PCA identifies systematic effects as well as high-frequency spatial noise in the Hi-C data, which can be filtered out. It can therefore serve as a data-driven approach for normalizing Hi-C data. We assess this PCA-based normalization, along with several other normalization schemes, by fitting the transformed Hi-C data with a pairwise interaction model that takes as input the known locations of bound chromatin factors. The result of fitting is a set of predicted coupling energies between the various chromatin factors and their effect on the energetics of looping. We show that the quality of the fit can be used to determine how much PCA filtering should be applied to the Hi-C data.
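The PCA filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input $\Delta F$ is taken to be the free energy matrix with its distance-dependent average already removed, rows are treated as observations, and the matrix is rebuilt from a chosen subset of principal components so that the discarded ones (systematic modes or high-frequency noise) are filtered out. The function names `pca_filter` and `explained_variance_ratio` and the final symmetrization are hypothetical choices; which components the paper keeps or drops is defined in its Methods.

```python
import numpy as np


def explained_variance_ratio(delta_f):
    """Fraction of variance carried by each principal component (descending)."""
    x = np.asarray(delta_f, dtype=float)
    s = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)
    return s**2 / np.sum(s**2)


def pca_filter(delta_f, keep):
    """Rebuild delta_f from the principal components listed in `keep`.

    Hypothetical sketch: rows are observations, columns are centered,
    and components not in `keep` are discarded as systematic effects or noise.
    """
    x = np.asarray(delta_f, dtype=float)
    mean = x.mean(axis=0)                       # column means
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    idx = np.asarray(list(keep), dtype=int)     # component indices, 0 = largest variance
    recon = (u[:, idx] * s[idx]) @ vt[idx, :] + mean
    # Hi-C matrices are symmetric, so symmetrize the reconstruction.
    return 0.5 * (recon + recon.T)
```

Keeping `range(m)` retains the m leading modes and drops trailing high-frequency components; omitting index 0 instead removes a dominant systematic mode. Scanning such choices and scoring each filtered matrix with the pairwise-interaction fit is one way to mirror the fit-quality criterion described above.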

Conclusions: We find that the different normalizations of the Hi-C data vary in the quality of fit to the pairwise interaction model. PCA filtering can improve the fit, and the predicted coupling energies lead to biologically meaningful insights into how various chromatin-bound factors influence the stability of DNA loops in chromatin.
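The pairwise interaction model is only outlined in the abstract. As a hedged sketch, assume (hypothetically) that the background-subtracted free energy between bins $i$ and $j$ is $\Delta F_{ij} \approx \sum_{a,b} J_{ab}\, o_{ia} o_{jb}$, where $o_{ia}$ is the occupancy of chromatin factor $a$ in bin $i$ and $J$ is a symmetric matrix of coupling energies. The couplings can then be fit by ordinary least squares, with $R^2$ as a fit-quality score. The function name, the occupancy encoding, and the least-squares choice are illustrative assumptions; the paper's Methods define the actual model.

```python
import numpy as np


def fit_pairwise_couplings(delta_f, occupancy, min_sep=1):
    """Least-squares fit of a symmetric coupling matrix J (hypothetical model).

    delta_f:   (N, N) free energy fluctuations, background already removed.
    occupancy: (N, A) occupancy of A chromatin factors in each bin.
    Returns (J, r2): the (A, A) couplings and the R^2 of the fit.
    """
    f = np.asarray(delta_f, dtype=float)
    occ = np.asarray(occupancy, dtype=float)
    n, n_factors = occ.shape
    i_idx, j_idx = np.triu_indices(n, k=min_sep)      # each bin pair once, |i-j| >= min_sep
    fa, fb = np.triu_indices(n_factors)               # unordered factor pairs a <= b
    oi, oj = occ[i_idx], occ[j_idx]                   # (P, A) occupancies at both ends
    # Feature for (a, b): o_ia*o_jb + o_ib*o_ja, symmetric under swapping i and j.
    x = oi[:, fa] * oj[:, fb] + oi[:, fb] * oj[:, fa]
    x[:, fa == fb] *= 0.5                             # a == b counted twice above
    y = f[i_idx, j_idx]
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    couplings = np.zeros((n_factors, n_factors))
    couplings[fa, fb] = coef
    couplings[fb, fa] = coef
    resid = y - x @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid**2) / ss_tot if ss_tot > 0 else float("nan")
    return couplings, float(r2)
```

Comparing `r2` across differently normalized or PCA-filtered matrices is, in spirit, how fit quality is used in the text to compare normalizations.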

Figure 1: Average free energy of interaction. The genome-wide average free energy, $\langle F(k) \rangle$, as a function of genomic separation $k$ (a 600 kb window at 10 kb resolution) for free energies derived from three contact matrices (shown in legend). All three show that the average free energy cost of forming a loop grows with the linear separation between genomic bins. Fitting a polymer model, $\langle F(k) \rangle = \alpha \ln k + F_0$ (see Methods), gives $\alpha = 1.09$, $1.085$ and $1.12$ for the raw, raw + ICE and hierarchical matrices, respectively.
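The connection between the polymer model and the fitted slope can be made explicit. Assuming the free energy is the negative logarithm of a contact probability in units of $k_B T$ (the exact form of Eq. 1 is given in Methods), a power-law decay of contact probability with separation makes the average free energy linear in $\ln k$:

```latex
P(k) \propto k^{-\alpha}
\;\Longrightarrow\;
\langle F(k) \rangle = -\ln P(k) = \alpha \ln k + F_0,
\qquad
\langle F(2k) \rangle - \langle F(k) \rangle = \alpha \ln 2 .
```

Each doubling of the separation therefore raises $\langle F(k) \rangle$ by $\alpha \ln 2$: about $1.04\,k_B T$ for an ideal chain ($\alpha = 3/2$) and about $0.76\,k_B T$ for the fitted $\alpha \approx 1.1$.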

For each contact matrix we apply our free energy transformation (see Eq. 1 in ‘Methods’), leading to three different free energy matrices that represent the energetics of interaction between genomic locations. Regardless of whether the contact matrix was normalized, the dominant contribution to the free energy is the distance-dependent entropic cost of looping the DNA polymer between two genomic locations. We determine this distance-dependent background free energy, $\langle F(k) \rangle$, by averaging all free energy matrix elements $F_{i,j}$ at a fixed genomic separation $k = j - i$ (see Methods). In Figure 1 we plot $\langle F(k) \rangle$ for the three free energy matrices used in the analysis; in each case the free energy cost of looping increases with the linear separation. We fit each of the three average free energies to the prediction for a random polymer, namely $\langle F(k) \rangle = \alpha \ln k + F_0$, where $\alpha$ is the scaling exponent. For an ideal random polymer in 3D, the predicted scaling exponent is $\alpha = 3/2$. For the Drosophila Hi-C data, all three matrices have average free energies with roughly the same scaling ($\alpha = 1.1 \pm 0.1$), in agreement with other Hi-C datasets, for which $\alpha \sim 1$. We now show how the free energy fluctuations around the average can be further decomposed into an independent set of interaction modes using Principal Component Analysis (PCA).
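The background estimate and fit described above can be sketched as follows. This assumes, since the text does not state it, that Eq. 1 is $F_{ij} = -\ln(c_{ij} / \sum c)$ in units of $k_B T$ with an optional pseudocount for empty bins; all function names are hypothetical. The sketch averages $F$ along each diagonal to get $\langle F(k) \rangle$, fits $\alpha$ by linear regression of $\langle F(k) \rangle$ on $\ln k$, and returns the fluctuations $F_{ij} - \langle F(|j-i|) \rangle$ that PCA would then decompose.

```python
import numpy as np


def free_energy(contacts, pseudocount=1.0):
    """Assumed transform: F_ij = -ln(c_ij / sum(c)), in units of k_B T."""
    c = np.asarray(contacts, dtype=float) + pseudocount   # avoid log(0) for empty bins
    return -np.log(c / c.sum())


def distance_average(f):
    """<F(k)>: mean of F[i, i + k] over all i, for k = 0 .. N-1."""
    f = np.asarray(f, dtype=float)
    return np.array([np.diagonal(f, offset=k).mean() for k in range(f.shape[0])])


def fit_alpha(avg_f, k_min=1):
    """Least-squares fit of <F(k)> = alpha * ln k + F0 over k >= k_min."""
    avg_f = np.asarray(avg_f, dtype=float)
    k = np.arange(k_min, avg_f.size)
    alpha, f0 = np.polyfit(np.log(k), avg_f[k_min:], 1)   # returns [slope, intercept]
    return alpha, f0


def fluctuations(f, avg_f):
    """Remove the distance-dependent background: F_ij - <F(|j - i|)>."""
    f = np.asarray(f, dtype=float)
    idx = np.arange(f.shape[0])
    sep = np.abs(idx[:, None] - idx[None, :])
    return f - np.asarray(avg_f)[sep]
```

On a synthetic matrix whose contacts decay as $k^{-3/2}$ (with `pseudocount=0`), `fit_alpha` recovers the ideal-chain value $\alpha = 3/2$, and `fluctuations` returns an all-zero matrix because every element is explained by separation alone.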

