Probing long-range interactions by extracting free energies from genome-wide chromosome conformation capture data.

Saberi S, Farré P, Cuvier O, Emberly E - BMC Bioinformatics (2015)

Bottom Line: PCA identifies systematic effects as well as high-frequency spatial noise in the Hi-C data, which can be filtered out. The result of fitting is a set of predictions for the coupling energies between the various chromatin factors and their effect on the energetics of looping. PCA filtering can improve the fit, and the predicted coupling energies lead to biologically meaningful insights into how various chromatin-bound factors influence the stability of DNA loops in chromatin.


Affiliation: Physics Department, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, BC, Canada. saied.sabery.m@gmail.com.

ABSTRACT

Background: A variety of DNA-binding proteins are involved in regulating and shaping the packing of chromatin. They aid the formation of loops in the DNA that function to isolate different structural domains. A recent experimental technique, Hi-C, provides a method for determining the frequency of such looping between all distant parts of the genome. Given that the binding locations of many chromatin-associated proteins have also been measured, it has been possible to estimate their influence on the long-range interactions measured by Hi-C. However, a challenge in this analysis is the predominance of non-specific contacts that mask the specific interactions of interest.

Results: We show that transforming the Hi-C contact frequencies into free energies gives a natural method for separating out the distance-dependent non-specific interactions. In particular, we apply Principal Component Analysis (PCA) to the transformed free energy matrix to identify the dominant modes of interaction. PCA identifies systematic effects as well as high-frequency spatial noise in the Hi-C data, which can be filtered out. Thus it can be used as a data-driven approach for normalizing Hi-C data. We assess this PCA-based normalization approach, along with several other normalization schemes, by fitting the transformed Hi-C data using a pairwise interaction model that takes as input the known locations of bound chromatin factors. The result of fitting is a set of predictions for the coupling energies between the various chromatin factors and their effect on the energetics of looping. We show that the quality of the fit can be used to determine how much PCA filtering should be applied to the Hi-C data.

Conclusions: We find that the different normalizations of the Hi-C data vary in the quality of fit to the pairwise interaction model. PCA filtering can improve the fit, and the predicted coupling energies lead to biologically meaningful insights into how various chromatin-bound factors influence the stability of DNA loops in chromatin.


Figure 2: Free energy principal components and chromatin-binding profiles. Shown are the first four principal components (A, B, C, D) calculated genome-wide from the F_{i,j} matrix created from the raw + ICE contact matrix (top plots). Below each free energy profile are heat maps of the genome-wide average binding profiles for the selected chromatin factors (see text). The top heat map corresponds to the positive free energy interaction profile (blue curve), and the bottom heat map to the inverse profile (red curve). Red regions in the heat maps represent locations of higher occupancy and blue regions represent lower occupancy. The heat map scale runs from 0.0 (blue) to 1.0 (red).

Mentions: We performed PCA on each of the three matrices. In Figure 2 we show the top four principal components (PCs) for the raw + ICE free energy matrix. Each PC shows the variation in the free energy as a function of the genomic separation from the bin located at k = 0. Positive free energies correspond to repulsive interactions, whereas negative ones are attractive and thus represent stabilizing interactions. Note that for each PC there is also an inverse interaction profile, obtained by multiplying the PC by −1. These PCs can also be interpreted as a set of spatial modes with which to represent the data, akin to a Fourier decomposition. The characteristic spatial frequency of a PC increases as its associated eigenvalue (variance) decreases. Many of the PCs corresponding to small eigenvalues represent high-frequency noise. In what follows, we show that this noise can be filtered out by reconstructing the specific interaction energies (Eq. 4) while omitting these small-eigenvalue PCs from the sum. The PCs resulting from the different free energy matrices are similar but do have key differences, as shown in Additional file 1: Figure S2. (We note that the top PCs still emerged if a smaller subsample of free energy profiles was used, reducing the effect of nearby correlated bins; see Additional file 2: Figure S3.) For example, if ICE normalization has not been performed on the raw matrix, the first PC is an overall constant offset, since the bins of the free energy matrix have different means. The spatial frequencies also differed between the PCs derived from the raw or raw + ICE matrices and those from the hierarchical matrix. We attribute this to the distance scaling correction that is applied in the hierarchical normalization method. This will turn out to have consequences for how well the interaction model fits the hierarchically normalized Hi-C data.
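
The filtering idea in the passage above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: Eq. 4 is not reproduced in this excerpt, so the reconstruction below is a generic truncated PCA; the function name pca_filter, the parameter n_keep, and the synthetic profiles are all hypothetical and stand in for real free energy profiles.

```python
import numpy as np


def pca_filter(profiles, n_keep):
    """Reconstruct profiles from the n_keep highest-variance PCs only.

    profiles: array of shape (n_bins, n_separations), one free energy
              profile per genomic bin (hypothetical layout).
    Returns the filtered profiles, the eigenvalues (descending), and the
    PCs as columns of a matrix.
    """
    mean = profiles.mean(axis=0)
    centered = profiles - mean
    # Covariance between genomic separations.
    cov = centered.T @ centered / (profiles.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; sort to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project onto the leading PCs and map back; the small-eigenvalue
    # (high spatial frequency) PCs are left out of the sum.
    top = eigvecs[:, :n_keep]
    filtered = mean + (centered @ top) @ top.T
    return filtered, eigvals, eigvecs


# Synthetic stand-in data (an assumption, not Hi-C data): one smooth
# interaction profile with random amplitude per bin, plus white noise.
rng = np.random.default_rng(0)
k = np.arange(-20, 21)                  # genomic separation from the bin at k = 0
mode = np.exp(-(k / 6.0) ** 2)          # hypothetical smooth spatial mode
clean = rng.normal(size=(500, 1)) * mode
noisy = clean + 0.1 * rng.normal(size=clean.shape)

filtered, eigvals, eigvecs = pca_filter(noisy, n_keep=1)
print("mean squared error before filtering:", np.mean((noisy - clean) ** 2))
print("mean squared error after filtering: ", np.mean((filtered - clean) ** 2))
```

The sign of each column of eigvecs is arbitrary, which matches the remark above that every PC comes with an inverse profile obtained by multiplying it by −1. In this sketch n_keep is fixed by hand; the abstract indicates that the authors instead use the quality of the interaction model fit to decide how much filtering to apply.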

