Probing long-range interactions by extracting free energies from genome-wide chromosome conformation capture data.

Saberi S, Farré P, Cuvier O, Emberly E - BMC Bioinformatics (2015)

Bottom Line: PCA identifies systematic effects as well as high-frequency spatial noise in the Hi-C data, which can be filtered out. The result of fitting is a set of predictions for the coupling energies between the various chromatin factors and their effect on the energetics of looping. PCA filtering can improve the fit, and the predicted coupling energies lead to biologically meaningful insights into how various chromatin-bound factors influence the stability of DNA loops in chromatin.

Affiliation: Physics Department, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, BC, Canada. saied.sabery.m@gmail.com.

ABSTRACT

Background: A variety of DNA-binding proteins are involved in regulating and shaping the packing of chromatin. They aid the formation of loops in the DNA that function to isolate different structural domains. A recent experimental technique, Hi-C, provides a method for determining the frequency of such looping between all distant parts of the genome. Given that the binding locations of many chromatin-associated proteins have also been measured, it has been possible to estimate their influence on the long-range interactions measured by Hi-C. However, a challenge in this analysis is the predominance of non-specific contacts that mask the specific interactions of interest.

Results: We show that transforming the Hi-C contact frequencies into free energies gives a natural method for separating out the distance-dependent non-specific interactions. In particular, we apply Principal Component Analysis (PCA) to the transformed free energy matrix to identify the dominant modes of interaction. PCA identifies systematic effects as well as high-frequency spatial noise in the Hi-C data, which can be filtered out. Thus it can be used as a data-driven approach for normalizing Hi-C data. We assess this PCA-based normalization approach, along with several other normalization schemes, by fitting the transformed Hi-C data using a pairwise interaction model that takes as input the known locations of bound chromatin factors. The result of fitting is a set of predictions for the coupling energies between the various chromatin factors and their effect on the energetics of looping. We show that the quality of the fit can be used to determine how much PCA filtering should be applied to the Hi-C data.
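As a rough illustration of the free-energy transform, the sketch below converts a symmetric contact-frequency matrix into free energies by Boltzmann inversion, $F_{i,j} = -\ln p_{i,j}$ in units of $k_BT$, and then subtracts the mean free energy at each genomic separation $|i - j|$, leaving a distance-independent remainder $\delta F_{i,j}$. The normalization to probabilities, the pseudocount for empty bins, the per-diagonal subtraction, and the helper name `free_energy_matrix` are all assumptions made for illustration; the paper's exact definition may differ.

```python
import numpy as np

def free_energy_matrix(counts, pseudocount=1.0):
    """Boltzmann-invert a symmetric Hi-C contact matrix (hypothetical helper).

    Returns delta_F, where F = -ln(p) in units of k_B T and the mean F at
    each genomic separation |i - j| has been subtracted (assumed scheme).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    # Pseudocount avoids -ln(0) for bin pairs with no observed contacts (assumption).
    shifted = counts + pseudocount
    p = shifted / shifted.sum()
    F = -np.log(p)
    delta_F = np.empty_like(F)
    for d in range(n):
        # Index pairs on the d-th diagonal, i.e. all (i, j) with |i - j| == d.
        i = np.arange(n - d)
        j = i + d
        mean_d = F[i, j].mean()
        # Remove the distance-dependent baseline from both triangles.
        delta_F[i, j] = F[i, j] - mean_d
        delta_F[j, i] = F[j, i] - mean_d
    return delta_F
```

Because each diagonal's mean is removed, a matrix whose contacts decay purely with distance maps to $\delta F = 0$ everywhere; only deviations from the distance-dependent baseline survive.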

Conclusions: We find that the different normalizations of the Hi-C data vary in the quality of fit to the pairwise interaction model. PCA filtering can improve the fit, and the predicted coupling energies lead to biologically meaningful insights into how various chromatin-bound factors influence the stability of DNA loops in chromatin.

Fig4: PCA filtering can improve fit to interaction model. (A) There is an optimal number of PCs to use in reconstructing the interaction energies $\delta F_{i,j}$. Shown are the Pearson correlation coefficient and reduced $\chi^2$ of the genome-wide fits of the given data (see legend) to the model of Eq. 5, as a function of the number of PCs used in the filtering. For the energies derived from the raw matrix, PC1 was excluded as it is simply a DC offset. (B, C) Best-fit results for the various datasets by chromosome. PCA filtering of the raw matrix leads to the best overall results.
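The filtering scanned in panel A can be sketched as a truncated eigen-reconstruction of the symmetric $\delta F$ matrix: decompose it, rank the components by the magnitude of their eigenvalues, and rebuild the matrix from the first k of them, optionally dropping the leading component when it is just a constant (DC) offset. Treating the PCs as eigenvectors of $\delta F$ itself, ranking by |eigenvalue|, and the name `pca_filter` are assumptions for illustration, not the paper's stated convention.

```python
import numpy as np

def pca_filter(delta_F, k, skip_first=False):
    """Reconstruct a symmetric matrix from k leading components (hypothetical helper).

    Components are eigenvectors of delta_F ranked by |eigenvalue|. If
    skip_first is True, the top component (e.g. a DC offset) is dropped
    and components 2..k+1 are used instead.
    """
    delta_F = np.asarray(delta_F, dtype=float)
    # eigh handles symmetric matrices and returns real eigenvalues (ascending).
    evals, evecs = np.linalg.eigh(delta_F)
    # Re-sort so the largest-magnitude components come first.
    order = np.argsort(np.abs(evals))[::-1]
    start = 1 if skip_first else 0
    keep = order[start:start + k]
    V = evecs[:, keep]
    # Sum of lambda * v v^T over the retained components.
    return V @ np.diag(evals[keep]) @ V.T
```

With k equal to the matrix size the reconstruction returns $\delta F$ unchanged; smaller k discards the trailing components, which is where high-frequency noise would presumably reside.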

Mentions: For each set of interaction energies, whether filtered by PCA or by some other normalization method, we fit Eq. 5 to determine a set of coupling energies $J_{\mu,\nu}$. (We fit all the chromosomes at once as well as chromosome by chromosome, which lets us determine how much the fitted $J$'s vary by chromosome.) We use $\chi^2$ and the Pearson correlation coefficient to determine how much PC filtering, if any, should be applied to the interaction energies. (All of the fits are statistically significant as determined by a permutation test, which gave $r \sim 0$.) Figure 4 shows that for the interaction energies derived from the raw and raw + ICE matrices, PCA filtering can improve the quality of the fit. Figure 4A shows that using the first 35 PCs gives the best genome-wide fit of the data by the model (for the matrix without ICE, we also left out the DC-offset PC). Interestingly, applying ICE reduced the overall quality of the fit compared to the raw matrix, though PC filtering improved the fit for both. This reduction in fit quality is perhaps not surprising, since any normalization method removes information present in the original data. We found that applying any form of PC filtering to the interaction energies derived from the hierarchically normalized matrix always made the fit worse. As a summary, Figure 4B,C show the chromosome-by-chromosome $\chi^2$ and Pearson correlation coefficient for the various fits of the model to both PC-filtered and unfiltered data. PC filtering of the energies computed from the raw matrix gives the best overall fit. The distance-dependent scalings applied in the hierarchical normalization method lower the correlation between the interaction energies and the underlying bound chromatin factors, reducing the quality of the fit.
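The fitting and scoring described above can be sketched as follows. Eq. 5 is not reproduced in this excerpt, so the sketch assumes a symmetric bilinear form, $\delta F_{i,j} \approx \sum_{\mu,\nu} J_{\mu,\nu}\,\sigma_i^{\mu}\sigma_j^{\nu}$, where $\sigma_i^{\mu}$ is the occupancy of chromatin factor $\mu$ in bin $i$. It solves for $J$ by ordinary least squares over the pairs $i < j$, computes reduced $\chi^2$ with a uniform per-pair error `err` (an assumption; real error estimates would replace it), and builds a permutation null by shuffling the $\delta F$ values among pairs. The function names and these modelling choices are hypothetical and may not match the paper's implementation.

```python
import numpy as np

def design_matrix(sigma, pairs):
    """Features for dF_ij = sum_{mu<=nu} J_{mu,nu} x_ij^{mu,nu} (assumed model).

    sigma: (n_bins, n_factors) occupancy of each factor in each bin.
    pairs: (i, j) index arrays for the matrix entries being fit.
    """
    i, j = pairs
    m = sigma.shape[1]
    cols, labels = [], []
    for mu in range(m):
        for nu in range(mu, m):
            x = sigma[i, mu] * sigma[j, nu]
            if nu != mu:
                # J is symmetric, so J_{mu,nu} also multiplies sigma_i^nu sigma_j^mu.
                x = x + sigma[i, nu] * sigma[j, mu]
            cols.append(x)
            labels.append((mu, nu))
    return np.column_stack(cols), labels

def fit_couplings(delta_F, sigma, min_sep=1, err=1.0):
    """Least-squares fit of couplings; returns (J, Pearson r, reduced chi^2)."""
    delta_F = np.asarray(delta_F, dtype=float)
    n = delta_F.shape[0]
    # Each pair once, skipping pairs closer than min_sep (diagonal by default).
    i, j = np.triu_indices(n, k=min_sep)
    X, labels = design_matrix(sigma, (i, j))
    y = delta_F[i, j]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    r = np.corrcoef(y, pred)[0, 1]
    dof = len(y) - X.shape[1]
    chi2_red = np.sum(((y - pred) / err) ** 2) / dof
    m = sigma.shape[1]
    J = np.zeros((m, m))
    for (mu, nu), c in zip(labels, coef):
        J[mu, nu] = J[nu, mu] = c
    return J, r, chi2_red

def permutation_r(delta_F, sigma, n_perm=100, seed=0):
    """Pearson r of fits to randomly shuffled pair energies (null distribution)."""
    delta_F = np.asarray(delta_F, dtype=float)
    rng = np.random.default_rng(seed)
    n = delta_F.shape[0]
    i, j = np.triu_indices(n, k=1)
    rs = []
    for _ in range(n_perm):
        vals = rng.permutation(delta_F[i, j])
        shuffled = np.zeros_like(delta_F)
        shuffled[i, j] = vals
        shuffled[j, i] = vals
        rs.append(fit_couplings(shuffled, sigma)[1])
    return np.array(rs)
```

Comparing the observed r against the `permutation_r` values gives an empirical significance level; in the same spirit, running `fit_couplings` on energies filtered with different numbers of PCs and tracking r and reduced $\chi^2$ traces out a curve like the one in Figure 4A.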

