Reconstruction of 3D genome architecture via a two-stage algorithm.

Segal MR, Bengtsson HL - BMC Bioinformatics (2015)

Bottom Line: After describing the algorithm we present 3D architectures for mouse embryonic stem cells and human lymphoblastoid cells. We further analyze replicate data at differing resolutions obtained from recently devised in situ Hi-C assays. The improvements are such that we can progress from 1 Mb resolution to 100 kb resolution, notable since this latter value has been identified as critical to inferring topological domains in analyses performed on the contact (rather than 3D) level.


Affiliation: Division of Bioinformatics, Department of Epidemiology and Biostatistics, University of California, 550 16th Street, San Francisco, 94158, CA, USA. mark@biostat.ucsf.edu.

ABSTRACT

Background: The three-dimensional (3D) configuration of chromosomes within the eukaryotic nucleus is an important factor for several cellular functions, including gene expression regulation, and has also been linked with cancer-causing translocation events. While visualization of such architecture remains limited to low resolutions, the ability to infer structures at increasing resolutions has been enabled by recently devised chromosome conformation capture techniques. In particular, when coupled with next-generation sequencing, such methods yield an inventory of genome-wide chromatin contacts or interactions. Various algorithms have been advanced to operate on such contact data to produce reconstructed 3D configurations. Studies have shown that these reconstructions can provide added value over raw interaction data with respect to downstream biological insights. However, only limited, low-resolution reconstructions have been realized for mammals due to computational bottlenecks.

Results: Here we propose a two-stage algorithm to partially overcome these computational barriers. The central idea is to initially utilize existing reconstruction techniques on an individual chromosome basis, using intra-chromosomal contacts, and then to relatively position these chromosome-level reconstructions using inter-chromosomal contacts. This two-stage strategy represents a natural approach in view of the within- versus between-chromosome distribution of contacts. It can increase resolution ≈ 20-fold for mouse and human. After describing the algorithm we present 3D architectures for mouse embryonic stem cells and human lymphoblastoid cells. We evaluate the impact of several factors on reconstruction reproducibility and explore a variety of sampling schemes. We further analyze replicate data at differing resolutions obtained from recently devised in situ Hi-C assays. In all instances we demonstrate insensitivity of the whole-genome 3D reconstruction obtained by the two-stage algorithm to the sampling strategy used.

Conclusions: Our two-stage algorithm has the potential to significantly increase the resolution of 3D genome reconstructions. The improvements are such that we can progress from 1 Mb resolution to 100 kb resolution, notable since this latter value has been identified as critical to inferring topological domains in analyses performed on the contact (rather than 3D) level.




Fig. 5: RMSD comparisons between replicates at 500 kb resolution. Comparisons across a range (0.10, 0.20, …, 1.0) of second-stage equi-spaced sampling proportions for reconstructions from primary and replicate human GM12878 B-lymphoblastoid cell line pools [12] at a resolution of 500 kb. The referent reconstruction is based on a proportion of 1.0 for the primary series. Overplotting obscures coincident points at some sampling proportions.

Mentions: Results are displayed in Figs. 4 and 5. Once again, we see invariance with respect to sampling proportion. Moreover, at both resolutions, the distance to the referent, as measured via RMSD, is almost identical for the primary and replicate studies at all sampling fractions, and these values are not substantially larger than the RMSD for the replicate series when no downsampling is employed. Examination of primary versus replicate comparisons at each sampling fraction, without use of a global referent, revealed no systematic trends at either 1 Mb or 500 kb resolution; in particular, there was no indication of deteriorated performance at low sampling fractions. We also made comparisons between these resolutions. These were performed by thinning a given 500 kb reconstruction, for a particular sampling fraction, so that the genomic loci corresponding to each 3D point of the reconstruction were also represented in the 1 Mb reconstruction at that same sampling fraction. The thinning essentially amounts to considering every other point. Procrustes transformation was then used to align these reconstructions and the attendant RMSD obtained. These RMSDs were very small across the suite of sampling fractions, ranging from 1.6 × 10⁻³ to 4.0 × 10⁻³. This good agreement in part reflects the fact that the data underlying the 1 Mb reconstructions are arrived at by binning the 500 kb data, as opposed to being independently generated.
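
The cross-resolution comparison described above (thin, align by Procrustes, compute RMSD) can be illustrated with a minimal sketch. This is not the authors' code: the function names `thin_every_other` and `procrustes_rmsd` are hypothetical, the coordinates are synthetic, and the choices to rescale both structures to unit Frobenius norm and to permit reflections in the alignment are assumptions made for illustration.

```python
import numpy as np


def thin_every_other(coords_500kb):
    """Keep every other 3D point so that 500 kb loci match 1 Mb loci."""
    return coords_500kb[::2]


def procrustes_rmsd(reference, target):
    """Align target to reference by Procrustes and return the RMSD.

    Assumption: both structures are centered and rescaled to unit
    Frobenius norm first, so the RMSD is scale-free.
    """
    ref = reference - reference.mean(axis=0)
    tgt = target - target.mean(axis=0)
    ref /= np.linalg.norm(ref)
    tgt /= np.linalg.norm(tgt)
    # Orthogonal Procrustes: the orthogonal R minimizing ||tgt R - ref||
    # is U V^T, where U S V^T is the SVD of tgt^T ref.
    # Note: this form permits reflections as well as rotations.
    u, s, vt = np.linalg.svd(tgt.T @ ref)
    rotation = u @ vt
    scale = s.sum()  # optimal scaling when tgt has unit norm
    aligned = scale * tgt @ rotation
    return float(np.sqrt(np.mean(np.sum((aligned - ref) ** 2, axis=1))))


# Synthetic stand-in for a 500 kb reconstruction (40 loci in 3D).
rng = np.random.default_rng(0)
coords_500kb = rng.normal(size=(40, 3))

# Synthetic stand-in for the 1 Mb reconstruction: the thinned 500 kb
# points after an arbitrary rotation, scaling and translation.
theta = 0.7
rot = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
coords_1mb = 2.5 * thin_every_other(coords_500kb) @ rot + np.array([1.0, -2.0, 0.5])

rmsd = procrustes_rmsd(coords_1mb, thin_every_other(coords_500kb))
print(rmsd)  # near zero: the two structures differ only by a similarity transform
```

Because the synthetic 1 Mb structure is an exact similarity transform of the thinned 500 kb structure, the RMSD here is zero up to floating-point error; with real reconstructions the value reflects genuine structural differences between the two resolutions.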

