Assessing the limits of restraint-based 3D modeling of genomes and genomic domains.

Trussart M, Serra F, Baù D, Junier I, Serrano L, Marti-Renom MA - Nucleic Acids Res. (2015)

Bottom Line: These models were congruent with fluorescent imaging validation. Here we propose the first evaluation of a mean-field restraint-based reconstruction of genomes by considering diverse chromosome architectures and different levels of data noise and structural variability. The results show that: first, current scoring functions for 3D reconstruction correlate with the accuracy of the models; second, reconstructed models are robust to noise but sensitive to structural variability; third, the local structure organization of genomes, such as Topologically Associating Domains, results in more accurate models; fourth, to a certain extent, the models capture the intrinsic structural variability in the input matrices; and fifth, the accuracy of the models can be predicted a priori by analyzing the properties of the interaction matrices.


Affiliation: EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.


Figure 3: Model assessment. (A) Comparison of a 3D model ensemble of genome architectures for the chr40_TAD (top) and chr150_TAD (bottom) architectures. Superimposed input structures for set 0 (left models) and superimposed reconstructed 3D models (due to mirroring, TADbit generates right- and left-handed models (9)). Models are colored from particle 1 in blue to particle N in red; the start and end particles are highlighted with spheres. (B) Correlation between the restraints per particle and the accuracy of the reconstructed models as measured by the average dSCC score per architecture. Circle symbols correspond to non-TAD-like architectures. Rhomboid symbols correspond to TAD-like architectures. The colors indicate the toy genome density (green, blue and orange for 40, 75 and 150 bp/nm, respectively). (C) dRMSD distributions with respect to genome architecture. Colors correspond to the three density values, with dark and pale colors corresponding to TAD-like and non-TAD-like architectures, respectively. The horizontal gray line and shade correspond to the dRMSD distribution obtained when comparing a ‘random genome’ of the same size and number of particles as the reconstructed models but with randomized coordinates. (D) Model accuracy as measured by dRMSD (left) and dSCC (right) with respect to the model density. Each density is colored as in panel A and contains seven distributions from the seven sets of structures, from set 0 () to high structural variability set 6 (), with dark to pale colors, respectively. Horizontal gray lines and shade as in panel C. (E) Correlation between the dRMSD values per reconstructed model and the Spearman correlation coefficient between the contact maps of the reconstructed models and of the original toy genome structures (TADbit-SCC). The points are colored according to the level of structural variability in the matrix (yellow to red from low set 0 () to high structural variability set 6 ()). Shapes as in panel B. (F) Same as panel E, but with the points colored by the level of noise in the data (yellow to red for low to high levels of noise, that is, from α = 50 to 200). The regression coefficients indicate the correlation per noise level α. (G) Correlation between structural variability in the toy genome structures and in the reconstructed models. Colors and shapes as in panel B.


Mentions: To assess the accuracy of the genomic 3D models built by TADbit, we calculated two different accuracy measures between the reconstructed models and the toy genomic structures (that is, the dRMSD and the dSCC). Both measures of accuracy were calculated for all reconstructed models and averaged over architecturally similar toy genomes (Table 1). In total, we generated 168 simulated Hi-C matrices for the six toy genome architectures (that is, six architectures with seven levels of structural variability, each with four levels of noise in the data). The reconstructed architecture that best fitted the input structures corresponded to the 40 bp/nm density with a TAD-like architecture (chr40_TAD), with an average dRMSD of 60.5 nm and dSCC of 0.79. The architecture most difficult to reconstruct corresponded to the 150 bp/nm density with no TAD-like features (chr150), with an average dRMSD of 86.4 nm and dSCC of 0.51. These values correspond to average measures over the 28 simulated Hi-C matrices per architecture, which include varying degrees of noise and structural variability. For example, within the chr40_TAD architecture, one of the best reconstructions corresponded to the matrix with a mid noise level (α = 100) and low structural variability (), which resulted in a 3D model with a dRMSD of 32.7 nm and a dSCC of 0.94 (Figure 3A, top). Similarly, for the low-resolution architecture chr150_TAD, the best result (dRMSD = 45.4 nm and dSCC = 0.86) corresponded to a low level of noise (α = 50) and low structural variability () (Figure 3A, bottom). In summary, TADbit was able to produce accurate models for all six toy genome architectures, with a varying degree of accuracy depending on the levels of noise and structural variability in the simulated Hi-C matrices.

