Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest.

Lee J, Lee K, Joung I, Joo K, Brooks BR, Lee J - BMC Bioinformatics (2015)

Bottom Line: The benchmark results on 22 CASP9 targets show that the variability values from Sigma-RF correlate more strongly with the true distance deviations than those from Modeller. We assessed the effect of the new sigma values by performing single-domain homology modeling of 22 CASP9 targets and 24 CASP10 targets. For most of the targets tested, we obtained more accurate 3D models from identical alignments by using the Sigma-RF values than by using the Modeller values.


Affiliation: Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, 5635 Fishers Ln, Bethesda, 20852, USA. juyong.lee@nih.gov.

ABSTRACT

Background: In template-based modeling with a single template, the inter-atomic distances of an unknown protein structure are assumed to follow Gaussian probability density functions centered at the distances between the corresponding atoms in the template structure. The width of each Gaussian, the variability of a spatial restraint, is closely related to the reliability of the restraint information extracted from the template, and it must be estimated accurately for template-based protein structure modeling to succeed.
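To make the role of the width concrete, here is a minimal sketch, assuming a single Gaussian restraint per atom pair: taking the negative logarithm of the density gives a harmonic energy whose stiffness is set by σ. The helper names `gaussian_restraint_energy` and `deviation_penalty` are hypothetical and illustrate the general form only; they are not Modeller's or Sigma-RF's code.

```python
import math


def gaussian_restraint_energy(d: float, d_template: float, sigma: float) -> float:
    """Return -ln of a Gaussian density centred at the template distance.

    Hypothetical helper (not Modeller's or Sigma-RF's code):
    E(d) = (d - d_template)**2 / (2 * sigma**2) + ln(sigma) + 0.5 * ln(2 * pi)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (d - d_template) / sigma
    return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)


def deviation_penalty(d: float, d_template: float, sigma: float) -> float:
    """Energy increase relative to satisfying the restraint exactly (d == d_template)."""
    return gaussian_restraint_energy(d, d_template, sigma) - gaussian_restraint_energy(
        d_template, d_template, sigma
    )


# Toy example: a 1 Å deviation from an 8 Å template distance.
tight = deviation_penalty(9.0, 8.0, sigma=0.5)  # small sigma: restraint trusted strongly
loose = deviation_penalty(9.0, 8.0, sigma=2.0)  # large sigma: restraint trusted weakly
print(f"penalty with sigma=0.5: {tight:.3f}; with sigma=2.0: {loose:.3f}")
```

Because the penalty scales as 1/σ², the same 1 Å deviation costs 16 times more when σ is four times smaller. This is why an overconfident (too small) σ for an unreliable restraint can pull a model away from the native structure.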

Results: To predict the variability of the spatial restraints in template-based modeling, we devised a prediction model, Sigma-RF, using the random forest (RF) algorithm. Benchmark results on 22 CASP9 targets show that the variability values from Sigma-RF correlate more strongly with the true distance deviations than those from Modeller. We assessed the effect of the new sigma values by performing single-domain homology modeling of 22 CASP9 targets and 24 CASP10 targets. For most of the targets tested, we obtained more accurate 3D models from identical alignments by using the Sigma-RF values than by using the Modeller values.
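The benchmark above compares predicted σ values with the true distance deviations through their correlation. The sketch below shows that kind of comparison under two assumptions: Pearson correlation is the measure, and the "true deviation" of a restraint is |d_native − d_template|. The paper's exact definitions may differ. All numbers are made-up toy values, not benchmark data, and `sigma_a`/`sigma_b` are hypothetical predictors.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Toy distances (Å) for five restraints; NOT data from the paper.
d_template = [5.0, 6.0, 7.0, 8.0, 9.0]
d_native = [5.2, 7.5, 6.6, 11.0, 8.1]
# Assumed definition of the "true deviation": |d_native - d_template|.
true_dev = [abs(n - t) for n, t in zip(d_native, d_template)]

sigma_a = [0.3, 1.2, 0.5, 2.5, 1.0]  # hypothetical predictor that tracks the deviations
sigma_b = [1.0, 0.8, 1.1, 1.2, 0.9]  # hypothetical predictor that barely varies

print(f"r(sigma_a) = {pearson(sigma_a, true_dev):.3f}")
print(f"r(sigma_b) = {pearson(sigma_b, true_dev):.3f}")
```

A predictor scores well on this measure when its σ is large exactly where the template distance turns out to be far from the native one, which is the property the Sigma-RF versus Modeller comparison tests.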

Conclusions: By investigating the importance of the input features used in the RF model, we find that the average alignment quality of the residues located at and between two aligned residues, a quasi-local quantity, is the most important contributing factor. This average alignment quality is shown to be more important than the previously identified local quantity: the product of the alignment qualities at the two aligned residues.
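The two features named above can be made concrete with a per-residue alignment-quality score q. How that score is computed is not described here, so it is an assumed input. For a residue pair (i, j), the local feature is the product q[i]·q[j]. The quasi-local feature averages q over residues i through j inclusive. A minimal sketch with hypothetical function names:

```python
from typing import Sequence


def local_feature(q: Sequence[float], i: int, j: int) -> float:
    """Local feature: product of the alignment qualities at the two aligned residues."""
    return q[i] * q[j]


def quasi_local_feature(q: Sequence[float], i: int, j: int) -> float:
    """Quasi-local feature: mean alignment quality of residues i..j inclusive."""
    lo, hi = min(i, j), max(i, j)
    window = q[lo : hi + 1]
    return sum(window) / len(window)


# Toy per-residue alignment-quality scores (assumed to lie in [0, 1]);
# residues 2 and 3 form a poorly aligned segment.
q = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9]

print(local_feature(q, 0, 5))        # both endpoints are well aligned
print(quasi_local_feature(q, 0, 5))  # the poorly aligned segment pulls the mean down
```

In this toy case, the local product is high because both endpoints are well aligned, while the quasi-local mean is noticeably lower because of the poorly aligned segment between them. This is one plausible reason the quasi-local feature can carry information the local product misses.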

Figure 4: A comparison of the template-based modeling results of T0517 and T0523 obtained with the σRF and σnative values. The energy landscapes of the template-based modeling results of (A) T0517 and (D) T0523 obtained with σRF, σModeller and σnative. The representative structures of the low- and high-TM-score results are superposed: (B) T0517 and (E) T0523. The average restraint energy differences, ERF − Enative, of the mirror-image structures of (C) T0517 and (F) T0523, evaluated with σRF and σnative, are shown as 3D histogram plots. Positive z-axis values indicate that the corresponding distance restraints are favored by σnative and disfavored by σRF.

Mentions: It should be noted that, for some targets, the average TM-scores of the σRF results are even higher than those of the σnative results. To identify the reason for this counterintuitive result, we examined the energy landscapes of two targets, T0517 and T0523 (see Figure 4). The energy landscapes (Figure 4A and 4D) show that the final 100 conformations cluster into two groups for all three choices of σ. Most conformations lie near a TM-score of 0.75 with lower energies, while some lie near a TM-score of 0.3 with higher energies. Superposing structures from the two regions shows that the lower-TM-score structures are mirror images of the more native-like structures (see Figure 4B and 4E). Mirror images have also been observed in many other modeling approaches based on the optimization of distance restraints [38-41].
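Why distance-restraint optimization can produce mirror images follows from a simple geometric fact: reflecting a structure through a plane leaves every inter-atomic distance unchanged. A purely distance-based objective therefore cannot distinguish a structure from its exact reflection. The short sketch below checks this on made-up four-atom coordinates, using the reflection x → −x. It also uses the signed volume of four points as a simple chirality measure; this is an illustrative choice, not necessarily how the cited methods detect handedness. The distances are preserved while the signed volume flips sign.

```python
import itertools
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def pairwise_distances(coords: List[Point]) -> List[float]:
    """All inter-atomic distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in itertools.combinations(coords, 2)]


def mirror(coords: List[Point]) -> List[Point]:
    """Reflect through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(a: Point, b: Point, c: Point, d: Point) -> float:
    """(b - a) . ((c - a) x (d - a)); its sign encodes the handedness of the four points."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1], v[2] * w[0] - v[0] * w[2], v[0] * w[1] - v[1] * w[0])
    return sum(u[k] * cross[k] for k in range(3))


# Toy four-atom "structure" (Å); not taken from the paper.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (1.5, 1.5, 1.5)]
image = mirror(coords)

same = all(math.isclose(p, r) for p, r in zip(pairwise_distances(coords), pairwise_distances(image)))
print("distances identical:", same)
print("chirality:", signed_volume(*coords), "->", signed_volume(*image))
```

Real mirror-image models are only approximate reflections of the native-like ones, so their restraint energies can still differ, as the energy differences in Figure 4C and 4F indicate.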