Sigma-RF: prediction of the variability of spatial restraints in template-based modeling by random forest.

Lee J, Lee K, Joung I, Joo K, Brooks BR, Lee J - BMC Bioinformatics (2015)

Bottom Line: The benchmark results on 22 CASP9 targets show that the variability values from Sigma-RF correlate more strongly with the true distance deviation than those from Modeller. We assessed the effect of the new sigma values by performing single-domain homology modeling of 22 CASP9 targets and 24 CASP10 targets. For most of the targets tested, we could obtain more accurate 3D models from identical alignments by using the Sigma-RF results than by using the Modeller ones.


Affiliation: Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, 5635 Fishers Ln, Bethesda, 20852, USA. juyong.lee@nih.gov.

ABSTRACT

Background: In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure. The width of the Gaussian distribution, the variability of a spatial restraint, is closely related to the reliability of the restraint information extracted from a template, and it should be accurately estimated for successful template-based protein structure modeling.
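As a minimal sketch of the Gaussian restraint described in the Background, the snippet below evaluates the density of a candidate distance around a template distance, together with the matching negative log-density penalty. The function names, the use of ångströms, and the example numbers are illustrative assumptions; they are not taken from the paper or from Modeller's code.

```python
import math


def gaussian_restraint_density(d, d_template, sigma):
    """Gaussian density of distance d, centered at the template distance."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (d - d_template) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def restraint_penalty(d, d_template, sigma):
    """Negative log of the Gaussian density (lower means better satisfied)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (d - d_template) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))


# Hypothetical case: the template distance is 8 A, the model distance is 10 A.
tight = restraint_penalty(10.0, 8.0, sigma=0.5)  # narrow, "confident" restraint
loose = restraint_penalty(10.0, 8.0, sigma=2.0)  # wide, "uncertain" restraint
print(tight > loose)
```

The same 2 Å deviation costs far more under the narrow restraint, which is why a sigma that is too small for an unreliable template region can pull a model toward the wrong geometry.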

Results: To predict the variability of the spatial restraints in template-based modeling, we have devised a prediction model, Sigma-RF, by using the random forest (RF) algorithm. The benchmark results on 22 CASP9 targets show that the variability values from Sigma-RF correlate more strongly with the true distance deviation than those from Modeller. We assessed the effect of the new sigma values by performing single-domain homology modeling of 22 CASP9 targets and 24 CASP10 targets. For most of the targets tested, we could obtain more accurate 3D models from identical alignments by using the Sigma-RF results than by using the Modeller ones.
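The benchmark above compares predicted variability values with the true distance deviation through a correlation. The abstract does not say which correlation measure was used; the sketch below assumes a Pearson coefficient and uses a hypothetical helper name, purely to illustrate the kind of comparison involved.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In this setting, `xs` would hold the predicted sigma for each restraint and `ys` the observed absolute difference between the template and native distances; the predictor whose values give the higher coefficient tracks the true deviation more closely.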

Conclusions: By investigating the importance of the input features used in the RF machine learning, we find that the average alignment quality of the residues located between and at two aligned residues, a quasi-local quantity, is the most contributing factor. This average alignment quality is shown to be more important than the previously identified local quantity: the product of the alignment qualities at the two aligned residues.
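To make the two quantities in the Conclusions concrete, the sketch below computes both from a per-residue alignment-quality list. How the alignment quality itself is defined is not given in this excerpt, so the `quality` values, the indexing scheme, and the function names are hypothetical.

```python
def quasi_local_quality(quality, i, j):
    """Mean alignment quality over residues i..j inclusive (quasi-local)."""
    lo, hi = sorted((i, j))
    window = quality[lo:hi + 1]
    return sum(window) / len(window)


def local_quality_product(quality, i, j):
    """Product of the alignment qualities at the two residues only (local)."""
    return quality[i] * quality[j]


# Hypothetical per-residue qualities: good endpoints, poor stretch in between.
quality = [0.9, 0.2, 0.1, 0.3, 0.8]
print(quasi_local_quality(quality, 0, 4))
print(local_quality_product(quality, 0, 4))
```

With these toy numbers the product of the endpoint qualities is high while the average over the intervening stretch is low, which shows how the quasi-local average can flag a restraint that spans a poorly aligned region even when both end residues are well aligned.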

Figure 3: A comparison of the TM-scores and lDDT-scores of 3D models generated by Modeller using σRF and σModeller with those of models generated using σnative. The TM-score results are shown in panels A and B, and the lDDT-score results are shown in panels C and D. For all plots, the X-axes represent the quality-measure differences between models obtained with σModeller and σnative, and the Y-axes represent the differences between models obtained with σRF and σnative. The green lines represent the y=x line, which corresponds to identical model quality. Dots above the green line correspond to targets that are improved by using σRF.
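The reading of the scatter plots can be sketched as a simple count. Each point is (x, y), with x the score difference between the σModeller and σnative models and y the difference between the σRF and σnative models; since both TM-score and lDDT are higher-is-better, a point with y > x is a target where σRF gave the better model. The function name and the toy points are illustrative assumptions, not data from the paper.

```python
def count_above_diagonal(points):
    """Count points lying strictly above the y = x line."""
    return sum(1 for x, y in points if y > x)


# Hypothetical (x, y) score differences for four targets.
toy_points = [(-0.05, -0.02), (-0.03, -0.04), (-0.01, -0.01), (-0.10, 0.00)]
print(count_above_diagonal(toy_points))  # prints 2
```

Points exactly on the diagonal are counted as unchanged rather than improved in this sketch.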

Mentions: We also performed homology modeling of the benchmark targets using the original Modeller package to determine whether predicting better σ values is useful without ModellerCSA (Table 5 and Additional files 3 and 4). The results show that using σRF with Modeller significantly improves the quality of the best model. The TMmax values of 36 targets improved (Figure 3A). However, unlike the results of ModellerCSA, the other measures (TMEmin, TMavg, lDDTEmin and lDDTavg) show no improvement (the middle and right panels of Figure 3). This difference may be attributed to the lack of extensive conformational sampling. ModellerCSA performs much more extensive conformational sampling than Modeller and always finds lower-energy conformations. Thus the minimum-energy conformations obtained by Modeller are likely to be remote from the true energy minimum, which makes the TMEmin results less meaningful.

