Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

Scheeff ED, Bourne PE - BMC Bioinformatics (2006)

Bottom Line: We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.


Affiliation: San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093-0537, USA. scheeff@salk.edu

ABSTRACT

Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.

Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield improvements equivalent to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.

Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.



Figure 5: Comparison of HMMs built using an older protein sequence database for iterative construction ("old db") with those built using a current sequence database ("new db"), presented as a coverage vs. error plot. Results are colored similarly for corresponding model types, with the results based on the older database in a lighter color. A different version of the HMMER software was also used for the two result sets; details of model types and construction are provided in the text. Iterative parameters used for construction of all models were from PS1 (Table 2).

Mentions: Before collection of the final results (presented in Figures 2, 3, 4), we updated the version of HMMER used to build our models, and the version of the sequence database used to collect homologs to our test sequences in our iterative protocol (see Methods for details). Apart from providing final results based on the most current data, this allowed us to determine the robustness of our observations in the face of the inevitable growth of sequence databases (and changes to software packages). Direct comparison of the results for PS1 prior to these changes (from Figure 1) and after these changes (from Figure 2) reveals that, while the performance of all methods improved markedly, the essential trends remained in place (Figure 5). These results suggest that the benefits provided by SLAHMMs will continue to be realized, even as sequence databases further expand.
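For readers unfamiliar with coverage vs. error plots such as Figure 5, the following is a minimal sketch of how such a curve can be computed in general. It is not the authors' benchmarking code: the function name `coverage_vs_error`, the `(e_value, is_true_homolog)` input format, and the choice of raw error counts on one axis are assumptions made here for illustration, and the paper's exact definitions of coverage and error are not given in this excerpt.

```python
from typing import List, Tuple


def coverage_vs_error(
    hits: List[Tuple[float, bool]], total_true: int
) -> List[Tuple[int, float]]:
    """Return (errors, coverage) points as the score threshold is relaxed.

    hits: hypothetical (e_value, is_true_homolog) pairs pooled over all models.
    total_true: number of true relationships that could have been found.
    """
    # Rank hits from most to least significant (smallest E-value first).
    ranked = sorted(hits, key=lambda h: h[0])
    points = []
    true_found = 0
    errors = 0
    for _e_value, is_true in ranked:
        if is_true:
            true_found += 1
        else:
            errors += 1
        # Coverage is the fraction of true relationships recovered so far.
        points.append((errors, true_found / total_true))
    return points


# Illustrative, made-up hits; these are not data from the paper.
example_hits = [(1e-10, True), (1e-5, True), (0.01, False), (0.1, True), (1.0, False)]
curve = coverage_vs_error(example_hits, total_true=4)
```

Plotting coverage against errors for each model type gives one curve per method; under this convention, a curve that reaches higher coverage at the same number of errors indicates a more sensitive method.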

