Combining physicochemical and evolutionary information for protein contact prediction.

Schneider M, Brock O - PLoS ONE (2014)

Bottom Line: The resulting contact predictions are highly accurate. As a result of combining two sources of information, evolutionary and physicochemical, we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction.


Affiliation: Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany.

ABSTRACT
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine (SVM) classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information, evolutionary and physicochemical, we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.
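The physicochemical signal described above comes from contacts observed in low-energy predicted (decoy) structures. As a minimal sketch, assuming the common CASP-style contact definition (C-beta atoms closer than 8 Å, sequence separation of at least 6) rather than the paper's exact settings, the code below extracts a contact map from each decoy and computes how often each residue pair is in contact across an ensemble of decoys. The function names and thresholds are hypothetical; this is not EPC-map's graph-based SVM classifier, only an illustration of the raw contact information that decoys provide.

```python
import numpy as np


def contact_map(coords: np.ndarray, cutoff: float = 8.0, min_sep: int = 6) -> np.ndarray:
    """Boolean contact map for one structure (assumed CASP-style definition).

    coords: (N, 3) array of C-beta coordinates, one row per residue.
    Residues i and j are in contact if their distance is below `cutoff`
    angstroms and their sequence separation |i - j| is at least `min_sep`.
    """
    # Pairwise Euclidean distances via broadcasting: (N, 1, 3) - (1, N, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sequence separation between every pair of residue indices
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    return (dist < cutoff) & (sep >= min_sep)


def contact_frequency(decoys: list[np.ndarray], **kwargs) -> np.ndarray:
    """Fraction of decoy structures in which each residue pair is in contact."""
    maps = [contact_map(d, **kwargs) for d in decoys]
    return np.mean(maps, axis=0)
```

A pair that recurs in many low-energy decoys gets a high frequency, but, as the abstract notes, frequency alone does not separate native from non-native contacts; that is the role of the structural-context representation and the classifier.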

Figure 5. Alignment depth composition of the CASP9-10_hard, EPC-map_test, D329 and SVMCON_test data sets. Proteins are grouped into bins based on the number of sequences in their alignment. Colors correspond to a particular bin, from dark blue (few sequences) to red (many sequences). Data sets are sorted from difficult (CASP9-10_hard) to easy (SVMCON_test). The last panel shows the pooled results.
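The binning in Figure 5 can be illustrated with a short sketch. The bin edges below are hypothetical, since the caption does not state the actual thresholds; the code assigns each protein to a bin by the number of sequences in its alignment and reports the fraction of a data set falling into each bin.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges (number of sequences in the MSA); the figure's
# actual thresholds are not given in the caption.
BIN_EDGES = [10, 100, 1000]
BIN_LABELS = ["<10", "10-99", "100-999", ">=1000"]


def depth_bin(n_seqs: int) -> str:
    """Label of the alignment-depth bin that n_seqs falls into."""
    # bisect_right counts how many edges are <= n_seqs, i.e. the bin index
    return BIN_LABELS[bisect_right(BIN_EDGES, n_seqs)]


def depth_composition(depths: list[int]) -> dict[str, float]:
    """Fraction of proteins in each alignment-depth bin."""
    if not depths:
        return {label: 0.0 for label in BIN_LABELS}
    counts = Counter(depth_bin(n) for n in depths)
    return {label: counts.get(label, 0) / len(depths) for label in BIN_LABELS}
```

A pooled panel corresponds to calling `depth_composition` on the concatenated depth lists of all data sets (for example, hypothetical lists `casp_depths + svmcon_depths`), so larger data sets contribute proportionally more proteins.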

Mentions: We structure the remaining discussion of prediction performance by data set difficulty, as judged by the distribution of alignment depths, i.e. the number of sequences available in the MSA (Figure 5). For the most difficult data set, CASP9-10_hard, EPC-map (mean accuracy 0.322) improves the mean prediction accuracy by 9.7% over the next best method (see Figure 4). Interestingly, neither the best structure-based method (Counting) nor the best method that uses evolutionary information (GREMLIN) delivers good results for this data set (mean accuracies of 0.173 and 0.193, respectively). However, the combination approach taken by EPC-map unlocks the potential of both evolutionary and physicochemical information.
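The mean accuracies quoted above are averages of a per-protein accuracy. As a sketch, assuming accuracy means the fraction of the top-k ranked predicted contacts that are native (k = L/5, with L the sequence length, is a common convention, but the exact cutoff is not stated in this excerpt), the computation looks like this. The function names are hypothetical.

```python
def top_k_accuracy(scores: dict[tuple[int, int], float],
                   native: set[tuple[int, int]], k: int) -> float:
    """Fraction of the k highest-scoring residue pairs that are native contacts.

    scores: predicted score per residue pair (i, j) with i < j.
    native: set of residue pairs in contact in the native structure.
    """
    # Rank pairs by descending predicted score and keep the top k
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(pair in native for pair in ranked) / len(ranked)


def mean_accuracy(per_protein: list[float]) -> float:
    """Data-set-level mean of per-protein accuracies."""
    return sum(per_protein) / len(per_protein)
```

Averaging per protein, rather than pooling all predictions, gives each protein equal weight regardless of its length.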
