Combining physicochemical and evolutionary information for protein contact prediction.

Schneider M, Brock O - PLoS ONE (2014)

Bottom Line: The resulting contact predictions are highly accurate. As a result of combining two sources of information (evolutionary and physicochemical), we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction.


Affiliation: Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany.

ABSTRACT
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information (evolutionary and physicochemical), we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.
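The method rests on the observation that low-energy decoys tend to contain native contacts even when their overall fold is wrong, and the results compare EPC-map with a decoy-based baseline called Counting. A minimal sketch of such a frequency-counting baseline follows, assuming that it ranks residue pairs by the fraction of decoys in which they are in contact. The 8 Å C-beta distance cutoff, the minimum sequence separation of 6, and all function names are illustrative assumptions rather than details from the paper; the sketch does not attempt EPC-map's graph-based SVM classifier.

```python
import itertools
import math
from collections import Counter

# Assumed conventions (not taken from the paper): two residues are in contact
# when their C-beta atoms are closer than 8 Angstrom, and only pairs at least
# 6 positions apart in sequence are considered.
CONTACT_CUTOFF = 8.0
MIN_SEPARATION = 6


def contacts(coords, cutoff=CONTACT_CUTOFF, min_sep=MIN_SEPARATION):
    """Return the set of residue pairs (i, j), i < j, in contact in one decoy.

    coords: list of (x, y, z) C-beta coordinates, one per residue.
    """
    pairs = set()
    for i, j in itertools.combinations(range(len(coords)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
            pairs.add((i, j))
    return pairs


def contact_frequencies(decoys):
    """Fraction of decoys in which each residue pair is in contact."""
    if not decoys:
        return {}
    counts = Counter()
    for coords in decoys:
        counts.update(contacts(coords))
    return {pair: n / len(decoys) for pair, n in counts.items()}


def top_predictions(freqs, k):
    """Rank pairs by frequency (ties broken by pair index) and keep the top k."""
    return sorted(freqs, key=lambda p: (-freqs[p], p))[:k]
```

Pairs that recur across many independently generated low-energy decoys are ranked first, which is the intuition the abstract describes; EPC-map instead scores each contact by its structural context and adds evolutionary information.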

Figure 6. Prediction performance for proteins with increasing sequence alignment depth. Results are shown for all proteins pooled from the CASP9-10_hard, EPC-map_test, D329 and SVMCON_test data sets. Different methods are shown as color-coded violin plots. The lower and upper ends of the black vertical bar in each violin denote the accuracy at the 25th and 75th percentiles, respectively. White horizontal bars indicate the median, and red horizontal bars the mean accuracy. The shape of each violin shows the distribution of prediction accuracies for individual proteins. EPC-map is consistently more accurate than the other tested methods, regardless of how many sequences are available.
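The statistics marked on each violin can be computed directly from the per-protein accuracies of one method. A minimal sketch using Python's standard `statistics` module follows; the inclusive quartile method and the function name `violin_summary` are assumptions, since the caption does not say how the percentiles were computed.

```python
import statistics


def violin_summary(accuracies):
    """Summary statistics drawn on one violin, from per-protein accuracies.

    Assumes the 'inclusive' quartile method; other conventions give slightly
    different 25th/75th percentiles for small samples.
    """
    q1, median, q3 = statistics.quantiles(accuracies, n=4, method="inclusive")
    return {
        "p25": q1,                             # lower end of the black vertical bar
        "median": median,                      # white horizontal bar
        "p75": q3,                             # upper end of the black vertical bar
        "mean": statistics.mean(accuracies),   # red horizontal bar
    }
```

Reporting both median and mean, as the figure does, shows whether a few proteins with very high or very low accuracy pull the average away from the typical case.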


Mentions: Figure 6 shows the prediction performance with increasing alignment depth. The performance of all methods increases with the number of available sequences. Evolutionary methods (PSICOV, GREMLIN) perform poorly in cases with fewer than sequences, while being clearly superior to decoy-based (Counting) and machine-learning-based methods (NNcon, PhyCMAP) in cases with more than sequences. On the other hand, decoy-based and machine-learning-based methods perform robustly in the (] and (] intervals, but do not benefit as much as evolutionary methods from more than sequences. EPC-map improves prediction accuracy over the second-best method, regardless of how many sequences are available. This makes EPC-map a versatile approach to contact prediction that performs robustly for proteins with both low and high numbers of homologous sequences.
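The analysis groups proteins by the number of sequences in their alignment and compares methods within each group. A minimal sketch of that bookkeeping follows. The bin edges in `DEPTH_EDGES` are hypothetical placeholders, because the thresholds used in the paper are not given here, and the record layout and function names are likewise assumptions. Bins are left-open and right-closed, matching the `(]` interval notation.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

# Hypothetical bin edges; the paper's actual sequence-count thresholds differ.
DEPTH_EDGES = [100, 1000]   # bins: (0, 100], (100, 1000], (1000, inf)


def depth_bin(n_sequences, edges=DEPTH_EDGES):
    """Index of the left-open, right-closed interval containing n_sequences."""
    return bisect_left(edges, n_sequences)


def mean_accuracy_by_bin(records):
    """Average per-protein accuracy for each method within each depth bin.

    records: iterable of (method, n_sequences, accuracy) tuples.
    Returns {bin_index: {method: mean accuracy}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for method, n_seq, acc in records:
        grouped[depth_bin(n_seq)][method].append(acc)
    return {
        b: {m: mean(accs) for m, accs in by_method.items()}
        for b, by_method in grouped.items()
    }


def margin_over_runner_up(means, method):
    """Accuracy gain of `method` over the best other method in one bin."""
    others = [acc for m, acc in means.items() if m != method]
    return means[method] - max(others)
```

A positive `margin_over_runner_up` in every bin corresponds to the claim that one method beats the second-best method regardless of alignment depth.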