Combining physicochemical and evolutionary information for protein contact prediction.

Schneider M, Brock O - PLoS ONE (2014)

Bottom Line: The resulting contact predictions are highly accurate. As a result of combining two sources of information--evolutionary and physicochemical--we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction.


Affiliation: Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany.

ABSTRACT
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information--evolutionary and physicochemical--we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.

Figure 3: Prediction performance overview for the CASP10 and CASP10hard data sets. The figure shows the long-range contact prediction performance for the top-scoring L/5 contacts. Different methods are shown as color-coded violin plots. The lower and upper ends of the black vertical bar in each violin denote the accuracy at the 25th and 75th percentiles, respectively. White horizontal bars indicate the median accuracy, and red horizontal bars indicate the mean accuracy. The shape of each violin indicates the distribution of prediction accuracies over individual proteins.
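The accuracy of the top-scoring L/5 long-range contacts can be illustrated with a small sketch. It is not the authors' evaluation code. It assumes that L is the sequence length, that "long-range" means a sequence separation of at least 24 residues (a common convention that this page does not state), and that accuracy is the fraction of selected predictions that are native contacts. The function name and the input layout are hypothetical.

```python
def top_l5_long_range_accuracy(predicted, native, seq_len, min_sep=24):
    """Fraction of the top-scoring L/5 long-range predictions that are native.

    predicted: iterable of (i, j, score) residue-pair predictions
    native:    iterable of (i, j) native contacts
    seq_len:   sequence length L
    min_sep:   minimum sequence separation for "long-range" (assumed value)
    """
    # Keep only long-range pairs.
    long_range = [(i, j, s) for i, j, s in predicted if abs(i - j) >= min_sep]
    # Rank by score, best first, and keep the top L/5 (at least one).
    long_range.sort(key=lambda pair: pair[2], reverse=True)
    top = long_range[: max(1, seq_len // 5)]
    if not top:
        return 0.0
    # Store native contacts in a canonical (low, high) order for lookup.
    native_set = {(min(i, j), max(i, j)) for i, j in native}
    hits = sum((min(i, j), max(i, j)) in native_set for i, j, _ in top)
    # Assumption: divide by the number of contacts actually selected.
    return hits / len(top)


# Toy example with a short sequence and a reduced separation threshold.
toy_predictions = [(0, 5, 0.9), (1, 2, 0.95), (2, 8, 0.8), (0, 9, 0.7)]
toy_native = [(0, 5), (0, 9)]
toy_accuracy = top_l5_long_range_accuracy(
    toy_predictions, toy_native, seq_len=10, min_sep=3
)
```

In the toy example, L/5 is 2, so only the two best long-range pairs are scored. Computing this value per protein gives the distribution that each violin in the figure summarizes.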

Mentions: Figure 3 summarizes the long-range contact prediction performance on the CASP10 data set. Detailed information about the medium- and long-range performance at different cutoffs is given in Tables S10 and S11 in File S1. EPC-map reaches a mean accuracy of 0.492, the second-best method (GREMLIN) reaches a mean accuracy of 0.448, followed by PhyCMAP with a mean accuracy of 0.325. MULTICOM-construct(DNCON), the best-performing method of the CASP10 experiment [23], has a mean accuracy of 0.285 on the CASP10 data set. Thus, in absolute terms, EPC-map is 4.4 percentage points more accurate than GREMLIN and 20.7 percentage points more accurate than MULTICOM-construct(DNCON) on the entire CASP10 data set.
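The reported gaps of 4.4 and 20.7 match the plain differences between the mean accuracies, so they are absolute differences in percentage points and not relative improvements. The sketch below recomputes both quantities from the mean accuracies quoted above. The helper name is hypothetical, and the relative figures are derived here for comparison and are not reported in the text.

```python
# Mean long-range accuracies on the CASP10 data set, as quoted in the text.
mean_accuracy = {
    "EPC-map": 0.492,
    "GREMLIN": 0.448,
    "PhyCMAP": 0.325,
    "MULTICOM-construct(DNCON)": 0.285,
}


def accuracy_gap(method, baseline):
    """Return (absolute gap in percentage points, relative gap in percent)."""
    a = mean_accuracy[method]
    b = mean_accuracy[baseline]
    absolute_points = (a - b) * 100
    relative_percent = (a - b) / b * 100
    return absolute_points, relative_percent


gremlin_gap = accuracy_gap("EPC-map", "GREMLIN")
dncon_gap = accuracy_gap("EPC-map", "MULTICOM-construct(DNCON)")
```

Rounded to one decimal place, the absolute gaps are 4.4 and 20.7 percentage points, while the corresponding relative gaps are about 9.8% and 72.6%.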

