Combining physicochemical and evolutionary information for protein contact prediction.

Schneider M, Brock O - PLoS ONE (2014)

Bottom Line: The resulting contact predictions are highly accurate. As a result of combining two sources of information--evolutionary and physicochemical--we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction.


Affiliation: Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany.

ABSTRACT
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information--evolutionary and physicochemical--we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.
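
To make the idea of a graph-based structural context more concrete, the following is a minimal sketch and not the authors' feature set. It assumes a contact graph whose nodes are residues and whose edges are contacts, and it computes a few toy descriptors of the neighborhood around one contact. The function names `build_contact_graph` and `context_features`, and the choice of descriptors, are hypothetical.

```python
def build_contact_graph(contacts):
    """Build adjacency sets for an undirected graph whose nodes are residues."""
    graph = {}
    for i, j in contacts:
        graph.setdefault(i, set()).add(j)
        graph.setdefault(j, set()).add(i)
    return graph


def context_features(graph, i, j):
    """Toy descriptors of the neighborhood around contact (i, j).

    These descriptors are illustrative assumptions, not the features
    used by EPC-map.
    """
    neighbors_i = graph.get(i, set()) - {j}
    neighbors_j = graph.get(j, set()) - {i}
    return {
        "degree_i": len(neighbors_i),
        "degree_j": len(neighbors_j),
        "shared_neighbors": len(neighbors_i & neighbors_j),
        "sequence_separation": abs(i - j),
    }


# Example: residue indices are arbitrary toy values.
contacts = [(1, 10), (1, 12), (10, 12), (10, 20)]
graph = build_contact_graph(contacts)
features = context_features(graph, 1, 10)
```

A vector built from such descriptors is the kind of input a classifier could consume; the actual representation used in the paper is richer than this sketch.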

Figure 1: Flowchart overview of EPC-map, combining evolutionary information (upper box) and physicochemical information (lower box). For evolutionary contact prediction, multiple-sequence alignments are constructed by searching the Uniprot20 database with HHblits. GREMLIN is then used to predict contacts from the alignments. For physicochemical contact prediction, decoys are generated with Rosetta. From each decoy, contact graphs are constructed and feature input vectors computed. An SVM ensemble predicts the contact probability from each feature vector. The SVM probability and occurrence statistics predict physicochemical contacts. Lastly, evolutionary and physicochemical contact predictions are combined to form the output of EPC-map.
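
The occurrence statistics mentioned in the caption can be illustrated with a small sketch: for a set of decoys, count how often each residue pair is in contact. This is only an illustration under stated assumptions. Each decoy is reduced to one 3D point per residue, and the 8 Å distance cutoff and minimum sequence separation of 6 are common conventions chosen here, not values taken from the paper. The function names are hypothetical.

```python
import math


def contact_pairs(coords, cutoff=8.0, min_separation=6):
    """Return the set of residue pairs (i, j), i < j, that are in contact.

    coords holds one (x, y, z) point per residue; cutoff (in angstroms)
    and min_separation are illustrative assumptions.
    """
    pairs = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.add((i, j))
    return pairs


def contact_occurrence(decoys, cutoff=8.0, min_separation=6):
    """Map each observed contact to the fraction of decoys that contain it."""
    counts = {}
    for coords in decoys:
        for pair in contact_pairs(coords, cutoff, min_separation):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: count / len(decoys) for pair, count in counts.items()}


# Two toy decoys with 8 residues each; only decoy_a brings residues 0 and 7 together.
decoy_a = [(0, 0, 0), (100, 0, 0), (200, 0, 0), (300, 0, 0),
           (400, 0, 0), (500, 0, 0), (600, 0, 0), (5, 0, 0)]
decoy_b = [(100 * k, 0, 0) for k in range(8)]
occurrence = contact_occurrence([decoy_a, decoy_b])
```

In this toy input the pair (0, 7) is in contact in one of the two decoys, so its occurrence is 0.5.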

Mentions: In this article, we introduce a novel contact prediction method, EPC-map, that predicts contacts using two sources of information: evolutionary information from multiple sequence alignments and information from physicochemical energy potentials (EPC-map stands for using Evolutionary and Physicochemical information to predict Contact maps). EPC-map relies on GREMLIN [10], an established method for sequence-based contact prediction, to leverage evolutionary information. To identify and leverage physicochemical information, we present a novel, machine-learning-based classifier that uses a graph-based encoding of the structural context of contacts. This classifier distinguishes native from non-native contacts in ab initio decoys with unprecedented accuracy. A graphical outline of our method is shown in Figure 1.
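
As a rough illustration of what combining two sources of contact information can look like, the sketch below blends two per-contact score tables into one ranking. The excerpt does not state how EPC-map combines its predictions, so the weighted sum, the `weight` parameter, and the function name `combine_scores` are assumptions made purely for illustration. Scores are assumed to lie in [0, 1].

```python
def combine_scores(evolutionary, physicochemical, weight=0.5):
    """Blend two per-contact score dictionaries into one ranked list.

    The weighted sum is a hypothetical combination rule, not the one
    used by EPC-map. A contact missing from one source scores 0 there.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    pairs = set(evolutionary) | set(physicochemical)
    combined = {
        pair: weight * evolutionary.get(pair, 0.0)
        + (1.0 - weight) * physicochemical.get(pair, 0.0)
        for pair in pairs
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy scores keyed by residue pair.
evo = {(1, 10): 0.9, (2, 20): 0.2}
phys = {(1, 10): 0.5, (3, 30): 0.8}
ranking = combine_scores(evo, phys, weight=0.5)
```

A contact supported by both sources, such as (1, 10) in this toy input, ranks above contacts supported by only one, which mirrors the motivation for using two information sources when sequence homologs are scarce.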

