Combining physicochemical and evolutionary information for protein contact prediction.

Schneider M, Brock O - PLoS ONE (2014)

Bottom Line: The resulting contact predictions are highly accurate. As a result of combining two sources of information, evolutionary and physicochemical, we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction.


Affiliation: Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany.

ABSTRACT
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information, evolutionary and physicochemical, we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.

Figure 2 (pone-0108438-g002): Definition of graphs used to model the neighborhood of the contacting residues i and j. Nodes represent residues (circles); edges represent contacts (solid black lines). A: The neighborhood graph for residue i contains all residues in contact with residue i and its sequence neighbors (dark grey). B: The neighborhood graph for residue j. C: The shared neighborhood graph for the contact between residues i and j is defined by the intersection of the neighborhood graphs of i and j. Residues that belong to the shared neighborhood graph are shown in blue. Shared neighborhood graphs capture the local context of the shared neighborhood of the contacting residues. D: The immediate neighborhood graph is defined by all residues that are in contact with i or j. Residues that belong to the immediate neighborhood graph are shown in blue. Immediate neighborhood graphs capture the direct neighborhood of the contacting residues.
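
The shared and immediate neighborhood graphs in panels C and D can be illustrated with plain set operations. The following is a minimal sketch, not the authors' implementation: the function names, the representation of contacts as pairs of residue indices, and the choice to keep i and j themselves as nodes of the immediate neighborhood graph are all assumptions made for illustration. Node and edge labels are omitted.

```python
def induced_edges(nodes, contacts):
    """Return the contacts whose two endpoints both lie in `nodes`."""
    return {frozenset(c) for c in contacts if set(c) <= nodes}


def shared_neighborhood(nodes_i, nodes_j, contacts):
    """Panel C: intersect the node sets of the two neighborhood graphs.

    `nodes_i` and `nodes_j` are the (hypothetical) precomputed node sets
    of the neighborhood graphs of residues i and j.
    """
    nodes = nodes_i & nodes_j
    return nodes, induced_edges(nodes, contacts)


def immediate_neighborhood(i, j, contacts):
    """Panel D: all residues in contact with i or j.

    Assumption: i and j are kept as nodes of the graph.
    """
    nodes = {i, j}
    for a, b in contacts:
        if a in (i, j) or b in (i, j):
            nodes.update((a, b))
    return nodes, induced_edges(nodes, contacts)


# Toy contact list over residue indices (illustrative data only).
contacts = {(1, 5), (5, 9), (2, 9), (3, 7)}
imm_nodes, imm_edges = immediate_neighborhood(1, 9, contacts)
sh_nodes, sh_edges = shared_neighborhood({1, 2, 5, 9}, {3, 5, 9}, contacts)
```

Keeping each graph as a node set plus the contacts induced on it makes the intersection in panel C a one-line set operation. A graph library would serve equally well once labels are attached.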

Mentions: To characterize the properties of a contact's neighborhood, we use undirected graphs (refer to Figure 2 for the remainder of this section). In these graphs, nodes correspond to residues and edges connect contacting residues. Nodes and edges are labeled with physicochemical, structural and evolutionary characteristics; these labels are described in the supporting information (Text S1 and Tables S1–S2 in File S1). First, we consider the neighborhood of individual residues. The neighborhood of residue i is defined as all residues up to two positions away in sequence, i.e. residues i−2 through i+2, as well as all residues in contact with those, according to the definition of a contact given in Methods. For α-helices, a different set of three residues is used instead, chosen to include the residues that face the contact on subsequent helix turns. We capture this notion of the neighborhood of residue i in a neighborhood graph (Figures 2A and B).
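
The neighborhood definition for a single residue can be sketched as follows. This is an illustrative reading of the text rather than the published code: the function name, zero-based residue indexing, the `offsets` parameter, and the contact list format are assumptions. The α-helix case would use a different offset set, which the caller would have to supply.

```python
def neighborhood_nodes(i, contacts, n_residues, offsets=(-2, -1, 0, 1, 2)):
    """Node set of the neighborhood graph of residue i.

    Takes the sequence window around i (clipped to the chain ends) and
    adds every residue in contact with a residue in that window.
    `contacts` is an iterable of (a, b) residue index pairs, assumed to
    follow whatever contact definition the caller has chosen.
    """
    core = {i + d for d in offsets if 0 <= i + d < n_residues}
    nodes = set(core)
    for a, b in contacts:
        if a in core:
            nodes.add(b)
        if b in core:
            nodes.add(a)
    return nodes


# Toy example on a 12-residue chain (illustrative data only).
contacts = {(0, 7), (3, 10), (5, 6), (8, 9)}
mid_chain = neighborhood_nodes(2, contacts, n_residues=12)
chain_start = neighborhood_nodes(0, contacts, n_residues=12)
```

Passing the window as explicit offsets keeps the default case and any secondary-structure-specific variant in one function.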

