Prediction of interacting proteins from homology-modeled complex structures using sequence and structure scores


ABSTRACT

Protein-protein interactions support most biological processes, and identifying specifically interacting partner proteins among homologous proteins is important for elucidating cellular functions such as signal transduction systems. Various high-throughput experimental methods for identifying these interactions have been developed and used to generate a huge amount of data. Because these experiments have been applied to only a few organisms, and their accuracy is believed to be limited, it would be valuable to develop computational methods for predicting protein-protein interactions from amino acid sequences or tertiary structural information. In this study, we describe a method for predicting interacting proteins based on homology-modeled complex structures. We employed the statistical residue-residue contact energy used in a previous study, and two new types of score: simple electrostatic energy and sequence similarity between target sequences and template structures. The validity of each protein-protein complex model was measured using the single and combined scores. We applied our method to all the protein heterodimers of Saccharomyces cerevisiae. To evaluate the prediction performance of our method, we prepared two types of protein-protein interaction dataset: a complete dataset and a high-confidence dataset. The complete dataset (10,325 protein dimer models) contains all the yeast protein heterodimers whose complex structures can be modeled. Among them, pairs registered in the DIP database are defined as interacting pairs, and those not registered are defined as non-interacting pairs. The high-confidence dataset (3,219 protein dimer models) is a more reliable subset of the complete dataset, extracted using the criterion of common subcellular localization.
Both datasets show that sequence similarity has much higher discrimination power than the structure-based scores, but that including contact energy significantly improves on predictions made using sequence similarity alone. These results suggest that sequence similarity is indispensable for the prediction, whereas structure scores can play supporting roles.
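
The labeling rule and the notion of "discrimination power" described above can be illustrated with a small sketch. It assumes that discrimination is summarized by the area under the ROC curve, which the abstract does not state; the function names `label_pairs` and `roc_auc`, the protein identifiers, and the score values are all hypothetical and are not taken from the study.

```python
def label_pairs(model_pairs, dip_pairs):
    """Mark a modeled pair as interacting if it is registered in the reference set.

    Pairs are unordered, so ("A", "B") and ("B", "A") are treated as the same pair.
    """
    registered = {frozenset(p) for p in dip_pairs}
    return [frozenset(p) in registered for p in model_pairs]


def roc_auc(scores, labels):
    """Probability that a random interacting pair outscores a random
    non-interacting pair (ties count as one half). Assumes higher = better."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one pair of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four modeled pairs, two of them in the reference set.
model_pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
dip_pairs = [("B", "A"), ("C", "D")]
similarity = [0.9, 0.4, 0.6, 0.5]  # made-up score, higher = more likely to interact

labels = label_pairs(model_pairs, dip_pairs)
auc = roc_auc(similarity, labels)
```

The same `roc_auc` call could be applied to any single or combined score to compare how well each one separates the two classes; how the study actually combines scores is not specified in the abstract.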



Figure 2: The protein-protein interaction network of the interacting and non-interacting protein pairs included in the complete dataset. The graph was visualized with Cytoscape [50]. Nodes correspond to target proteins; edges correspond to interactions. Interacting protein pairs are shown in red, non-interacting ones in blue. Proteins containing the domains of protein kinase catalytic subunit, WD40-repeat, G proteins, canonical RBD, ankyrin repeat, and cyclin are colored green, cyan, red, yellow, gray, and black, respectively. If the target protein includes more than two domains from the six types of domains, the node is colored according to the domain nearest to the N-terminus. SCOP, the structural classification of proteins database, was used to identify the domains [51].

To obtain a full picture of these protein pairs, we drew the protein-protein interaction network of the complete dataset (Fig. 2). In this network, nodes correspond to target proteins and edges correspond to target protein pairs whose dimer structure can be modeled. There are 1,036 nodes and 10,325 edges in the network. As there are approximately twenty-four times more non-interacting than interacting pairs, most of the edges are colored blue. The network was separated into 64 clusters by single linkage clustering. Our network was sparser than those appearing in previous experimental studies [3,6], probably because we restricted the protein pairs more stringently, to those that can be homology-modeled.
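
As a rough illustration of the clustering step, the sketch below assumes that single linkage clustering over an unweighted network amounts to grouping nodes into connected components; the text does not give the exact procedure. The function names and the toy edge list are hypothetical. The last lines show the class-balance arithmetic implied by the stated figures, which is only an estimate and not a number reported in the text.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into clusters: two nodes share a cluster if a path of edges joins them."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


def estimate_interacting(total_pairs, negatives_per_positive):
    """If negatives outnumber positives r to 1, positives are total / (r + 1)."""
    return total_pairs / (negatives_per_positive + 1)


# Hypothetical toy network (not data from the study).
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P6", "P7"), ("P7", "P4")]
clusters = connected_components(toy_edges)

# Rough estimate only: 10,325 pairs at about 24 non-interacting per interacting pair.
approx_interacting = estimate_interacting(10325, 24)
```

On the toy network this yields two clusters, one with three nodes and one with four. Applied to the real edge list, the same grouping would be one way to check the reported cluster count, provided the connected-component assumption holds.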

