Prediction of interacting proteins from homology-modeled complex structures using sequence and structure scores


ABSTRACT

Protein-protein interactions support most biological processes, and it is important to find specifically interacting partner proteins among homologous proteins in order to elucidate cellular functions such as signal transduction systems. Various high-throughput experimental methods for identifying these interactions have been developed and used to generate a huge amount of data. Because these experiments have been applied to only a few organisms, and their accuracy is believed to be limited, it would be valuable to develop computational methods for predicting protein-protein interactions from amino acid sequences or tertiary structural information. In this study, we describe a method for predicting interacting proteins based on homology-modeled complex structures. We employed the statistical residue-residue contact energy used in a previous study, and two new scores: a simple electrostatic energy and the sequence similarity between target sequences and template structures. The validity of each protein-protein complex model was measured using the single and combined scores. We applied our method to all the protein heterodimers of Saccharomyces cerevisiae. To evaluate the prediction performance of our method, we prepared two protein-protein interaction datasets: a complete dataset and a high-confidence dataset. The complete dataset (10,325 protein dimer models) contains all the yeast protein heterodimers whose complex structures can be modeled. Among them, pairs registered in the DIP database are defined as interacting pairs, and those not registered are defined as non-interacting pairs. The high-confidence dataset (3,219 protein dimer models) is a more reliable subset of the complete dataset, extracted using the criterion of common subcellular localization.
Both datasets show that sequence similarity has much higher discrimination power than the structure-based scores, but that including contact energy significantly improves on predictions using sequence similarity alone. These results suggest that sequence similarity is indispensable for the prediction, whereas structure scores can play supporting roles.



Distributions of Z-score of electrostatic energy calculated for the protein pairs included in the complete dataset.

Mentions: The Z-score distributions of three features (contact energy, electrostatic energy and sequence similarity between target and template) of the complete dataset are shown in Figures 3–5. As we assume that similar random surface amino acid pairs are generated in the Z-score calculations of both contact and electrostatic energy, these Z-scores are comparable to each other. Z-scores for the contact energy ranged lower, and were distributed more widely, than those for the electrostatic energy. The average Z-scores of the contact energy for interacting and non-interacting protein pairs were −4.6 and −2.2, respectively, whereas those of the electrostatic energy were −0.77 and −0.15. The variances of the contact energy Z-scores were 7.6 (interacting) and 4.6 (non-interacting), and those of the electrostatic energy Z-scores were 0.99 (interacting) and 0.67 (non-interacting). As the differences between the averages for interacting and non-interacting protein pairs were 2.4 (contact energy) and 0.62 (electrostatic energy), the discrimination power of the contact energy seemed to be better than that of the electrostatic energy. The distribution of sequence similarities for the interacting protein pairs, unlike the distributions of the contact and electrostatic energies, was not bell-shaped and was skewed toward the left. The distribution for the interacting pairs was also broader than that for the non-interacting pairs; the variances of the Z-score distribution of sequence similarity were 394.2 (interacting) and 20.7 (non-interacting). The high-confidence dataset yields similar distributions (data not shown).
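
The comparison above rests on raw differences between averages, although the two scores have very different spreads. As a rough cross-check, the sketch below divides each difference by a pooled standard deviation built from the reported variances. This standardized separation is an illustrative measure chosen here, not one used in the source, and the equal weighting of the two variances is an assumption, since the numbers of interacting and non-interacting pairs are not given in this excerpt. The dictionary keys and the function name are likewise hypothetical.

```python
import math

# Means and variances of the Z-scores as reported in the text above.
reported = {
    "contact energy": {
        "mean_int": -4.6, "mean_non": -2.2, "var_int": 7.6, "var_non": 4.6,
    },
    "electrostatic energy": {
        "mean_int": -0.77, "mean_non": -0.15, "var_int": 0.99, "var_non": 0.67,
    },
}


def standardized_separation(mean_int, mean_non, var_int, var_non):
    """Absolute difference of the means divided by a pooled standard deviation.

    Assumption: both groups are weighted equally when pooling the variances,
    because the group sizes are not stated in the excerpt.
    """
    pooled_sd = math.sqrt((var_int + var_non) / 2.0)
    return abs(mean_int - mean_non) / pooled_sd


separations = {
    name: standardized_separation(**stats) for name, stats in reported.items()
}

for name, value in separations.items():
    print(f"{name}: standardized separation = {value:.2f}")
```

Under these assumptions the contact energy gives about 0.97 and the electrostatic energy about 0.68, so the ordering reported in the text holds after accounting for spread, although the gap is narrower than the raw differences (2.4 versus 0.62) suggest.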

