Prediction of interacting proteins from homology-modeled complex structures using sequence and structure scores

View Article: PubMed Central - PubMed

ABSTRACT

Protein-protein interactions support most biological processes, and it is important to find specifically interacting partner proteins among homologous proteins in order to elucidate cellular functions such as signal transduction systems. Various high-throughput experimental methods for identifying these interactions have been developed and used to generate a huge amount of data. Because these experiments have been applied to only a few organisms, and their accuracy is believed to be limited, it would be valuable to develop computational methods for predicting protein-protein interactions from amino acid sequences or tertiary structural information. In this study, we describe a method for predicting interacting proteins based on homology-modeled complex structures. We employed the statistical residue-residue contact energy used in a previous study, and two new scores: a simple electrostatic energy and the sequence similarity between target sequences and template structures. The validity of each protein-protein complex model was assessed using these scores individually and in combination. We applied our method to all the protein heterodimers of Saccharomyces cerevisiae. To evaluate the prediction performance of our method, we prepared two protein-protein interaction datasets: a complete dataset and a high confidence dataset. The complete dataset (10,325 protein dimer models) contains all the yeast protein heterodimers whose complex structures can be modeled. Among them, pairs registered in the DIP database are defined as interacting pairs, and those not registered are defined as non-interacting pairs. The high confidence dataset (3,219 protein dimer models) is a more reliable subset of the complete dataset, extracted using the criterion of common subcellular localization. Both datasets show that sequence similarity has a much higher discrimination power than the structure-based scores, but that the inclusion of contact energy results in a significant improvement over predictions using sequence similarity alone. These results suggest that sequence similarity is indispensable for the prediction, whereas structure scores can play supporting roles.



Figure 7: Recall-precision plots for discrimination between interacting and non-interacting protein pairs using single and combined scores in the high confidence dataset. Abbreviations as in Figure 6.

Mentions: To evaluate the discrimination more strictly, we generated recall-precision plots for all three Z-scores, both individually and in combination. To generate combined scores, two or three Z-scores were summed without weights. Recall-precision plots are shown in Figure 6 (complete dataset) and Figure 7 (high confidence dataset); the maximum F-measures of the recall-precision plots are summarized in Figure 8 (complete dataset) and Figure 9 (high confidence dataset). We also tested weighted combinations, such as those given by Fisher's discriminant method, but performance was not significantly improved. The basic characteristics of the plots for the complete and high confidence datasets are similar, except that the precision values and maximum F-measures of the high confidence dataset were generally higher than those of the complete dataset, probably because the number of non-interacting protein pairs in the high confidence dataset (2,839 pairs) was about one fourth of that in the complete dataset (9,908 pairs). Similarly biased results obtained with co-localization datasets have been reported in previous studies [36,37].
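
The evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the scores are assumed to be oriented so that a higher Z-score means "more likely to interact", ties between scores are ignored, and the data in the demo are invented toy values. The sketch standardizes raw scores, sums Z-scores without weights, sweeps a threshold down the ranked list to obtain recall-precision points, and reports the maximum F-measure.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores to zero mean and unit (population) variance.

    Assumes the values are not all identical (otherwise sigma is zero).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def combine(*z_lists):
    """Unweighted sum of two or more Z-score lists, one value per protein pair."""
    return [sum(zs) for zs in zip(*z_lists)]


def recall_precision(scores, labels):
    """Return (recall, precision) points, one per rank cutoff.

    labels: 1 for an interacting pair, 0 for a non-interacting pair.
    Pairs are ranked from the highest score down (assumed orientation).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    points, true_pos = [], 0
    for rank, i in enumerate(order, start=1):
        true_pos += labels[i]
        points.append((true_pos / total_pos, true_pos / rank))
    return points


def max_f_measure(points):
    """Largest harmonic mean of precision and recall over all cutoffs."""
    return max((2 * p * r / (p + r) for r, p in points if p + r > 0), default=0.0)


if __name__ == "__main__":
    # Invented toy values for six protein pairs (not data from the paper).
    labels = [1, 1, 0, 0, 0, 0]
    seq_sim = z_scores([40.0, 22.0, 30.0, 12.0, 9.0, 7.0])
    contact = z_scores([1.5, 2.5, 0.5, 0.8, 0.2, 0.1])
    for name, score in [("sequence only", seq_sim),
                        ("sequence + contact", combine(seq_sim, contact))]:
        print(name, round(max_f_measure(recall_precision(score, labels)), 3))
```

The same functions also show why a dataset with fewer non-interacting pairs tends to score higher: removing negatives from the ranked list can only keep or raise the precision reached at a given recall, so the maximum F-measure cannot decrease.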

