Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression.

De Bodt S, Proost S, Vandepoele K, Rouzé P, Van de Peer Y - BMC Genomics (2009)

Bottom Line: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.


Affiliation: Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Technologiepark 927, B-9052 Gent, Belgium. stefanie.debodt@psb.vib-ugent.be

ABSTRACT

Background: Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome.

Results: In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively.

Conclusion: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.



Figure 3: Overlap between datasets of protein-protein interactions. Overlap between experimentally identified protein-protein interactions available in public databases (see methods) and predicted sets from this study and the Geisler-Lee et al. study [45]. Filtered interactions from this study are compared to interactions with a confidence value of two or higher from the Geisler-Lee et al. [45] study.

Mentions: The protein-protein interactions detected in the filtered and predicted interactome were compared to experimentally shown and previously predicted protein-protein interactions. Fig. 3 and Additional file 8 depict the overlap between the different datasets. Overall, only a small overlap is found between our filtered interactome and the experimentally shown interactions reported by the TAIR, MINT and IntAct databases (see Fig. 3; Additional file 8, panel A1). Similarly, only a small overlap is observed between the experimentally identified interactions and the previously predicted interactions of Geisler-Lee et al. [45] and Cui et al. [46] (see Fig. 3; Additional file 8, panel A2). Similar to our approach, Geisler-Lee et al. [45] identified interologs using worm, fly, human and yeast as source organisms. In their study, a confidence value is calculated that takes into account the number of times a protein-protein interaction is found as an interolog and/or is supported by genomic features, such as co-expression based on the Pearson correlation coefficient and localization based on the Arabidopsis Subcellular Database (SUBA) [47]. The predicted protein-protein interaction dataset of Cui et al. [46] was constructed with a Naive Bayesian Classifier. This method integrates different predictive data sources, such as ortholog information, GO biological process, co-expression, gene fusion, gene neighborhood, phylogenetic profiles and domain architecture, to build a model that predicts novel protein-protein interactions. A comparison of our filtered and predicted interactome with these two sets of previously predicted protein-protein interactions is shown in Additional file 8 (panels B1 and B2). Although Geisler-Lee et al. [45] took a similar approach, our study finds a considerable number of new interactions (17624) as well as experimentally identified interactions (75) that were not recovered by Geisler-Lee et al. [45] (see Fig. 3). The relatively small overlap between the two approaches is most probably caused by the use of different protein-protein interaction databases for the source organisms (BIND, MIPS, BIOGRID, and DIP were used in Geisler-Lee et al. [45]) and by the use of different confidence measures. These observations corroborate previous reports on the low coverage of current protein-protein interaction datasets and detection strategies.
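
The dataset comparison described above amounts to set intersections over unordered protein pairs. The following is a minimal Python sketch of such a comparison, not the authors' implementation. The function names (`normalize`, `overlap_table`, `recovered_only_by`) and the toy gene identifiers are hypothetical and are not taken from the paper. The sketch assumes each dataset is available as a list of identifier pairs and that an interaction A-B is the same as B-A.

```python
from itertools import combinations


def normalize(pairs):
    """Store each interaction as an unordered pair, so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def overlap_table(datasets):
    """Return {(name_a, name_b): number of shared interactions} for all dataset pairs."""
    sets = {name: normalize(pairs) for name, pairs in datasets.items()}
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations(sorted(sets), 2)
    }


def recovered_only_by(reference, ours, other):
    """Reference interactions found in `ours` but missing from `other`."""
    ref, mine, theirs = (normalize(p) for p in (reference, ours, other))
    return (ref & mine) - theirs


# Toy data with placeholder identifiers (illustrative only, not from the paper).
datasets = {
    "experimental": [("G1", "G2"), ("G3", "G4"), ("G5", "G6")],
    "filtered": [("G2", "G1"), ("G7", "G8"), ("G5", "G6")],
    "geisler_lee": [("G3", "G4"), ("G8", "G7"), ("G9", "G10")],
}

table = overlap_table(datasets)
only_ours = recovered_only_by(
    datasets["experimental"], datasets["filtered"], datasets["geisler_lee"]
)
```

Storing each pair as a `frozenset` makes the comparison independent of the order in which the two proteins are listed, which matters when the source databases do not share a convention for pair orientation.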

