Proteome scanning to predict PDZ domain interactions using support vector machines.

Hui S, Bader GD - BMC Bioinformatics (2010)

Bottom Line: Using cross-validation and a series of independent tests, we showed that our SVM successfully predicts interactions in different organisms. A comparison of performance measures (F1 measure and FPR) for the SVM and published predictors demonstrated our SVM's improved accuracy and precision at proteome scanning. We showed that it correctly predicts known interactions from proteomes of different organisms and is more accurate and precise at proteome scanning compared with published state-of-the-art predictors.


Affiliation: Donnelly Center for Cellular and Biomolecular Research, Banting and Best Department of Medical Research, University of Toronto, Toronto ON, Canada.

ABSTRACT

Background: PDZ domains mediate protein-protein interactions involved in important biological processes through the recognition of short linear motifs in their target proteins. Two recent independent studies have used protein microarray or phage display technology to detect PDZ domain interactions with peptide ligands on a large scale. Several computational predictors of PDZ domain interactions have been developed; however, they are trained using only protein microarray data and focus on limited subsets of PDZ domains. An accurate predictor of genomic PDZ domain interactions would allow the proteomes of organisms to be scanned for potential binders. Such an application would require an accurate and precise predictor to avoid generating too many false positive hits, given the large number of possible interactors in a given proteome. Once validated, these predictions will help to increase the coverage of current PDZ domain interaction networks and further our understanding of the roles that PDZ domains play in a variety of biological processes.

Results: We developed a PDZ domain interaction predictor using a support vector machine (SVM) trained with both protein microarray and phage display data. To train on the phage display data, which contain only positive interactions, we developed a method to generate artificial negative interactions. Using cross-validation and a series of independent tests, we showed that our SVM successfully predicts interactions in different organisms. We then used the SVM to scan the proteomes of human, worm and fly to predict binders for several PDZ domains. Predictions were validated using known genomic interactions and published protein microarray experiments. Based on our results, new protein interactions potentially associated with Usher and Bardet-Biedl syndromes were predicted. A comparison of performance measures (F1 measure and false positive rate, FPR) for the SVM and published predictors demonstrated our SVM's improved accuracy and precision at proteome scanning.

Conclusions: We built an SVM using mouse and human experimental training data to predict PDZ domain interactions. We showed that it correctly predicts known interactions from proteomes of different organisms and is more accurate and precise at proteome scanning compared with published state-of-the-art predictors.
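The two performance measures named in the abstract, the F1 measure and the false positive rate (FPR), can be computed from confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function name and the counts are hypothetical and are not taken from the paper.

```python
def f1_and_fpr(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (F1, FPR) from confusion-matrix counts (standard definitions)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Hypothetical proteome scan: 100 true binders among 20,100 candidates.
f1, fpr = f1_and_fpr(tp=80, fp=200, tn=19800, fn=20)
print(f"F1 = {f1:.3f}, FPR = {fpr:.3f}")
```

With these made-up counts, an FPR of only 1% still yields more false positives (200) than true positives (80), which illustrates why proteome scanning calls for a precise predictor.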



Figure 3: Comparison of independent genomic test performance of different SVMs. Blue × denotes data or method used by our SVM in all panels. (Top Row) A comparison of SVMs trained using data from one experiment: mouse from Chen et al. (magenta) or human from Tonikian et al. (light blue), from two experiments: mouse and human (green) and from two experiments with data enriched in genomic-like or non genomic-like human data: mouse and genomic-like human (blue) and mouse and non genomic-like human (red). (Middle Row) A comparison of SVMs trained using data encoded with different feature encodings: binary sequences (red), physicochemical properties (green), contact map (blue). (Bottom Row) A comparison of SVMs trained using different methods for generating artificial negatives for phage display: random peptides (red), shuffled peptides (green), randomly selected peptides (magenta), PWM selected peptides (blue). One hundred different SVMs were built, trained using different random, shuffled and randomly selected peptides.
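The bottom row of the figure compares ways of generating artificial negatives for phage display data. The sketch below illustrates three of them (random, shuffled and PWM selected peptides) under stated assumptions: the selection rule (keep the candidates that score lowest under a position weight matrix built from the positives), the uniform background, the pseudocount and all peptide sequences are hypothetical and may differ from the procedure used in the paper.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, rng):
    """Draw a peptide uniformly at random from the 20 amino acids."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def shuffled_peptide(peptide, rng):
    """Permute the residues of a known binder."""
    residues = list(peptide)
    rng.shuffle(residues)
    return "".join(residues)


def build_pwm(positives, pseudocount=1.0):
    """Log-odds PWM against a uniform background (an assumed scoring scheme)."""
    total = len(positives) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(positives[0])):
        counts = Counter(p[pos] for p in positives)
        pwm.append({
            aa: math.log((counts[aa] + pseudocount) / total * len(AMINO_ACIDS))
            for aa in AMINO_ACIDS
        })
    return pwm


def pwm_score(pwm, peptide):
    """Sum the per-position log-odds scores of a peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


def pwm_selected_negatives(positives, candidates, n):
    """Keep the n non-binder candidates that look least like the positives."""
    pwm = build_pwm(positives)
    known = set(positives)
    ranked = sorted(
        (c for c in candidates if c not in known),
        key=lambda c: pwm_score(pwm, c),
    )
    return ranked[:n]


# Hypothetical C-terminal peptides, for illustration only.
positives = ["ETSV", "QTSV", "ETDV", "KTSL"]
candidates = ["ETSV", "GGPA", "RTSV", "WWKD"]
negatives = pwm_selected_negatives(positives, candidates, n=2)
print(negatives)
```

In this toy example the candidate that resembles the positives (RTSV) ranks highest and is not selected, while a known positive (ETSV) is excluded outright.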

Mentions: We first validated our use of mouse protein microarray and human genomic-like phage display data for training. We compared our SVM to those built using data from single experimental data types (mouse/protein microarray or human/phage display), both experimental data types (mouse/protein microarray and human/phage display) and both experimental data types but with human phage display data enriched in genomic-like or non genomic-like interactions. For all SVMs, contact map features were used to encode the data and PWMs were used to generate artificial negatives. A comparison of predictor performance showed that our SVM was better than the other predictors for the worm and fly tests (Figure 3). All predictors had lower scores for the mouse orphan test. To explain the latter observation, for each test we computed the binding site similarity of each testing domain to its nearest training neighbour. We found that the mouse orphan domains were on average 65% similar to their nearest training neighbours, while the worm and fly testing domains were on average 80% and 87% similar to their nearest training neighbours, respectively. The observed pattern of performance was therefore consistent with our earlier observation that predictor performance decreased as the similarity between testing domains and their nearest training neighbours decreased. These results validate our use of both mouse protein microarray and human genomic-like phage display interactions for SVM training.
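The nearest-training-neighbour analysis described above can be sketched as follows. This is an illustration only: it assumes that binding site similarity is percent identity over aligned, equal-length binding-site residue strings, which may not be the measure used in the paper, and the sequences and function names are hypothetical.

```python
def percent_identity(site_a, site_b):
    """Percent of aligned binding-site positions with identical residues."""
    if len(site_a) != len(site_b):
        raise ValueError("binding sites must be aligned to the same length")
    matches = sum(a == b for a, b in zip(site_a, site_b))
    return 100.0 * matches / len(site_a)


def nearest_neighbour_similarity(test_site, training_sites):
    """Similarity of a testing domain to its most similar training domain."""
    return max(percent_identity(test_site, t) for t in training_sites)


def mean_nearest_similarity(test_sites, training_sites):
    """Average nearest-neighbour similarity over a set of testing domains."""
    scores = [nearest_neighbour_similarity(s, training_sites) for s in test_sites]
    return sum(scores) / len(scores)


# Hypothetical 10-residue binding sites, for illustration only.
training_sites = ["GLGFNIVGGK", "GLGFSIAGGV"]
test_sites = ["GLGFNIVGGV", "GFGFSVTGGK"]

for site in test_sites:
    print(site, nearest_neighbour_similarity(site, training_sites))
print("mean:", mean_nearest_similarity(test_sites, training_sites))
```

A lower mean for a test set, as reported for the mouse orphan domains, indicates that its domains lie further from anything seen in training.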

