PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor.

Heazlewood JL, Durek P, Hummel J, Selbig J, Weckwerth W, Walther D, Schulze WX - Nucleic Acids Res. (2007)

Bottom Line: For each protein, a phosphorylation site overview is presented in tabular form with detailed information on each identified phosphopeptide. An analysis of the current annotated Arabidopsis proteome yielded 27,782 predicted phosphoserine sites distributed across 17,035 proteins. These prediction results are summarized graphically in the database together with the experimental phosphorylation sites in a whole sequence context.


Affiliation: ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley 6009, WA, Australia.

ABSTRACT
The PhosPhAt database consolidates current knowledge of phosphorylation sites in Arabidopsis identified by mass spectrometry and combines it with a phosphorylation site predictor trained specifically on experimentally identified Arabidopsis phosphorylation motifs. The database currently contains 1187 unique tryptic peptide sequences encompassing 1053 Arabidopsis proteins. Among the characterized phosphorylation sites, over 1000 have unambiguous site assignments, and for nearly 500 the precise phosphorylation site could not be determined. The database is searchable by protein accession number, by physical peptide characteristics, and by experimental conditions (tissue sampled, phosphopeptide enrichment method). For each protein, a phosphorylation site overview is presented in tabular form with detailed information on each identified phosphopeptide. We used a set of 802 experimentally validated serine phosphorylation sites to develop a method for predicting serine phosphorylation (pSer) in Arabidopsis. An analysis of the current annotated Arabidopsis proteome yielded 27,782 predicted phosphoserine sites distributed across 17,035 proteins. These prediction results are summarized graphically in the database together with the experimental phosphorylation sites in a whole sequence context. The Arabidopsis Protein Phosphorylation Site Database (PhosPhAt) provides a valuable resource to the plant science community and can be accessed at http://phosphat.mpimp-golm.mpg.de.



Figure 4: Receiver operating characteristic (ROC) curves for the pSer predictor compared with NetPhos 2.0 (21) (see Supplementary material for details). A larger area under the ROC curve indicates better classification performance. The area under the ROC curve was A1 = 0.81 ± 0.01 for the pSer predictor and A2 = 0.67 ± 0.01 for NetPhos. The difference was significant, with a z-score = (A1−A2)/SE(A1−A2) of 24.1, corresponding to a P-value of 3.3E−128 in the limiting case of a normal distribution, according to the algorithm proposed in (22).
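The z-score in the caption is the difference between the two areas divided by the standard error of that difference. A minimal Python sketch of that arithmetic follows. The function names, the correlation parameter `r`, and the two-sided convention for the P-value are assumptions made for illustration; they are not details given in the caption or taken from reference (22).

```python
import math


def se_of_difference(se1: float, se2: float, r: float = 0.0) -> float:
    """Standard error of A1 - A2 for two AUC estimates.

    r is a hypothetical correlation between the two estimates
    (r = 0 treats them as independent); the source does not report it.
    """
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


def z_score(a1: float, a2: float, se_diff: float) -> float:
    """z = (A1 - A2) / SE(A1 - A2), as written in the caption."""
    return (a1 - a2) / se_diff


def two_sided_p(z: float) -> float:
    """Two-sided normal tail probability (an assumed convention)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values quoted in the caption.
A1, A2, Z_REPORTED = 0.81, 0.67, 24.1

# Standard error of the difference implied by the reported z-score.
implied_se = (A1 - A2) / Z_REPORTED

# z-score if the two rounded errors (0.01 each) were independent.
z_independent = z_score(A1, A2, se_of_difference(0.01, 0.01))

print(f"implied SE(A1-A2): {implied_se:.4f}")
print(f"z with r = 0:      {z_independent:.1f}")
print(f"P for z = 24.1:    {two_sided_p(Z_REPORTED):.1e}")
```

With the rounded values from the caption, the normal tail probability for z = 24.1 comes out on the order of 1E−128, in line with the reported P-value. Treating the two rounded standard errors as independent gives a z-score near 9.9 rather than 24.1, so the published figure presumably rests on unrounded errors and a correlation term that the caption does not list.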

Mentions: A comparison of the prediction performance of the plant-specific pSer predictor and the generic NetPhos 2.0 (21) reveals a significant improvement in recall, precision, and the Matthews correlation coefficient (CC) for Arabidopsis proteins (Figure 3). The CC reached with our plant-specific pSer predictor was 0.46, significantly better than the CC for NetPhos 2.0 (CC = 0.22). In a 10-fold cross-validation test, 69% of phosphorylated serine sites from the training set were correctly recognized (Supplementary Table 1), compared to 68% recall for the NetPhos 2.0 server. Of the predicted sites, 61% were experimentally verified phosphoserine sites, while the precision achieved with NetPhos 2.0 was 43%. The comparison of the receiver operating characteristic (ROC) curves revealed a highly significant improvement in prediction performance, with a z-score of 24.1 according to the algorithm proposed in (22), corresponding to a P-value of 3.3E−128 in the limiting case of a normal distribution. The area under the ROC curve was 0.81 ± 0.01 for the PhosPhAt plant-specific pSer predictor and 0.67 ± 0.01 for NetPhos (Figure 4).
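Recall, precision, and the Matthews correlation coefficient are all computed from the same four confusion-matrix counts. The sketch below shows the standard formulas in Python. The counts are hypothetical, chosen only so that recall and precision land near the percentages quoted above; they are not data from the paper, which does not list its confusion matrix here.

```python
import math


def recall(tp: int, fn: int) -> float:
    """Fraction of true phosphoserine sites that were predicted."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted sites that are true phosphoserine sites."""
    return tp / (tp + fp)


def matthews_cc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not from the paper).
counts = {"tp": 69, "fp": 44, "tn": 156, "fn": 31}

rec = recall(counts["tp"], counts["fn"])
prec = precision(counts["tp"], counts["fp"])
cc = matthews_cc(**counts)

print(f"recall = {rec:.2f}, precision = {prec:.2f}, CC = {cc:.2f}")
```

Unlike recall and precision, the correlation coefficient also depends on the number of true negatives, which is why two predictors with nearly identical recall (69% and 68%) can still differ widely in CC.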

