Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins.

Durek P, Schudoma C, Weckwerth W, Selbig J, Walther D - BMC Bioinformatics (2009)

Bottom Line: The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. When compared to sequence-only based prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information. While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites.


Affiliation: Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany. durek@mpimp-golm.mpg.de

ABSTRACT

Background: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites.

Results: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the Protein Data Bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphoserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernible, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence and structural information. When compared to sequence-only prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information.

Conclusion: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at (http://phos3d.mpimp-golm.mpg.de).


Figure 4: Sequence logos and radial cumulative propensity plots (RCP-plots) of kinase-specific sequence motifs. The plots illustrate enrichment as well as depletion of particular amino acid types in the local sequence (sequence logo); in the sequence-local spatial environment, which includes the 6 flanking amino acid residues on either side of the central serine/threonine/tyrosine (left RCP-plot); in the spatially local but non-sequence-local environment, i.e., excluding residues in the flanking sequence (middle RCP-plot); and in the combined information (right RCP-plot). For every amino acid type, the two sub-sectors correspond, in clockwise order, to the statistics obtained using the closest detected atom and the interaction center, respectively.
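
The three neighborhoods distinguished in Figure 4 can be illustrated with a minimal sketch. Only the window of 6 flanking residues on either side is taken from the caption. The single representative coordinate per residue, the 10 Å distance cutoff, the application of that cutoff to sequence-local residues as well, and the function name `partition_neighbors` are all assumptions made for illustration and are not the authors' implementation.

```python
import math
from collections import Counter

FLANK = 6  # flanking residues on either side of the site (from the caption)


def partition_neighbors(residues, site_index, cutoff=10.0, flank=FLANK):
    """Count amino acid types near a site, split by sequence locality.

    residues: list of (one_letter_code, (x, y, z)) tuples in sequence order,
              with one representative coordinate per residue (an assumption).
    cutoff:   hypothetical distance threshold in angstroms (an assumption).
    Returns (sequence_local, non_sequence_local, combined) as Counters.
    """
    _, center = residues[site_index]
    seq_local, non_local = Counter(), Counter()
    for i, (aa, xyz) in enumerate(residues):
        if i == site_index:
            continue  # skip the central residue itself
        if math.dist(center, xyz) > cutoff:
            continue  # outside the spatial neighborhood
        if abs(i - site_index) <= flank:
            seq_local[aa] += 1  # within the flanking sequence window
        else:
            non_local[aa] += 1  # spatially close, but distant in sequence
    return seq_local, non_local, seq_local + non_local
```

Counts such as these would then be compared against a reference distribution to express enrichment or depletion; how that comparison is made is not described in this excerpt.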

Mentions: The AGC family consists of kinases that recognize serine targets with an arginine or lysine residue at a distance of 2–3 residues from the central serine within the local protein sequence. It includes PKA and PKC as well as the GRK, BARK, MARK, PKB, PKG, and RSK kinase families; the latter families are not included in the study of spatial motifs presented here because too little corresponding data was available. Furthermore, the local sequence-based spatial profile is characterized by lower than expected occurrences of tryptophan and glutamate. Interestingly, the elevated occurrence of the positively charged amino acids arginine and lysine, the hallmark of the AGC kinase group, appears confined to the sequence-local neighborhood. An enrichment of arginine or lysine in the spatial context of PKA was not detectable: in the structural neighborhood ("non-sequence-local" graphs), the counts for both amino acids are not increased relative to the reference distribution. The PKC motifs exhibit an additional enrichment of serine in the sequence-local neighborhood, accompanied by a pronounced depletion of the amino acid residues histidine, glutamate, and tryptophan. The PKA motifs were observed to be depleted of the amino acid cysteine. For both families, PKA and PKC, a depletion of the hydrophobic amino acids alanine and leucine in the non-sequence-local neighborhood was detected, with an additional depletion of isoleucine in PKA motifs (Figure 4).
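
The sequence-level hallmark of the AGC family described above, an arginine or lysine 2–3 residues from the central serine, can be checked with a short scan. This is a minimal sketch and not the prediction method of the paper: the function name `find_agc_like_serines` is hypothetical, and because the text does not state on which side of the serine the basic residue lies, the sketch assumes either side qualifies.

```python
def find_agc_like_serines(sequence, offsets=(2, 3), basic="RK"):
    """Return 0-based indices of serines that have an arginine (R) or
    lysine (K) at a distance of 2-3 residues on either side.

    Checking both sides is an assumption; the text gives only the distance.
    """
    hits = []
    for i, aa in enumerate(sequence):
        if aa != "S":
            continue
        # Candidate positions at each allowed distance, on both sides.
        candidates = (i + sign * d for d in offsets for sign in (-1, 1))
        if any(0 <= j < len(sequence) and sequence[j] in basic
               for j in candidates):
            hits.append(i)
    return hits


print(find_agc_like_serines("GRRASLG"))  # prints [4]
```

A pattern match of this kind captures only the sequence-local signal; per the observations above, the arginine and lysine enrichment was not seen in the non-sequence-local spatial neighborhood.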

