Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs.

Sayadi A, Briganti L, Tramontano A, Via A - PLoS ONE (2011)

Bottom Line: Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway and give specific examples of its effectiveness in both rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information and organized in a database that can be inspected and queried.


Affiliation: Department of Physics, Sapienza University of Rome, Rome, Italy.

ABSTRACT
The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein's function. However, the short length of the motifs and their variable degree of conservation make their identification hard, since it is difficult to correctly estimate the statistical significance of their occurrence. Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway and give specific examples of its effectiveness both in rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information, and organized in a database that can be inspected and queried. An instance of the database populated with pre-computed data on seven organisms is accessible through a publicly available server, and we believe it constitutes by itself a useful resource for the life sciences (http://www.biocomputing.it/modipath).

Figure 2: The crystal structure of the human granulocyte colony-stimulating factor (GCSF) receptor. The structure of the GCSF receptor (PDB:2D9Q [42]) is reported in orange. Residues corresponding to the WS.WS motif (residues 295–299) are shown in blue.

Mentions: Another interesting motif that we automatically detected is WS.WS (Trp-Ser-any-Trp-Ser), which is specific for the Hematopoietic cell lineage pathway (KEGG ID: hsa04640; hypergeometric p-value < 3.10e-11). The motif was found in the analysis of both the 40% and 25% non-redundant sequence datasets and is present in 9 of the 79 proteins belonging to the pathway, whereas it occurs in only 59 other sequences of the 40% non-redundant UniProt human dataset. Figure S2 shows the PROSITE [17] and Pfam [38] domain composition of the nine KEGG proteins together with the position of the WS.WS motif in the sequence: the motif is found at the C-terminus of the PROSITE FN3 domain in six cases and outside the domain in three cases. This suggests that, at least in some of these proteins, the occurrence of the motif is due not to evolutionary conservation but rather to functional constraints. The WS.WS motif appears to be necessary for the binding activity of the erythropoietin receptor (EpoR), a member of the cytokine and growth factor receptor family. These proteins share conserved features in their extracellular and cytoplasmic domains, presumably necessary for proper folding and thereby for efficient intracellular transport and cell-surface receptor binding. Yoshimura et al. [39] demonstrated that mutations in the motif of EpoR abolish processing, ligand binding, and activation of the receptor, while Schimmenti et al. [40] showed that WS.WS is necessary for EpoR binding to Epo. For two (UniProt: P15509 and Q99062) of the nine proteins hosting the motif, the crystal structure has been determined (PDB:3CXE [41] and 2D9Q [42]). In both cases, the motif instance is found in an exposed loop of the protein structure (Figure 2).
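
The enrichment reasoning above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the motif is matched with the plain regular expression WS.WS, and the background dataset size (ASSUMED_BACKGROUND_SIZE) is a placeholder, because the excerpt does not state how many sequences the 40% non-redundant human dataset contains. The counts 9, 79 and 59 are taken from the text. With an assumed background size, the resulting p-value is only illustrative and should not be expected to reproduce the reported bound.

```python
import re
from math import comb

# Motif as written in the text: Trp-Ser-any-Trp-Ser.
WSXWS = re.compile(r"WS.WS")


def has_motif(sequence: str) -> bool:
    """Return True if the sequence contains at least one WS.WS instance."""
    return WSXWS.search(sequence) is not None


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: sequences in the background dataset
    K: background sequences carrying the motif
    n: proteins in the pathway (sample drawn without replacement)
    k: pathway proteins carrying the motif
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Counts stated in the text.
PATHWAY_SIZE = 79      # proteins in KEGG hsa04640
PATHWAY_HITS = 9       # pathway proteins with WS.WS
OTHER_HITS = 59        # other dataset sequences with WS.WS

# ASSUMPTION: placeholder only; the true dataset size is not given in the excerpt.
ASSUMED_BACKGROUND_SIZE = 10_000

p_value = hypergeom_upper_tail(
    k=PATHWAY_HITS,
    n=PATHWAY_SIZE,
    K=PATHWAY_HITS + OTHER_HITS,
    N=ASSUMED_BACKGROUND_SIZE,
)

if __name__ == "__main__":
    print(f"Illustrative p-value under the assumed background size: {p_value:.3g}")
```

The upper tail is used because the question is whether the pathway contains at least as many motif-bearing proteins as observed; exact integer binomial coefficients avoid floating-point underflow for counts of this size.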

