Exploiting publicly available biological and biochemical information for the discovery of novel short linear motifs.

Sayadi A, Briganti L, Tramontano A, Via A - PLoS ONE (2011)

Bottom Line: Consequently, only a small fraction of SLiMs have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway and give specific examples of its effectiveness in both rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information and organized in a database that can be inspected and queried.


Affiliation: Department of Physics, Sapienza University of Rome, Rome, Italy.

ABSTRACT
The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, whose identification can provide important information about a protein's function. However, the short length of the motifs and their variable degree of conservation make them hard to identify, since it is difficult to estimate correctly the statistical significance of their occurrence. Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway, and give specific examples of its effectiveness both in rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information, and organized in a database that can be inspected and queried. An instance of the database populated with pre-computed data on seven organisms is accessible through a publicly available server, and we believe it constitutes a useful resource for the life sciences in its own right (http://www.biocomputing.it/modipath).
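The approach relies on detecting a motif through its recurrence across proteins of one pathway. As a minimal sketch, assuming motifs are written as Python regular expressions (a common way to express SLiM patterns, e.g. `P..P`) and sequences are held in a plain dictionary, the number of pathway proteins carrying a motif could be counted as follows. The function name and data layout are hypothetical, not the authors' implementation, and the sketch does not remove homologous proteins, which the described approach requires.

```python
import re
from typing import Dict, Set


def proteins_with_motif(sequences: Dict[str, str], motif_regex: str) -> Set[str]:
    """Return the IDs of proteins whose sequence matches the motif at least once.

    Hypothetical helper: `sequences` maps protein IDs to one-letter amino-acid
    strings, and `motif_regex` is a regular expression such as "P..P".
    """
    pattern = re.compile(motif_regex)
    return {pid for pid, seq in sequences.items() if pattern.search(seq)}


# Toy pathway with made-up sequences (not real proteins).
pathway = {
    "protA": "MSTPLAPKE",  # contains P-L-A-P
    "protB": "MKKLLGE",    # no proline at all
    "protC": "GGPQRPW",    # contains P-Q-R-P
}
hits = proteins_with_motif(pathway, "P..P")
print(sorted(hits), len(hits))  # ['protA', 'protC'] 2
```

Applying the same count to a whole reference proteome gives the background frequency that a significance test compares against.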

Figure 4. Motif occurrence hyper-geometric distribution. Hyper-geometric p-value distribution for the number of motif occurrences in true (black) and reshuffled (red) KEGG pathways with respect to the number of motif occurrences in the UniProt dataset for H. sapiens. The p-value threshold of 3e-9 approximately corresponds to a false discovery rate of 10%.
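The caption refers to a hyper-geometric p-value. One standard form of that test is sketched below, under the assumption that it counts proteins (this section does not say whether proteins or individual motif instances are counted). It computes the upper-tail probability of seeing at least k motif-carrying proteins in a pathway of n proteins, drawn from a reference set of N proteins of which K carry the motif: P(X >= k) = sum over i from k to min(n, K) of C(K, i) C(N - K, n - i) / C(N, n). The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n), computed exactly.

    N: proteins in the reference dataset (e.g. a UniProt proteome)
    K: reference proteins that contain the motif
    n: proteins in the pathway under test
    k: pathway proteins that contain the motif
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), min(n, K) + 1))
    # Exact integer arithmetic until this final division.
    return tail / comb(N, n)


# Hypothetical numbers: 20,000 reference proteins, 150 carry the motif,
# and 12 of the 80 proteins in one pathway carry it.
print(hypergeom_upper_tail(k=12, n=80, K=150, N=20_000))
```

With SciPy installed, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should return the same quantity, since SciPy's argument order is (total, successes, draws). A cut-off such as 3e-9 would then be applied to these p-values.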

Mentions: To choose an unbiased hyper-geometric p-value threshold, we had to take into account the peculiar composition of KEGG pathways, which is clearly not random. To this end, we built random pathways by reshuffling the proteins of each pathway with proteins belonging to other pathways, leaving the number of proteins per pathway unchanged. Next, we plotted the hyper-geometric p-value distribution of motif occurrences in the random datasets with respect to their occurrences in the UniProt dataset and compared it with the corresponding distribution for the true datasets (Figure 4). We estimated that the hyper-geometric p-value that best discriminates between true and false (random) positives is 3e-9, which corresponds to a false discovery rate (FDR) lower than 10%. The procedure was repeated ten times for H. sapiens, producing essentially the same result. The result was also unchanged when all human SwissProt proteins were used for reshuffling (data not shown).
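The randomisation and threshold choice described above can be sketched as two small functions. This is an illustrative reading, not the authors' code. `reshuffle_pathways` assumes each random pathway is filled by sampling, without replacement, from the pool of all proteins that appear in any pathway, which keeps the number of proteins per pathway unchanged. `empirical_fdr` assumes the common permutation-based estimate: the average number of random-pathway p-values passing the threshold divided by the number of true-pathway p-values passing it. Both names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def reshuffle_pathways(pathways: Dict[str, List[str]],
                       rng: random.Random) -> Dict[str, List[str]]:
    """Return random pathways with the same sizes as the real ones.

    Members are drawn without replacement from the pool of all proteins
    that belong to at least one pathway (an assumed reshuffling scheme).
    """
    pool = sorted({p for members in pathways.values() for p in members})
    return {name: rng.sample(pool, len(set(members)))
            for name, members in pathways.items()}


def empirical_fdr(true_pvalues: Sequence[float],
                  random_runs: Sequence[Sequence[float]],
                  threshold: float) -> float:
    """Estimate the FDR at `threshold` from true and reshuffled p-values.

    FDR ~ (mean number of random p-values <= threshold across runs)
          / (number of true p-values <= threshold)
    """
    if not random_runs:
        raise ValueError("need at least one reshuffled run")
    n_true = sum(p <= threshold for p in true_pvalues)
    if n_true == 0:
        return 0.0  # nothing passes the threshold, so nothing is falsely called
    mean_random = sum(sum(p <= threshold for p in run)
                      for run in random_runs) / len(random_runs)
    return min(1.0, mean_random / n_true)


# Toy example with made-up pathways and p-values.
kegg_like = {"A": ["p1", "p2", "p3"], "B": ["p3", "p4"], "C": ["p5"]}
shuffled = reshuffle_pathways(kegg_like, random.Random(0))
print({name: len(members) for name, members in shuffled.items()})  # {'A': 3, 'B': 2, 'C': 1}

true_p = [1e-10, 2e-10, 5e-9, 1e-3]
random_runs = [[1e-10, 0.5, 0.2], [0.3, 0.4, 0.6]]
print(empirical_fdr(true_p, random_runs, threshold=3e-9))  # 0.25
```

Sampling from the full pool may give a pathway some of its own proteins back; a stricter variant could draw only from proteins of other pathways. Repeating the reshuffle several times (the text reports ten repetitions for H. sapiens), computing hyper-geometric p-values for each random set, and scanning candidate thresholds with `empirical_fdr` is one way to arrive at a cut-off such as 3e-9.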

