QSLiMFinder: improved short linear motif prediction using specific query protein data.

Palopoli N, Lythgow KT, Edwards RJ - Bioinformatics (2015)

Bottom Line: QSLiMFinder was extensively benchmarked using known SLiM-containing proteins and simulated protein interaction datasets of real human proteins. Exploiting prior knowledge of a query protein likely to be involved in a SLiM-mediated interaction increased the proportion of true positives correctly returned and reduced the proportion of datasets returning a false positive prediction. The biggest improvement was seen if a short region of the query protein flanking the interaction site was known.


Affiliation: Centre for Biological Sciences, University of Southampton, Southampton, UK.



Fig. 4. Comparison of (a) QSLiMFinder (QSF) and (b) SLiMFinder (SF) results on SimBench datasets after searching with fragments of the Query protein of decreasing size. SN, the proportion of datasets returning a TP, is plotted against FPX, the proportion of datasets returning a FP, at different SLiMChance significance cut-offs (0.1, 0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04). Searches were made with the whole protein (‘none’, circles), with a window of five residues flanking the known ELM at each side (‘flank5’, triangles) or with the region of the motif only (‘site’, squares). For clarity, plots are truncated at the least significant cut-off for which FPX = 0.
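The SN and FPX values plotted in the figure are simple proportions over datasets, so they can be illustrated with a minimal Python sketch. The input format is an assumption made for illustration only: each list holds, per dataset, the best (lowest) SLiMChance significance of a TP or FP prediction, or `None` if the dataset returned none. How predictions are classified as TP or FP is defined in the article and is not reproduced here; treating the cut-off as inclusive is also an assumption, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Significance cut-offs listed in the figure caption.
CUTOFFS = [0.1, 0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04]


def proportion_returning(best_sig: List[Optional[float]], cutoff: float) -> float:
    """Proportion of datasets whose best prediction passes the cut-off.

    best_sig holds one entry per dataset: the lowest significance value of
    the relevant prediction type, or None if the dataset returned none.
    The comparison is inclusive (<=), which is an assumption of this sketch.
    """
    if not best_sig:
        return 0.0
    hits = sum(1 for p in best_sig if p is not None and p <= cutoff)
    return hits / len(best_sig)


def sn_fpx_curve(
    tp_sig: List[Optional[float]],
    fp_sig: List[Optional[float]],
    cutoffs: List[float] = CUTOFFS,
) -> List[Tuple[float, float, float]]:
    """Return (cutoff, SN, FPX) for each cut-off.

    SN  = proportion of datasets returning a TP at that cut-off.
    FPX = proportion of datasets returning a FP at that cut-off.
    """
    return [
        (c, proportion_returning(tp_sig, c), proportion_returning(fp_sig, c))
        for c in cutoffs
    ]


if __name__ == "__main__":
    # Made-up values for four hypothetical datasets, not data from the article.
    tp = [0.0005, 0.02, None, 0.2]
    fp = [None, 0.08, 0.003, None]
    for cutoff, sn, fpx in sn_fpx_curve(tp, fp):
        print(f"cutoff={cutoff:g}  SN={sn:.2f}  FPX={fpx:.2f}")
```

Plotting SN against FPX for each search mode would then give one curve per mode, with stricter cut-offs moving each curve towards lower FPX.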

Mentions: ELMBench datasets are commonly used for SLiM prediction benchmarking but are quite limited because (i) the number of ELMs is restricted, and (ii) the realism of a dataset in which every protein contains the SLiM is questionable for real-world applications. We therefore sought to generate a more extensive benchmarking dataset, SimBench, which would more accurately reflect the nature of real-world protein datasets for SLiM prediction and neither rely on, nor be unduly biased by, experimental data. For this, the 76 ELMred patterns with a normalized information content ≥ 3.0 (equivalent to 3+ fixed positions) were used to generate multiple datasets of real human proteins with different numbers of proteins and a range of signal-to-noise ratios, plus a matching number of control datasets of randomly selected human proteins. Again, QSLiMFinder shows greater SN than SLiMFinder, returning TP results for a greater proportion of SimBench datasets (Fig. 4). As expected, the effect is most pronounced when the query region is smallest, as this is when the motif space is most dramatically reduced. For the sake of clarity, only those results obtained with the whole protein and the SLiM region with and without flanking residues are displayed, but results with windows of intermediate sizes lie in between, as expected (data not shown).
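The three query regions compared in the figure (‘none’, ‘flank5’ and ‘site’) can be illustrated with a short Python sketch. This is not QSLiMFinder code: the function name, the 0-based half-open coordinates and the clipping of the window at the protein termini are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Flank size per search mode; None means the whole protein is used.
# The labels follow the figure caption; the mapping is an assumption.
MODES: Dict[str, Optional[int]] = {"none": None, "flank5": 5, "site": 0}


def query_region(sequence: str, motif_start: int, motif_end: int, mode: str = "none") -> str:
    """Return the part of the query protein searched in a given mode.

    motif_start and motif_end are 0-based, half-open coordinates of the
    known motif occurrence (an assumption of this sketch).
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 0 <= motif_start < motif_end <= len(sequence):
        raise ValueError("motif coordinates fall outside the sequence")

    flank = MODES[mode]
    if flank is None:
        # 'none': no restriction, search with the whole protein.
        return sequence

    # Extend the motif by `flank` residues on each side, clipped to the
    # ends of the protein.
    start = max(0, motif_start - flank)
    end = min(len(sequence), motif_end + flank)
    return sequence[start:end]


if __name__ == "__main__":
    # Arbitrary example sequence and motif position, not taken from the article.
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    for mode in MODES:
        print(mode, query_region(seq, 8, 12, mode))
```

The smaller the returned region, the fewer candidate motifs the query can contribute, which is consistent with the observation above that the effect is most pronounced when the query region is smallest.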

