QSLiMFinder: improved short linear motif prediction using specific query protein data.

Palopoli N, Lythgow KT, Edwards RJ - Bioinformatics (2015)

Bottom Line: QSLiMFinder was extensively benchmarked using known SLiM-containing proteins and simulated protein interaction datasets of real human proteins. Exploiting prior knowledge of a query protein likely to be involved in a SLiM-mediated interaction increased the proportion of true positives correctly returned and reduced the proportion of datasets returning a false positive prediction. The biggest improvement was seen if a short region of the query protein flanking the interaction site was known.


Affiliation: Centre for Biological Sciences, University of Southampton, Southampton, UK.



Fig. 7. Comparison of (a) QSLiMFinder (QSF) and (b) SLiMFinder (SF) results on SimBench datasets with different signal-to-noise ratios. The proportion of datasets returning a true motif (SN) is plotted against the proportion of datasets returning a false hit (FPX) at each SLiMChance significance cut-off (0.1, 0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04). Selected combinations of signal (5, open symbols; 10, filled symbols) and dataset sizes (5, circles; 10, diamonds; 50, squares; 100, triangles) are displayed. Searches were made using the whole protein with disorder masking. For clarity, plots are truncated at the least significant cut-off for which FPX = 0.
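The two plotted quantities can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' benchmarking code: the `DatasetResult` record, the `sn_fpx` function and the choice of `<=` for the significance test are all assumptions made for the example. Only the list of cut-offs comes from the caption.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

# Significance cut-offs listed in the figure caption.
CUTOFFS = (0.1, 0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04)


class DatasetResult(NamedTuple):
    """Hypothetical per-dataset summary (not a real QSLiMFinder output format)."""
    best_true_p: Optional[float]   # best p-value of a returned true motif, None if none
    best_false_p: Optional[float]  # best p-value of a returned false motif, None if none


def sn_fpx(results: Sequence[DatasetResult], cutoff: float) -> Tuple[float, float]:
    """Return (SN, FPX): the proportions of datasets with a true or false hit at the cut-off."""
    n = len(results)
    true_hits = sum(
        1 for r in results if r.best_true_p is not None and r.best_true_p <= cutoff
    )
    false_hits = sum(
        1 for r in results if r.best_false_p is not None and r.best_false_p <= cutoff
    )
    return true_hits / n, false_hits / n


if __name__ == "__main__":
    # Made-up values, purely to exercise the function.
    demo = [
        DatasetResult(0.001, None),
        DatasetResult(0.04, 0.08),
        DatasetResult(None, 0.2),
        DatasetResult(0.5, 0.009),
    ]
    for cutoff in CUTOFFS:
        print(cutoff, sn_fpx(demo, cutoff))
```

Evaluating `sn_fpx` at each cut-off gives one (FPX, SN) point per cut-off, which is how a curve of the kind shown in the figure would be traced.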

Mentions: Real protein datasets vary widely in the number of proteins they contain (Edwards et al., 2012). In general, an unknown fraction of these proteins will contain the SLiM being sought. The remaining proteins are ‘noise’: they interact with the target protein via a different mechanism. The SimBench data were generated with two different TP counts (5 or 10 per dataset) and five different signal-to-noise ratios to investigate the effects of data quality and quantity. As expected, the composition of the dataset strongly affects the trade-off between sensitivity and specificity. Intuitively, increasing the signal-to-noise ratio improves the sensitivity of prediction for both SLiMFinder and QSLiMFinder (Fig. 7). At equal signal-to-noise ratios, larger datasets also give a marked increase in true motifs, indicating that the SLiMChance over-representation statistics become more sensitive as the number of occurrences increases. This is not surprising, given that they are founded on the binomial distribution. However, in line with previous results, increasing the dataset size also increases the likelihood of a FP being returned (Edwards et al., 2007, 2012). This is most likely due to the effects of small local biases in amino acid composition being amplified as dataset sizes increase.
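The effect of dataset size at a fixed signal-to-noise ratio can be illustrated with a plain binomial tail probability. This is a minimal sketch, not the SLiMChance calculation itself, which involves more than a single binomial test; the function name `binomial_tail` and the per-protein chance probability of 0.1 are assumptions chosen for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical probability that a motif occurs in one protein by chance.
P_CHANCE = 0.1

if __name__ == "__main__":
    # Same proportion of motif-containing proteins (50%), different dataset sizes.
    small = binomial_tail(5, 10, P_CHANCE)     # 5 occurrences in 10 proteins
    large = binomial_tail(50, 100, P_CHANCE)   # 50 occurrences in 100 proteins
    print(f"5 of 10:   {small:.3g}")
    print(f"50 of 100: {large:.3g}")
```

Under these assumed numbers, the same proportion of motif-containing proteins gives a far smaller tail probability in the larger dataset, which is consistent with the gain in sensitivity described above.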

