Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants


ABSTRACT

In proteomic analyses of the plant secretome, the presence of putative leaderless secretory proteins (LSPs) is difficult to confirm due to the possibility of contamination from other sub-cellular compartments. In the absence of a plant-specific tool for predicting LSPs, the mammalian-trained SecretomeP has been applied to plant proteins in multiple studies to identify the most likely LSPs. This study investigates the effectiveness of using SecretomeP on plant proteins, identifies its limitations and provides a benchmark for its use. In the absence of experimentally verified LSPs, we exploit the common-feature hypothesis behind SecretomeP and use known classically secreted proteins (CSPs) of plants as a proxy to evaluate its accuracy. We show that, contrary to the common-feature hypothesis, plant CSPs are a poor proxy for evaluating LSP detection due to variation in the SecretomeP prediction scores when the signal peptide (SP) is modified. Removing the SP region from CSPs and comparing the predictive performance against non-secretory proteins indicates that the commonly used threshold scores of 0.5 and 0.6 result in false-positive rates in excess of 0.3 when applied to plant proteins. Setting the false-positive rate to 0.05, consistent with the original mammalian performance of SecretomeP, yields a true-positive rate only marginally higher than the false-positive rate. Therefore, the use of SecretomeP on plant proteins is not recommended. This study investigates the trade-offs of using SecretomeP on plant proteins and provides insights into predictive features for future development of plant-specific common-feature tools.




Figure 4: The KDE distributions of SecretomeP scores (as described in Figure 3A) for protein sub-sets of ASURE. Extracellular (A), plasma membrane (B), nucleus (C), cytosol (D), nucleus and/or cytosol (E), and plastid (F), respectively.

Mentions: The KDE plots of the modified WallProtDB sequences are distinct from that of the original sequences. The SP Remove, Reverse and SP C-term modifications all appear to shift the distribution toward the left, i.e., toward a higher density of lower scores (Figure 3A). Random shuffling of the SP region produces the smallest change of all the modifications. It is the least disruptive modification to the sequence, altering the order of amino acids in only a small region of the protein, and over 500 bootstraps the shuffled region could often resemble the original sequence. That it nevertheless results in a significant drop in scores is further evidence that high SecretomeP prediction scores are reliant on the SP. The density plot of the completely shuffled sequences (Random) was narrower, with a high density of scores between 0.5 and 0.6 in all datasets. As this modification is expected to alter the prediction score, the result illustrates that disruption of the entire sequence does change the scores, although, because each score is an average of 500 bootstraps, the exact range of these scores may not be informative. A more detailed investigation of the effect of SP sequence randomization and full sequence randomization on scores shows that the average value from bootstraps can hide some interesting variations (Supplementary Figure S2), but it ultimately supports the conclusion that the SP region can have a strong effect on SecretomeP output scores. For the ASURE dataset, the KDE plots of the secretory extracellular protein sub-set (Figure 4A) exhibit a similar pattern, with a shift to lower scores. The comparison of mean scores did not quite reach the p-value needed to reject the common-feature hypothesis (Supplementary Table S1), likely due to the smaller sample size; however, these data suggest the same reliance of output scores on the SP as seen in the full WallProtDB dataset.
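The sequence modifications named above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the assumption that the SP length is already known for each protein, the reading of "Reverse" as reversal of the whole sequence, and the placeholder `toy_score` function are all assumptions made for illustration; `toy_score` is not SecretomeP and only stands in for a scoring call so that the bootstrap averaging can be shown end to end.

```python
import random
from statistics import mean
from typing import Callable


def remove_sp(seq: str, sp_len: int) -> str:
    """SP Remove: drop the first sp_len residues (the assumed signal peptide)."""
    return seq[sp_len:]


def reverse_sequence(seq: str) -> str:
    """Reverse: reverse the whole sequence (our assumed reading of 'Reverse')."""
    return seq[::-1]


def move_sp_to_c_term(seq: str, sp_len: int) -> str:
    """SP C-term: cut the SP from the N-terminus and append it to the C-terminus."""
    return seq[sp_len:] + seq[:sp_len]


def shuffle_sp(seq: str, sp_len: int, rng: random.Random) -> str:
    """Shuffle only the SP region; the rest of the protein is left unchanged."""
    sp = list(seq[:sp_len])
    rng.shuffle(sp)
    return "".join(sp) + seq[sp_len:]


def shuffle_all(seq: str, rng: random.Random) -> str:
    """Random: shuffle every residue of the sequence."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def bootstrap_mean_score(
    seq: str,
    modify: Callable[[str, random.Random], str],
    score_fn: Callable[[str], float],
    n_bootstraps: int = 500,
    seed: int = 0,
) -> float:
    """Average score_fn over n_bootstraps independently randomized variants."""
    rng = random.Random(seed)
    return mean(score_fn(modify(seq, rng)) for _ in range(n_bootstraps))


def toy_score(seq: str) -> float:
    """Hypothetical placeholder score in [0, 1]; NOT SecretomeP.

    Returns the fraction of hydrophobic residues among the first 10 residues.
    """
    window = seq[:10]
    if not window:
        return 0.0
    return sum(res in "AILMFVW" for res in window) / len(window)


if __name__ == "__main__":
    protein = "MKTLLAAQRSDEGHKNPQST"  # made-up sequence, for illustration only
    sp_len = 4                        # assumed SP length, for illustration only
    print(remove_sp(protein, sp_len))
    print(move_sp_to_c_term(protein, sp_len))
    print(bootstrap_mean_score(protein, lambda s, r: shuffle_sp(s, sp_len, r), toy_score))
    print(bootstrap_mean_score(protein, shuffle_all, toy_score))
```

The two randomized modifications are passed to `bootstrap_mean_score` as callables so that the same averaging step serves both; a single reported value per protein is then the mean over 500 shuffles, which is why the spread of individual bootstrap scores is not visible in the averaged distribution.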

