Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants


ABSTRACT

In proteomic analyses of the plant secretome, the presence of putative leaderless secretory proteins (LSPs) is difficult to confirm due to the possibility of contamination from other sub-cellular compartments. In the absence of a plant-specific tool for predicting LSPs, the mammalian-trained SecretomeP has been applied to plant proteins in multiple studies to identify the most likely LSPs. This study investigates the effectiveness of using SecretomeP on plant proteins, identifies its limitations and provides a benchmark for its use. In the absence of experimentally verified LSPs, we exploit the common-feature hypothesis behind SecretomeP and use known classically secreted proteins (CSPs) of plants as a proxy to evaluate its accuracy. We show that, contrary to the common-feature hypothesis, plant CSPs are a poor proxy for evaluating LSP detection because the SecretomeP prediction scores vary when the signal peptide (SP) is modified. Removing the SP region from CSPs and comparing the predictive performance against non-secretory proteins indicates that the commonly used threshold scores of 0.5 and 0.6 result in false-positive rates in excess of 0.3 when applied to plant proteins. Setting the false-positive rate to 0.05, consistent with the original mammalian performance of SecretomeP, yields a true-positive rate that is only marginally higher than the false-positive rate. Therefore, the use of SecretomeP on plant proteins is not recommended. This study investigates the trade-offs of using SecretomeP on plant proteins and provides insights into predictive features for future development of plant-specific common-feature tools.
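
The threshold trade-off described in the abstract can be illustrated with a minimal sketch. The scores below are invented for illustration and are not data from the study; the function names and the convention that a score at or above the cutoff counts as a positive call are assumptions, not part of SecretomeP.

```python
def rates_at_threshold(positive_scores, negative_scores, threshold):
    """Return (true-positive rate, false-positive rate) for one cutoff.

    Assumption: a protein is called secreted when its score >= threshold.
    """
    tp = sum(s >= threshold for s in positive_scores)
    fp = sum(s >= threshold for s in negative_scores)
    return tp / len(positive_scores), fp / len(negative_scores)


def threshold_for_fpr(negative_scores, target_fpr):
    """Return the lowest cutoff whose false-positive rate is <= target_fpr.

    Candidate cutoffs are the observed negative scores, plus infinity
    (which calls nothing positive and so always satisfies the target).
    """
    candidates = sorted(set(negative_scores)) + [float("inf")]
    for t in candidates:
        fpr = sum(s >= t for s in negative_scores) / len(negative_scores)
        if fpr <= target_fpr:
            return t


# Hypothetical scores: SP-removed CSPs (positives) and non-secretory proteins.
positives = [0.9, 0.7, 0.55, 0.4]
negatives = [0.8, 0.65, 0.45, 0.3, 0.2, 0.1, 0.05, 0.35, 0.25, 0.15]

tpr, fpr = rates_at_threshold(positives, negatives, 0.5)
strict_cutoff = threshold_for_fpr(negatives, 0.1)
tpr_strict, fpr_strict = rates_at_threshold(positives, negatives, strict_cutoff)
```

Fixing the false-positive rate first and then reading off the true-positive rate mirrors the comparison in the abstract: a stricter cutoff lowers false positives but may leave few true positives above it.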




Figure 3: (A) The kernel density estimation (KDE) distributions of SecretomeP scores (v1.0) for original and modified sequences of the entire WallProtDB dataset. (Individual KDE distribution plots for sub-sets of WallProtDB are provided in Supplementary Figure S1). Rug plot marks along the x-axis indicate position of original scores. Vertical dashed lines at 0.5 and 0.6 represent the most common cutoff scores used in plant studies. (B) Correlation plots between original scores (x-axis) and modified scores (y-axis). Spearman correlation values are indicated for each modification.

Since the mean score of predictions is a summary statistic and is susceptible to outliers, the distribution of scores was plotted to visualize the effect of protein modifications. A kernel density estimation (KDE) plot was used for all original proteins in each dataset, with the distribution of each modification overlaid. Each curve represents the smoothed (Gaussian) distribution of scores, and the rug plot marks along the x-axis indicate the original score of each protein. As suggested by the significant lowering of the SecretomeP output scores, the KDE plots showed a change in the score distribution for each protein modification compared to the original WallProtDB dataset (Figure 3A). Scores changed both for modifications that were expected to alter them and for modifications that were not. Sub-sets of WallProtDB, based on either plant species or maximum sequence identity threshold, exhibit the same shifts, showing that this observation is not due to the composition of WallProtDB (Supplementary Figure S1).
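
The two analyses behind Figure 3, a Gaussian KDE of the scores and a Spearman correlation between original and modified scores, can be sketched as follows. This is not the authors' code: the scores are invented, the fixed bandwidth of 0.1 is an arbitrary assumption, and the helper names are hypothetical. Only the Python standard library is used.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built as an average of Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five proteins before and after one modification.
original = [0.82, 0.75, 0.66, 0.58, 0.91]
modified = [0.50, 0.48, 0.30, 0.35, 0.60]

kde_orig = gaussian_kde(original, bandwidth=0.1)  # bandwidth is an assumption
kde_mod = gaussian_kde(modified, bandwidth=0.1)

# Evaluate both curves on the score range 0..1 and locate their peaks.
grid = [i / 100 for i in range(101)]
peak_orig = max(grid, key=kde_orig)
peak_mod = max(grid, key=kde_mod)

rho = spearman(original, modified)
```

In this toy data the modified curve peaks at a lower score than the original while the rank correlation stays high, which shows how a distribution can shift even when the ordering of proteins is largely preserved. The two statistics answer different questions, so both panels are needed.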

