Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants

ABSTRACT

In proteomic analyses of the plant secretome, the presence of putative leaderless secretory proteins (LSPs) is difficult to confirm because of possible contamination from other sub-cellular compartments. In the absence of a plant-specific tool for predicting LSPs, the mammalian-trained SecretomeP has been applied to plant proteins in multiple studies to identify the most likely LSPs. This study investigates the effectiveness of using SecretomeP on plant proteins, identifies its limitations and provides a benchmark for its use. In the absence of experimentally verified LSPs, we exploit the common-feature hypothesis behind SecretomeP and use known classically secreted proteins (CSPs) of plants as a proxy to evaluate its accuracy. We show that, contrary to the common-feature hypothesis, plant CSPs are a poor proxy for evaluating LSP detection because the SecretomeP prediction scores vary when the signal peptide (SP) is modified. Removing the SP region from CSPs and comparing the predictive performance against non-secretory proteins indicates that the commonly used threshold scores of 0.5 and 0.6 result in false-positive rates in excess of 0.3 when applied to plant proteins. Setting the false-positive rate to 0.05, consistent with the original mammalian performance of SecretomeP, yields a true-positive rate only marginally higher than the false-positive rate. Therefore, the use of SecretomeP on plant proteins is not recommended. This study investigates the trade-offs of using SecretomeP on plant proteins and provides insights into predictive features for future development of plant-specific common-feature tools.

Figure 6: Receiver operating characteristic (ROC) curves for the positive class of WallProtDB Arabidopsis proteins vs. the negative class of ASURE proteins from the nucleus and/or cytosol. Area Under Curve (AUC) values are shown for each protein modification. A random classifier (pure chance) has an AUC of 0.5. As the classification threshold is lowered, the true-positive rate (TPR) and false-positive rate (FPR) are plotted on the axes, giving visual insight into how the classifier balances true and false positives and which thresholds might be considered most appropriate. The preferred path of an ROC curve is toward the upper-left corner, signifying a high true-positive rate and a low false-positive rate. Three points on the SP Remove curve are annotated: the point at which the FPR is <0.05, and the points at which the threshold is set to 0.5 and to 0.6. The exact values are listed in Supplementary Table S3.
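The threshold sweep described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names `roc_points` and `auc`, the toy scores, and the convention that a protein is called positive when its score is at or above the threshold are all assumptions made for the example.

```python
def roc_points(pos_scores, neg_scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step.

    Assumption: a protein is predicted positive when score >= threshold.
    """
    # Every distinct observed score is tried as a threshold, highest first.
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores only (hypothetical, not taken from the study).
positives = [0.9, 0.8, 0.4]   # stand-ins for scores of secreted proteins
negatives = [0.7, 0.3, 0.2]   # stand-ins for scores of nuclear/cytosolic proteins
curve = roc_points(positives, negatives)
print(curve)
print(auc(curve))
```

A curve that hugs the upper-left corner gives an AUC close to 1, while scores that do not separate the two classes at all give an AUC of 0.5, matching the chance level mentioned in the caption.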

WallProtDB contains many similar sequences. Therefore, to compare against the Arabidopsis negative data in an unbiased manner, we used the Arabidopsis WallProtDB proteins and the ASURE (nucleus and/or cytosol) subset to create ROC curves comparing modified positive datasets to modified negative datasets (Figure 6). The SP Remove modification is used to infer the TPR and FPR scores, although the results from the other modifications are also shown.
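Reading TPR and FPR off a curve at a fixed score threshold, or finding the threshold that keeps the FPR below a target such as 0.05, might look like the following sketch. The helper names, the toy scores, and the `score >= threshold` decision rule are assumptions for illustration; the values actually reported are those in Supplementary Table S3.

```python
def rates_at_threshold(pos_scores, neg_scores, threshold):
    """Return (TPR, FPR) when proteins scoring >= threshold are called secreted."""
    tpr = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return tpr, fpr


def lowest_threshold_with_fpr_below(pos_scores, neg_scores, max_fpr):
    """Return (threshold, TPR, FPR) for the lowest observed score whose FPR < max_fpr.

    Returns None if even the highest observed score gives an FPR >= max_fpr.
    """
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tpr, fpr = rates_at_threshold(pos_scores, neg_scores, t)
        if fpr >= max_fpr:
            break  # FPR can only grow as the threshold drops further
        best = (t, tpr, fpr)
    return best


# Toy scores only (hypothetical, not taken from the study).
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]

for threshold in (0.5, 0.6):
    print(threshold, rates_at_threshold(positives, negatives, threshold))
print(lowest_threshold_with_fpr_below(positives, negatives, 0.05))
```

Comparing the TPR with the FPR at each of these operating points is what shows whether a threshold separates the classes or merely labels a similar fraction of both as secreted.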

