Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants


ABSTRACT

In proteomic analyses of the plant secretome, the presence of putative leaderless secretory proteins (LSPs) is difficult to confirm because of possible contamination from other sub-cellular compartments. In the absence of a plant-specific tool for predicting LSPs, the mammalian-trained SecretomeP has been applied to plant proteins in multiple studies to identify the most likely LSPs. This study investigates the effectiveness of using SecretomeP on plant proteins, identifies its limitations and provides a benchmark for its use. In the absence of experimentally verified LSPs, we exploit the common-feature hypothesis behind SecretomeP and use known classically secreted proteins (CSPs) of plants as a proxy to evaluate its accuracy. We show that, contrary to the common-feature hypothesis, plant CSPs are a poor proxy for evaluating LSP detection because the SecretomeP prediction scores vary when the signal peptide (SP) is modified. Removing the SP region from CSPs and comparing the predictive performance against non-secretory proteins indicates that the commonly used threshold scores of 0.5 and 0.6 result in false-positive rates in excess of 0.3 when applied to plant proteins. Setting the false-positive rate to 0.05, consistent with the original mammalian performance of SecretomeP, yields a true-positive rate only marginally higher than the false-positive rate. Therefore, the use of SecretomeP on plant proteins is not recommended. This study investigates the trade-offs of using SecretomeP on plant proteins and provides insights into predictive features for future development of plant-specific common-feature tools.
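The abstract describes fixing the false-positive rate at 0.05 and then reading off the true-positive rate obtained at the corresponding score threshold. The following is a minimal sketch of that kind of calculation, not the authors' code. The function names, the convention that a score at or above the threshold counts as a positive prediction, and all example scores are assumptions made for illustration only.

```python
def tpr_at(positive_scores, threshold):
    """Fraction of known positives scoring at or above the threshold."""
    return sum(s >= threshold for s in positive_scores) / len(positive_scores)


def lowest_threshold_for_fpr(negative_scores, target_fpr):
    """Return (threshold, fpr) for the lowest observed score whose
    false-positive rate on the negatives does not exceed target_fpr.

    Returns None when no observed score reaches the target, which can
    happen if the negative set is too small to resolve that rate.
    """
    n = len(negative_scores)
    # FPR can only fall as the threshold rises, so the first hit is the lowest.
    for t in sorted(set(negative_scores)):
        fpr = sum(s >= t for s in negative_scores) / n
        if fpr <= target_fpr:
            return t, fpr
    return None


# Hypothetical scores, invented for this example (not data from the study).
negatives = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
positives = [0.3, 0.55, 0.7, 0.96, 0.97]

result = lowest_threshold_for_fpr(negatives, target_fpr=0.1)
if result is not None:
    threshold, fpr = result
    print(f"threshold={threshold}, FPR={fpr}, TPR={tpr_at(positives, threshold)}")
```

With only ten negatives in this toy set, a target rate of 0.05 cannot be resolved and the function returns `None`, so a real evaluation would need a much larger non-secretory set.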



Figure 1: Estimated true positive rate (TPR) and false positive rate (FPR) for SecretomeP as published for mammalian proteins (v1) based on internal cross-validation and bacterial proteins (v2) based on performance on classically secreted proteins (CSPs) from the SignalP 3.0 dataset. The estimated TPR of SecretomeP on plants as stated by Cheng et al. (2009) was 0.6, with no FPR given. The random line diagonal represents equal TPR and FPR, equivalent to random selection of classes.

Mentions: Using the mammalian version of SecretomeP as a tool for identifying LSPs in plants assumes that the common features it relies on are shared between mammalian and plant secreted proteins. The software programs used to capture protein features in the mammalian version of SecretomeP are listed in Table 2. It also implicitly assumes that the reported threshold and accuracy metrics of the mammalian version of SecretomeP will apply to plants. The threshold value used to generate positive predictions is of particular importance: SecretomeP outputs scores in a range from 0 to 1, with higher scores indicating greater confidence that a protein is secreted. Understanding the trade-off between true and false positives at any given threshold is essential when applying the tool to experimental output. The authors of SecretomeP recommended a threshold of 0.6 for mammalian proteins, giving a true-positive rate (TPR) of 0.40 and a false-positive rate (FPR) of 0.05 (Figure 1). These rates were derived from cross-validated sub-sets of the modified CSP training data. When the method was applied to the 13 human LSPs at the time, 10 of these were predicted at this threshold (Bendtsen et al., 2004).
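To make the threshold trade-off concrete, the sketch below computes TPR and FPR at a given cut-off from two lists of scores. It is an illustration under stated assumptions rather than SecretomeP's own procedure: the function name `rates_at_threshold`, the rule that a score at or above the threshold is a positive prediction, and the example scores are all hypothetical.

```python
def rates_at_threshold(positive_scores, negative_scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are called 'secreted'.

    positive_scores: scores for proteins known to be secreted
    negative_scores: scores for proteins known to be non-secretory
    """
    true_positives = sum(s >= threshold for s in positive_scores)
    false_positives = sum(s >= threshold for s in negative_scores)
    tpr = true_positives / len(positive_scores)
    fpr = false_positives / len(negative_scores)
    return tpr, fpr


# Hypothetical scores in the 0 to 1 range, invented for this example.
secreted = [0.9, 0.7, 0.65, 0.4, 0.3]
non_secretory = [0.8, 0.55, 0.3, 0.2, 0.1]

for cutoff in (0.5, 0.6):
    tpr, fpr = rates_at_threshold(secreted, non_secretory, cutoff)
    print(f"threshold {cutoff}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

In this toy data, lowering the cut-off from 0.6 to 0.5 admits one more non-secretory protein without recovering any additional secreted ones, which is the kind of trade-off a threshold choice has to weigh.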

