Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants


ABSTRACT

In proteomic analyses of the plant secretome, the presence of putative leaderless secretory proteins (LSPs) is difficult to confirm due to the possibility of contamination from other sub-cellular compartments. In the absence of a plant-specific tool for predicting LSPs, the mammalian-trained SecretomeP has been applied to plant proteins in multiple studies to identify the most likely LSPs. This study investigates the effectiveness of using SecretomeP on plant proteins, identifies its limitations and provides a benchmark for its use. In the absence of experimentally verified LSPs, we exploit the common-feature hypothesis behind SecretomeP and use known classically secreted proteins (CSPs) of plants as a proxy to evaluate its accuracy. We show that, contrary to the common-feature hypothesis, plant CSPs are a poor proxy for evaluating LSP detection because SecretomeP prediction scores vary when the signal peptide (SP) is modified. Removing the SP region from CSPs and comparing the predictive performance against non-secretory proteins indicates that the commonly used threshold scores of 0.5 and 0.6 result in false-positive rates in excess of 0.3 when applied to plant proteins. Setting the false-positive rate to 0.05, consistent with the original mammalian performance of SecretomeP, yields a true-positive rate only marginally higher than the false-positive rate. Therefore, the use of SecretomeP on plant proteins is not recommended. This study investigates the trade-offs of using SecretomeP on plant proteins and provides insights into predictive features for future development of plant-specific common-feature tools.
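
The threshold trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a protein is called secreted when its score is at or above the threshold (the exact comparison used by SecretomeP is not stated here), and the function names `rates_at_threshold` and `threshold_for_fpr` are hypothetical. The scores in the example are invented toy values, not data from the study.

```python
def rates_at_threshold(pos_scores, neg_scores, threshold):
    """Return (true-positive rate, false-positive rate) at a score threshold.

    Assumption: a score >= threshold counts as a "secreted" call.
    """
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    return tp / len(pos_scores), fp / len(neg_scores)


def threshold_for_fpr(neg_scores, target_fpr):
    """Return the lowest candidate threshold whose FPR is <= target_fpr.

    Candidates are the observed negative scores plus infinity, so a
    threshold is always found (infinity gives an FPR of 0).
    """
    candidates = sorted(set(neg_scores)) + [float("inf")]
    for t in candidates:
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr <= target_fpr:
            return t


# Invented toy scores for illustration only.
toy_pos = [0.9, 0.7, 0.55, 0.4]       # e.g. CSPs with the SP removed
toy_neg = [0.8, 0.6, 0.3, 0.2, 0.1]   # e.g. non-secretory proteins

tpr, fpr = rates_at_threshold(toy_pos, toy_neg, 0.5)
strict = threshold_for_fpr(toy_neg, 0.2)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} threshold for FPR<=0.2: {strict}")
```

Fixing the false-positive rate first and then reading off the true-positive rate mirrors the comparison made in the abstract at a false-positive rate of 0.05.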




Figure 2: After the SP region (orange) was identified using SignalP 4.1 for positive data, or set to a fixed length of 30 amino acids for negative data, five modifications were made to each original sequence in a dataset. Reverse inverted the amino acid order of the entire sequence; SP C-term placed the SP at the C-terminus, leaving the sequence length unchanged; SP Remove excluded the SP region and shortened the sequence length. SP Random and Random involved random shuffling of the SP region and the entire coding sequence, respectively. The sequences were then submitted to SecretomeP. Orange, SP region; Purple, mature protein sequence.

Under the common-feature hypothesis, modifications to only the SP of CSPs should not influence the SecretomeP prediction score. Five modifications were made under this assumption (Figure 2). (1) The SP was removed (SP Remove). Because this results in a shorter sequence, sequences were also modified to alter the SP without changing either the length of the original sequence or its amino acid composition: (2) the SP was moved from the N-terminus to the C-terminus (SP C-term), and (3) the amino acids of the SP were randomly shuffled at the N-terminus (SP Random). Further modifications that (4) reversed (Reverse) or (5) shuffled (Random) the entire sequence were made to compare the effect on the prediction score when the presumed common features of plant CSPs and LSPs were purposely disrupted. Both of these latter modifications were expected to negatively impact SecretomeP prediction scores.
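
The five modifications can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `modify_sequence` is hypothetical, the SP length is assumed to be supplied by the caller (from SignalP 4.1 for positive data, or fixed at 30 for negative data, as in Figure 2), and the example sequence is an invented toy string.

```python
import random


def _shuffled(text, rng):
    """Return the characters of `text` in random order."""
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)


def modify_sequence(seq, sp_len, rng):
    """Build the five modified sequences for one protein.

    seq    -- amino acid sequence, N-terminus first
    sp_len -- length of the SP region (assumed to be given)
    rng    -- random.Random instance, so shuffles are reproducible
    """
    sp, mature = seq[:sp_len], seq[sp_len:]
    return {
        "SP Remove": mature,                      # shorter sequence
        "SP C-term": mature + sp,                 # SP moved to the C-terminus
        "SP Random": _shuffled(sp, rng) + mature, # only the SP is shuffled
        "Reverse": seq[::-1],                     # whole sequence reversed
        "Random": _shuffled(seq, rng),            # whole sequence shuffled
    }


# Invented toy sequence; a real SP would be located with SignalP 4.1.
toy_seq = "MKTLLLAVAGSSAQDEFGHIKNPRW"
variants = modify_sequence(toy_seq, sp_len=8, rng=random.Random(0))
for name, variant in variants.items():
    print(f"{name:10s} {variant}")
```

Only SP Remove changes the sequence length; the other four variants keep both the length and the amino acid composition of the original, which is the property the text relies on when comparing scores.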

