A word of caution about biological inference - Revisiting cysteine covalent state predictions.

Tüdős E, Mészáros B, Fiser A, Simon I - FEBS Open Bio (2014)

Bottom Line: However, the accuracy of predictions on randomized sequences or of non-cysteine residues remained high, suggesting that these predictions rather capture global features of proteins such as subcellular localization, which depends on composition. This illustrates that even high prediction accuracy is insufficient to validate implicit assumptions about a biological phenomenon. Correctly identifying the relevant underlying biochemical reasons for the success of a method is essential to gain proper biological insights and develop more accurate and novel bioinformatics tools.


Affiliation: Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 286, H-1519 Budapest, Hungary.

ABSTRACT
The success of methods for predicting the redox state of cysteine residues from the sequence environment seemed to validate the basic assumption that this state is mainly determined locally. However, the accuracy of predictions on randomized sequences or of non-cysteine residues remained high, suggesting that these predictions rather capture global features of proteins such as subcellular localization, which depends on composition. This illustrates that even high prediction accuracy is insufficient to validate implicit assumptions about a biological phenomenon. Correctly identifying the relevant underlying biochemical reasons for the success of a method is essential to gain proper biological insights and develop more accurate and novel bioinformatics tools.
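The randomized-sequence control mentioned in the abstract can be illustrated with a minimal sketch. The source does not say how the sequences were randomized; the version below assumes a composition-preserving shuffle that keeps every Cys in place and permutes all other residues within the same protein. The function name `shuffle_keep_cys` is hypothetical.

```python
import random


def shuffle_keep_cys(seq, rng):
    """Return a copy of seq with all non-Cys residues shuffled.

    Assumption (not from the source): Cys positions stay fixed and the
    amino acid composition of the protein is preserved exactly.
    """
    others = [aa for aa in seq if aa != "C"]
    rng.shuffle(others)
    shuffled = iter(others)
    # Rebuild the sequence: keep each Cys, draw every other residue
    # from the shuffled pool in order.
    return "".join("C" if aa == "C" else next(shuffled) for aa in seq)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the control is reproducible
    print(shuffle_keep_cys("MKCAVLCTGH", rng))
```

If a predictor stays accurate on sequences shuffled this way, its signal cannot come from the specific local neighbors of each Cys, since those have been scrambled while the overall composition is unchanged.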



Frequency of metal binding residues in the vicinity of Cys residues. X-axis shows the sequential distance from Cys, while the Y-axis is the percentage of metal binding residues at that position. The frequency values are normalized for the first 20 sequential positions, i.e. the sum of all columns adds up to 100.
© Copyright Policy - CC BY-NC-ND
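The normalization described in the figure caption (columns summing to 100 over the first 20 sequential positions) amounts to the following arithmetic. This is only a sketch: the function name and the example counts are hypothetical and are not data from the article.

```python
def normalize_to_percent(counts):
    """Scale per-position counts so that the values sum to 100."""
    total = sum(counts)
    if total == 0:
        # No observations: return zeros rather than dividing by zero.
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical counts of metal binding residues at distances 1..4 from Cys.
    print(normalize_to_percent([3, 1, 4, 2]))
```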

Mentions: Once we re-established the method, we revisited the earlier hypothesis that some positions are more important than others in defining the covalent state of Cys residues [15]. We compared the position-specific values of the frequency ratios used to calculate the disulfide-bond forming potential. Comparing the pairwise correlation coefficients of the position-specific potential vectors (the columns of the frequency tables in Supplementary Table 1) suggests that the third sequential neighbors of Cys residues may have a unique preference compared with all other positions (Supplementary Table 3). This effect can be attributed to the role of the ±3rd positions in coordinating metal-ion binding (Fig. 1). To test this hypothesis, we repeated the disulfide-bond forming prediction using only the frequencies of occurrence at the ±3rd neighboring amino acid positions and achieved 59% accuracy (we obtained a similar 59% accuracy when testing for bound state prediction). Sequentially closer residues usually have a more non-random distribution [14], probably due to structural constraints; it is therefore expected that a prediction method based only on the nearest residues will achieve higher accuracy than one based only on more distant ones. To explore this, all possible sequential ranges from 1 to 10 neighboring positions were considered in subsequent calculations, and the predictive power of each was tested (Supplementary Table 4). Although accuracy increases as the considered sequence window widens, a “diminishing returns” effect can be seen: accuracy appears to reach a plateau, suggesting that more distant positions contribute progressively less information.
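A window-based frequency-ratio predictor of the kind described above might be sketched as follows. The passage does not give the exact formula, so this sketch makes several assumptions: the potential is a log ratio of position-specific residue frequencies around bonded versus free Cys, a pseudocount of 1 avoids division by zero, and a Cys is called bonded when its summed score is positive. All function names are hypothetical.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cys_contexts(seq, w):
    """Yield (index, {offset: residue}) for each Cys, offsets -w..w except 0."""
    for i, aa in enumerate(seq):
        if aa != "C":
            continue
        ctx = {}
        for off in range(-w, w + 1):
            j = i + off
            if off != 0 and 0 <= j < len(seq):
                ctx[off] = seq[j]
        yield i, ctx


def train_potential(examples, w, pseudo=1.0):
    """Build position-specific log frequency ratios (bonded vs. free).

    examples: iterable of (sequence, {cys_index: True if bonded else False}).
    The log-odds form and the pseudocount are assumptions of this sketch.
    """
    counts = {True: defaultdict(float), False: defaultdict(float)}
    totals = {True: defaultdict(float), False: defaultdict(float)}
    for seq, labels in examples:
        for i, ctx in cys_contexts(seq, w):
            if i not in labels:
                continue  # skip Cys residues without a known state
            state = labels[i]
            for off, aa in ctx.items():
                counts[state][(off, aa)] += 1
                totals[state][off] += 1
    potential = {}
    for off in range(-w, w + 1):
        if off == 0:
            continue
        for aa in AMINO_ACIDS:
            p_bonded = (counts[True][(off, aa)] + pseudo) / (totals[True][off] + 20 * pseudo)
            p_free = (counts[False][(off, aa)] + pseudo) / (totals[False][off] + 20 * pseudo)
            potential[(off, aa)] = math.log(p_bonded / p_free)
    return potential


def score_cys(seq, i, potential, w):
    """Sum the potential over the +/-w neighbors of the Cys at index i."""
    total = 0.0
    for off in range(-w, w + 1):
        j = i + off
        if off != 0 and 0 <= j < len(seq):
            # Residues outside the 20 standard amino acids contribute nothing.
            total += potential.get((off, seq[j]), 0.0)
    return total


def predict_bonded(seq, i, potential, w):
    """Call the Cys bonded when its summed log ratio is positive."""
    return score_cys(seq, i, potential, w) > 0.0
```

Repeating `train_potential` and `predict_bonded` for `w` from 1 to 10 on held-out proteins would reproduce the shape of the window sweep described in the text; restricting the offsets to ±3 only would correspond to the single-position test.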

