Identification of amino acid propensities that are strong determinants of linear B-cell epitope using neural networks.

Su CH, Pal NR, Lin KL, Chung IF - PLoS ONE (2012)

Bottom Line: Our results show that the selected propensities are indeed good features, which also cooperate with other propensities to enhance the discriminating power for predicting epitopes. Our results confirm the effectiveness of active (group) feature selection by GFSMLP over the traditional passive approaches of evaluating various combinations of propensities. The GFSMLP-based feature selection can be extended to more than 500 remaining propensities to enhance our biological knowledge about epitopes and to obtain better prediction.


Affiliation: Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan, Republic of China.

ABSTRACT

Background: Identifying amino acid propensities that are strong determinants of linear B-cell epitopes is important for enriching our knowledge about epitopes, and it can also help to obtain better epitope prediction. Typical linear B-cell epitope prediction methods combine various propensities in different ways to improve prediction accuracy. However, fewer but better features may yield better prediction. Moreover, for a sequence of length k each propensity contributes k values, which should be treated as a single unit during feature selection; hence the usual feature selection methods will not work. Here we use a novel Group Feature Selecting Multilayered Perceptron, GFSMLP, which treats a group of related information as a single entity, selects useful propensities related to linear B-cell epitopes, and uses them to predict epitopes.

Methodology/principal findings: We use eight widely known propensities and four data sets. We use GFSMLP to rank propensities by the frequency with which they are selected. We find that Chou's beta-turn and Ponnuswamy's polarity are better features for the prediction of linear B-cell epitopes. We examine the individual and combined discriminating power of the selected propensities and analyze the correlation between paired propensities. Our results show that the selected propensities are indeed good features, which also cooperate with other propensities to enhance the discriminating power for predicting epitopes. We find that polarity is not the best predictor individually, but it collaborates with others to yield good prediction. The usual feature selection methods cannot provide such information.

Conclusions/significance: Our results confirm the effectiveness of active (group) feature selection by GFSMLP over the traditional passive approaches of evaluating various combinations of propensities. The GFSMLP-based feature selection can be extended to more than 500 remaining propensities to enhance our biological knowledge about epitopes and to obtain better prediction. A graphical-user-interface version of GFSMLP is available at: http://bio.classcloud.org/GFSMLP/.


pone-0030617-g001: Encoding scheme for the calculation of correlations of pair amino acid propensities.

Mentions: For both B-cell epitopes and non B-cell epitopes in each of the four data sets, we check the correlations of paired propensities, as shown in the eight sub-tables of Table S1. To compute the correlation on, say, epitope data, we first concatenate all epitope fragments. Then, to compute the correlation between the propensity pair (#i, #j), we create two sequences of values, one by replacing each residue with the corresponding value of propensity #i and the other by replacing each residue with the corresponding value of propensity #j. We then compute the Pearson correlation coefficient between the two sequences. The encoding of peptides for the computation of correlation is explained in Fig. 1. Table S1 also provides several interesting observations: (a) Irrespective of the data sets and their types (B-cell or non B-cell epitopes), we consistently obtain similar correlations between pairs of propensities. Since for each of the four data sets there is not much difference between the correlation matrices for B-cell epitopes and non B-cell epitopes, this might be taken as an explanation of why these eight propensities do not contribute sufficient discriminating power for epitope prediction (as shown in Table 5). (b) Two strongly correlated propensities together cannot add discriminating power, but an uncorrelated pair may. Table S1 shows that the correlation is very low in only a few cases. On the other hand, the very high correlation between propensity #2 and propensity #4 suggests that these two attributes behave similarly, so together they may not add much, and we have already seen that this is true.
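The correlation procedure described above (concatenate the fragments, encode each residue with two propensity scales, then take the Pearson coefficient) can be sketched as follows. This is a minimal illustration, not the authors' code: the two scales are hypothetical toy values over a four-letter alphabet, not published propensity values, and the function names `encode`, `pearson`, and `propensity_correlation` are assumptions introduced here.

```python
import math

# Hypothetical toy propensity scales (NOT real published values),
# restricted to four residues purely for illustration.
PROPENSITY_I = {"A": 1.0, "G": 2.0, "K": 3.0, "S": 4.0}
PROPENSITY_J = {"A": 2.0, "G": 4.0, "K": 6.0, "S": 8.0}


def encode(fragments, scale):
    """Concatenate all fragments and replace each residue by its scale value."""
    return [scale[residue] for residue in "".join(fragments)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def propensity_correlation(fragments, scale_i, scale_j):
    """Correlation of a propensity pair over one set of peptide fragments."""
    return pearson(encode(fragments, scale_i), encode(fragments, scale_j))


if __name__ == "__main__":
    epitope_fragments = ["AGK", "SA", "GKS"]  # made-up fragments
    r = propensity_correlation(epitope_fragments, PROPENSITY_I, PROPENSITY_J)
    print(f"Pearson r between the two toy scales: {r:.3f}")
```

Because the second toy scale is an exact multiple of the first, the coefficient here comes out at essentially 1, which mirrors observation (b): such a pair carries the same information and would add little when used together. Running the same function separately on epitope and non-epitope fragments would give the two correlation values compared in Table S1.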

