EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

Lian Y, Ge M, Pan XM - BMC Bioinformatics (2014)

Bottom Line: To alleviate the problem caused by noise in the negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved an overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for identifying linear B-cell epitopes using the antigen's primary sequence information.

Affiliation: The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, Beijing, 100084, China. lianyao1112@gmail.com.

ABSTRACT

Background: B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task.

Results: In this work, we developed a novel linear B-cell epitope prediction model that uses multiple linear regression (MLR) on the antigen's primary sequence information. A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the model's performance. To alleviate the problem caused by noise in the negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved an overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728.
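The abstract names the model (multiple linear regression) and the evaluation protocol (10-fold cross-validation) but not how sequences are encoded. The sketch below illustrates that combination only under stated assumptions: each peptide is already turned into a numeric feature vector (the encoding is hypothetical here), labels are 1 for epitopes and 0 for non-epitopes, the regression is fitted by ordinary least squares, and a score of 0.5 or above is read as "epitope". The names `fit_mlr`, `predict_mlr` and `cross_val_scores` are illustrative and are not part of EPMLR.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; return [b0, b1, ..., bp]."""
    # Prepend a column of ones so the intercept b0 is estimated with the slopes.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Linear score for each row of X (higher = more epitope-like, by assumption)."""
    return coef[0] + X @ coef[1:]

def cross_val_scores(X, y, k=10, seed=0):
    """Out-of-fold MLR scores: each sample is scored by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = np.full(len(y), np.nan)
    for test_idx in folds:
        # Train on the other k-1 folds, then score the held-out fold.
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef = fit_mlr(X[train_mask], y[train_mask])
        scores[test_idx] = predict_mlr(coef, X[test_idx])
    return scores

if __name__ == "__main__":
    # Synthetic illustration only: 200 random "peptides" with 5 made-up features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
    scores = cross_val_scores(X, y, k=10)
    print("out-of-fold accuracy:", np.mean((scores >= 0.5) == (y == 1)))
```

Thresholding the out-of-fold scores at 0.5 yields hard labels from which sensitivity and precision can be counted; sweeping the threshold instead traces an ROC curve.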

Conclusions: We have presented a reliable method for identifying linear B-cell epitopes using the antigen's primary sequence information. Moreover, a web server, EPMLR, has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

Figure 2: ROC curves of the best and worst performances among the 300 modeling trials using 10-fold cross-validation. Red: best performance; green: worst performance.
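ROC curves such as those in Figure 2, and the AUC values quoted with them, can be computed from per-peptide prediction scores. A minimal sketch follows, assuming higher scores mean "more epitope-like": every distinct score is used as a threshold, tied scores form a single step, and the area is integrated with the trapezoidal rule. The function names `roc_curve` and `auc` are hypothetical here and do not describe the authors' code.

```python
import numpy as np

def roc_curve(y_true, scores):
    """Return (fpr, tpr) for every distinct score threshold, strictest first."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Sort by descending score (stable, so ties keep input order).
    order = np.argsort(-scores, kind="mergesort")
    y_sorted, s_sorted = y_true[order], scores[order]
    tps = np.cumsum(y_sorted)    # true positives if the cut is after each item
    fps = np.cumsum(~y_sorted)   # false positives at the same cut
    # Keep only the last item of each run of tied scores, so ties form one step.
    last = np.r_[np.diff(s_sorted) != 0, True]
    tpr = np.r_[0.0, tps[last] / y_true.sum()]
    fpr = np.r_[0.0, fps[last] / (~y_true).sum()]
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

For example, with labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.7, 0.1], three of the four positive/negative pairs are ranked correctly, and `auc(*roc_curve(...))` gives 0.75, matching that pairwise count.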

Mentions: We performed 300 experiments, each a 10-fold cross-validation, on 300 sub-datasets that share the same positive dataset but differ in their negative datasets. In each trial, the positive dataset consists of exactly BEOD's 4405 epitopes, while the negative dataset consists of 4405 non-epitopes randomly selected from BEOD's 8467 non-epitopes. The ROC curves for the best and worst performances among the 300 trials are shown in Figure 2, and the performances of all 300 trials are summarized in Table 1. As Table 1 and Figure 2 show, the variance across the 300 results is large, with Sn ranging from 83.5% to 81.7%, P from 77.6% to 55.7%, F-measure from 0.805 to 0.663, and AUC from 0.893 to 0.673. These large discrepancies support our hypothesis that the non-epitope data are noisy even though they are experimentally verified, and they justify our approach of randomly constructing many negative sub-datasets and reporting the average result rather than the best one. In conclusion, our sequence-based linear B-cell epitope prediction method achieved an average Sn of 81.8 ± 0.8% (95% CI), P of 64.1 ± 0.2% (95% CI), F-measure of 0.719 ± 0.08 (95% CI), and AUC of 0.728 using 10-fold cross-validation.
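The resampling protocol described above can be sketched as follows. Only the counts (4405 positives, 8467 candidate negatives, 300 trials) come from the text; the rest is assumed for illustration: negatives are drawn uniformly without replacement, Sn is TP/(TP+FN), P is TP/(TP+FP), the F-measure is their harmonic mean, and the ± value is taken as the half-width of a normal-approximation 95% confidence interval of the mean (1.96 times the standard error), because the text does not state how its intervals were computed. The helper names are hypothetical.

```python
import numpy as np

# Counts quoted in the text; everything else in this sketch is an assumption.
N_POS, N_NEG_POOL, N_TRIALS = 4405, 8467, 300

def sample_negative_subsets(n_pool, n_pick, n_trials, seed=0):
    """Draw n_trials random index subsets of size n_pick (no replacement) from n_pool negatives."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_pool, size=n_pick, replace=False) for _ in range(n_trials)]

def sn_p_f(y_true, y_pred):
    """Sensitivity (recall), precision and F-measure for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    sn = tp / (tp + fn) if tp + fn else 0.0
    p = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * sn * p / (sn + p) if sn + p else 0.0
    return float(sn), float(p), float(f)

def mean_with_ci95(values):
    """Mean and half-width of a normal-approximation 95% CI (1.96 * standard error)."""
    v = np.asarray(values, dtype=float)
    half = 1.96 * v.std(ddof=1) / np.sqrt(len(v))
    return float(v.mean()), float(half)

if __name__ == "__main__":
    # Each subset indexes the non-epitope pool for one balanced trial.
    subsets = sample_negative_subsets(N_NEG_POOL, N_POS, N_TRIALS)
    print(len(subsets), "negative sub-datasets of", len(subsets[0]), "non-epitopes each")
```

In a full run, each subset would be paired with the fixed positive set, evaluated by 10-fold cross-validation, and the 300 per-trial Sn, P, F-measure and AUC values passed to `mean_with_ci95`.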


EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

Lian Y, Ge M, Pan XM - BMC Bioinformatics (2014)

ROC curves of the best and worst performance among 300 modeling trials using 10-fold cross-validation. Red: the best performance; Green: the worst performance.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4307399&req=5

Fig2: ROC curves of the best and worst performance among 300 modeling trials using 10-fold cross-validation. Red: the best performance; Green: the worst performance.
Mentions: We performed 300 experiments on 10-fold cross-validation utilizing 300 sub-datasets that are the same in the positive datasets but different in the negative datasets. For each trial, the positive dataset of 4405 epitopes are exactly same with BEOD’s 4405 epitopes while the negative dataset of 4405 non-epitopes are randomly selected from BEOD’s 8467 non-epitopes. The ROC plots for the best and worst performances among the 300 trials are shown in Figure 2. The performances of all 300 trials are summarized in Table 1. As shown in Table 1 and Figure 2, the variance of the 300 results is large, with Sn ranging from 83.5% to 81.7%, P from 77.6% to 55.7%, F-measure from 0.805 to 0.663, and AUC from 0.893 to 0.673. These large discrepancies corroborate our speculation of the noise of non-epitopes even if they are experimentally verified and support our means of randomly constructing many negative sub-datasets and reporting the average result instead of the best result. In conclusion, our sequence-based linear B-cell epitope prediction method achieved an average Sn of 81.8 ± 0.8% (95% CI), P of 64.1 ± 0.2% (95% CI), F-measure of 0.719 ± 0.08 (95% CI), and AUC of 0.728 using 10-fold cross-validation.Figure 2

Bottom Line: To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed.We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728.We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information.

View Article: PubMed Central - PubMed

Affiliation: The Key Laboratory of Bioinformatics, Ministry of Education, School of Life Sciences, Tsinghua University, Beijing, 100084, China. lianyao1112@gmail.com.

ABSTRACT

Background: B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task.

Results: In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728.

Conclusions: We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

Show MeSH
Related in: MedlinePlus