Prediction of high-risk types of human papillomaviruses using statistical model of protein "sequence space".

Wang C, Hai Y, Liu X, Liu N, Yao Y, He P, Dai Q - Comput Math Methods Med (2015)

Bottom Line: Recently, several computational methods have been proposed based on protein sequence-based and structure-based information, but the information of their related proteins has not been used until now. The proposed method was tested on 68 samples with known HPV types and 4 samples without HPV types and further compared with the available approaches. The results show that the proposed method achieved the best performance among all the evaluated methods with accuracy 95.59% and F1-score 90.91%, which indicates that protein "sequence space" could potentially be used to improve prediction of high-risk types of HPVs.


Affiliation: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China.

ABSTRACT
Discrimination of high-risk types of human papillomaviruses (HPVs) plays an important role in the diagnosis and treatment of cervical cancer. Recently, several computational methods have been proposed based on protein sequence and structure information, but information from related proteins has not been used until now. In this paper, we propose using protein "sequence space" to explore this information and use it to predict high-risk types of HPVs. The proposed method was tested on 68 samples with known HPV types and 4 samples without known HPV types, and was further compared with the available approaches. The results show that the proposed method achieved the best performance among all the evaluated methods, with an accuracy of 95.59% and an F1-score of 90.91%, which indicates that protein "sequence space" could potentially be used to improve prediction of high-risk types of HPVs.
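The two reported figures can be related back to raw counts. The following is a minimal sketch, assuming that high-risk is the positive class and that both metrics were computed over the 68 samples with known types. Under those assumptions, 95.59% matches 65/68 and 90.91% matches 30/33, which is consistent with 15 true positives and 3 errors in total. The split of those 3 errors into false positives and false negatives is not given here, so the values below are hypothetical, as is the function name.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


# Hypothetical counts: only TP = 15 and FP + FN = 3 are implied by the
# reported percentages; the 1/2 split of the errors is an assumption.
acc, f1 = accuracy_and_f1(tp=15, fp=1, fn=2, tn=50)
print(f"accuracy = {acc:.2%}, F1 = {f1:.2%}")
```

Any other split of the 3 errors (for example 3 false positives and 0 false negatives) yields the same two percentages, so these metrics alone do not determine precision and recall.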


Figure 1: Comparison of per-class prediction accuracy, overall accuracy, and F1-score for all the early and late proteins. The mutation matrices on the x-axis are BLOSUM 40, BLOSUM 45, BLOSUM 62, BLOSUM 80, BLOSUM 100, PAM 40, PAM 80, PAM 120, PAM 200, and PAM 250.


Mentions: The HPV genome encodes a number of early (E1, E2, E4, E5, E6, and E7) and late (L1 and L2) proteins [3, 5]. Several methods have classified high-risk and low-risk HPVs using information from protein sequences, secondary structure, and pseudo amino acid composition [23–28], but most of them used only the E6, E7, or L1 proteins. In this study, we constructed seven protein datasets (E1, E2, E4, E6, E7, L1, and L2) and compared their performance in HPV type prediction. E5 proteins were not included because their sequences are too short. The per-class accuracy, overall accuracy, and F1-score for all the early and late proteins are summarized in Figure 1.
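The comparison across protein datasets can be sketched as follows. This is an illustrative outline only, not the authors' pipeline: the `evaluate` helper, the label strings, and the toy predictions are all hypothetical, and "accuracy of each class" is assumed to mean the fraction of samples in that class that were predicted correctly.

```python
def evaluate(y_true, y_pred, positive="high"):
    """Return (per-class accuracy, overall accuracy, F1 for `positive`)."""
    per_class = {}
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)

    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return per_class, overall, f1


# Toy data, invented for illustration; not taken from the paper.
y_true = ["high", "high", "high", "low", "low", "low", "low", "low"]
predictions = {
    "E6": ["high", "high", "high", "low", "low", "low", "low", "high"],
    "L1": ["high", "high", "low", "low", "low", "low", "low", "low"],
}

results = {name: evaluate(y_true, pred) for name, pred in predictions.items()}
for name, (per_class, overall, f1) in results.items():
    print(name, per_class, f"overall={overall:.3f}", f"F1={f1:.3f}")
```

In this toy case both datasets reach the same overall accuracy but differ in per-class accuracy and F1, which is one reason to report all three quantities side by side when the classes are imbalanced.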

