Incorporating amino acids composition and functional domains for identifying bacterial toxin proteins.

Su MG, Huang CH, Lee TY, Chen YJ, Wu HY - Biomed Res Int (2014)

Bottom Line: The cross-validation evaluation shows that the SVM models trained with amino acid and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid and dipeptide composition achieve accuracies of 95.71% and 92.86%, respectively. After incorporating functional domain information, the predictive performance is further improved.


Affiliation: Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan.

ABSTRACT
Aside from their role in pathogenesis, bacterial toxins have also been used for medical purposes, such as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has a great impact on cell biology studies and therapy development. However, experimental methods for identifying bacterial toxins are time-consuming and labor-intensive, which creates a need for computational prediction. We were therefore motivated to develop a method for the computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins, comprising 77 exotoxins and 90 endotoxins, is used to train predictive models with support vector machines (SVMs). The cross-validation evaluation shows that the SVM models trained with amino acid and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid and dipeptide composition achieve accuracies of 95.71% and 92.86%, respectively. After incorporating functional domain information, the predictive performance is further improved. On an independent dataset, the proposed method identifies and classifies bacterial toxins more effectively than the other two features, which may aid in bacterial biomedical development.


fig3: Percent composition of 20 amino acids between endotoxin and exotoxin.

Mentions: The difference between bacterial toxin and nontoxin proteins was analyzed in terms of amino acid composition, and the result is shown in Figure 2. To determine the differentially represented amino acids, the occurrence values of all amino acids except N, which has the highest value (0.03772), were averaged. Adding the standard deviation (0.004225) to this average (0.00767) gave a threshold of 0.011895. Bacterial toxins are clearly distinguishable from nontoxin proteins at the amino acid composition level. For instance, alanine (A, 0.01317), asparagine (N, 0.03772), leucine (L, 0.01475), and tyrosine (Y, 0.01416) all exceed the threshold, and asparagine (N) is the most distinguishable of all residues. To examine the effectiveness of amino acid composition in identifying bacterial toxins, an SVM model was trained using a 20-dimensional vector consisting of the composition scores of the twenty amino acids. The amino acid composition-based model was evaluated by five-fold cross-validation. As shown in Table 1, the model achieved a sensitivity of 92.81%, a specificity of 99.56%, and an accuracy of 97.75%.

The amino acid composition of endotoxins and exotoxins was also compared, and the result is shown in Figure 3. The occurrence values of all amino acids except the most distinguishable residue, K, were used to obtain an average (0.006347). After adding the standard deviation (0.004226) to this average, a value larger than 0.010573 was considered differential. Arginine (R, 0.017043), lysine (K, 0.03654), and threonine (T, 0.014236) were found to have differential frequencies between endotoxin and exotoxin proteins. Similarly, an SVM model was trained using a 20-dimensional vector consisting of the composition scores of the twenty amino acids and evaluated by five-fold cross-validation. As shown in Table 2, the model achieved a sensitivity of 93%, a specificity of 93.93%, and an accuracy of 94.02%.
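The thresholding procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the reported per-residue values are absolute differences in mean composition between two groups of sequences, and it uses the population standard deviation, which the text does not specify. The function names (`composition`, `mean_composition`, `differential_residues`) are hypothetical.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in sequence.upper():
        if ch in counts:  # ignore non-standard symbols such as X or B
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def mean_composition(sequences):
    """Average the composition vectors of a group of sequences, residue by residue."""
    vectors = [composition(s) for s in sequences]
    return [mean(column) for column in zip(*vectors)]


def differential_residues(group_a, group_b):
    """Return (threshold, residues whose between-group difference exceeds it).

    Assumption: the threshold is mean + standard deviation of the per-residue
    absolute differences, computed after leaving out the single most
    distinguishable residue, as the text describes for N and K.
    """
    comp_a = mean_composition(group_a)
    comp_b = mean_composition(group_b)
    diffs = {aa: abs(a - b) for aa, a, b in zip(AMINO_ACIDS, comp_a, comp_b)}
    top = max(diffs, key=diffs.get)  # most distinguishable residue
    rest = [value for aa, value in diffs.items() if aa != top]
    threshold = mean(rest) + pstdev(rest)  # population SD is an assumption
    selected = sorted(aa for aa, value in diffs.items() if value > threshold)
    return threshold, selected


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not data from the study.
    threshold, selected = differential_residues(["KKKKAAGG"], ["AAGGAAGG"])
    print(round(threshold, 4), selected)
```

The 20-dimensional vector returned by `composition` corresponds to the feature vector the text describes as the SVM input. The reported thresholds follow the same arithmetic: 0.00767 + 0.004225 = 0.011895 for toxins versus nontoxins, and 0.006347 + 0.004226 = 0.010573 for endotoxins versus exotoxins.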

