Incorporating amino acids composition and functional domains for identifying bacterial toxin proteins.

Su MG, Huang CH, Lee TY, Chen YJ, Wu HY - Biomed Res Int (2014)

Bottom Line: Cross-validation shows that the SVM models trained with amino acid composition and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid composition and dipeptide composition achieve accuracies of 95.71% and 92.86%, respectively. Incorporating functional domain information further improves predictive performance.

Affiliation: Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan.

ABSTRACT
Aside from their role in pathogenesis, bacterial toxins have also been used for medical purposes, such as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has a great impact on cell biology studies and therapy development. However, experimental methods for identifying bacterial toxins are time-consuming and labor-intensive, implying an urgent need for computational prediction. We were therefore motivated to develop a method for the computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins, including 77 exotoxins and 90 endotoxins, is adopted to learn the predictive model using support vector machines (SVMs). Cross-validation shows that the SVM models trained with amino acid composition and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid composition and dipeptide composition achieve accuracies of 95.71% and 92.86%, respectively. Incorporating functional domain information further improves predictive performance. On an independent dataset, the proposed method identifies and classifies bacterial toxins more effectively than the other two features (amino acid and dipeptide composition), which may aid bacterial biomedical development.
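The two sequence features named above can be computed directly from a protein sequence. The sketch below assumes the common definitions (the fraction of each of the 20 standard residues, and the fraction of each of the 400 ordered adjacent residue pairs), skips nonstandard residues, and stops before the SVM step. The function names `aac` and `dpc` are illustrative and not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids; any other letter (X, B, Z, U, ...) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing a nonstandard residue
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Each vector would be the input of a separate SVM model.
print(len(aac("MKNNILT")), len(dpc("MKNNILT")))  # → 20 400
```

For a sequence of length L, `dpc` has at most L - 1 nonzero entries out of 400; a 50-residue peptide fills at most 49 of them, so dipeptide vectors of small proteins are necessarily sparse.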

Figure 4: Probability difference of the 20 × 20 amino acid pairs between bacterial toxin proteins and nontoxin proteins. Red boxes mark amino acid pairs overrepresented in bacterial toxin proteins; green boxes mark pairs underrepresented in them.
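The caption does not spell out how the probability difference is computed. One plausible reading, assumed here rather than taken from the paper, is the mean dipeptide frequency in one group of sequences minus that in the other, arranged as a 20 × 20 grid. The sketch omits the statistical-significance filtering mentioned in the text, and `probability_difference` and its helpers are hypothetical names.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_freqs(seq: str) -> dict[str, float]:
    """Fraction of each standard dipeptide among the valid adjacent pairs in seq."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    n = len(pairs)
    return {d: (c / n if n else 0.0) for d, c in counts.items()}


def mean_freqs(seqs: list[str]) -> dict[str, float]:
    """Average dipeptide frequency over a set of sequences (assumes seqs is non-empty)."""
    totals = dict.fromkeys(DIPEPTIDES, 0.0)
    for s in seqs:
        for d, f in dipeptide_freqs(s).items():
            totals[d] += f
    return {d: t / len(seqs) for d, t in totals.items()}


def probability_difference(group_a: list[str], group_b: list[str]) -> list[list[float]]:
    """20 x 20 matrix: mean frequency in group_a minus mean frequency in group_b.

    Row = first residue of the pair, column = second residue. Positive entries
    are pairs overrepresented in group_a; negative entries are underrepresented.
    """
    a, b = mean_freqs(group_a), mean_freqs(group_b)
    return [[a[x + y] - b[x + y] for y in AMINO_ACIDS] for x in AMINO_ACIDS]
```

Passing toxins as `group_a` and nontoxins as `group_b` matches the orientation of Figure 4 (positive corresponds to red); passing endotoxins and exotoxins matches the orientation described for Figure 5.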

Mentions: The dipeptide composition-based model for identifying bacterial toxins achieves a sensitivity of 87.42%, a specificity of 96.71%, and an accuracy of 94.06% (Table 1). The amino acid composition-based method yields higher accuracy in identifying bacterial toxins. This may be due to the short sequence length of toxins, since it is difficult to obtain a significant number of dipeptides from small proteins [13]. The dipeptide composition of bacterial toxins and nontoxin proteins is further analyzed by selecting statistically significant dipeptides among the 400 amino acid pairs. Figure 4 shows the probability difference of the 400 amino acid pairs between bacterial toxins and nontoxin proteins. In the 20 × 20 matrix, amino acid pairs marked in red are overrepresented in bacterial toxins, while pairs marked in green are underrepresented. As illustrated in Figure 4, NN pairs are overrepresented in bacterial toxins, as are N residues paired with I, L, and T.

Similarly, the dipeptide composition-based method yields lower accuracy than the amino acid composition-based method in classifying exotoxins and endotoxins. The model achieved a sensitivity of 92.22%, a specificity of 85.71%, and an accuracy of 89.22% (Table 2). Figure 5 shows the probability difference of the 400 amino acid pairs between endotoxin and exotoxin proteins. Amino acid pairs marked in red are overrepresented in endotoxins, while pairs marked in green are overrepresented in exotoxins. LE and TD pairs are overrepresented in endotoxins, while SK, KK, and NK pairs are overrepresented in exotoxins.
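Sensitivity, specificity, and accuracy follow the usual confusion-matrix definitions, sketched below. The source does not list the underlying counts; the counts in the example are an inference, assuming endotoxins are the positive class and all 90 endotoxins and 77 exotoxins were scored. Under that assumption, 83 correctly classified endotoxins and 66 correctly classified exotoxins reproduce the Table 2 figures after rounding to two decimals. The `Confusion` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative
    fp: int  # negatives predicted as positive

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


def as_percent(x: float) -> str:
    return f"{100 * x:.2f}%"


# Inferred (not reported) counts: endotoxins as positives, exotoxins as negatives
cm = Confusion(tp=83, fn=7, tn=66, fp=11)
print(as_percent(cm.sensitivity()), as_percent(cm.specificity()), as_percent(cm.accuracy()))
# → 92.22% 85.71% 89.22%
```

The same class applies to Table 1 with toxins as positives and nontoxins as negatives; because accuracy pools both classes, it is weighted by the class sizes, whereas sensitivity and specificity are not.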

