Protein attributes contribute to halo-stability, bioinformatics approach.

Ebrahimie E, Ebrahimi M, Sarvestani NR, Ebrahimi M - Saline Syst. (2011)

Bottom Line: No changes were found in the numbers of groups when K-Means and TwoStep clustering modeling were performed on datasets with or without feature selection filtering. We did not find any significant difference in the percent of correctness, performance evaluation, and mean correctness of various decision tree models with or without feature selection. For the first time, we analyzed the performance of different screening, clustering, and decision tree algorithms for discriminating halophilic and non-halophilic proteins and the results showed that amino acid composition can be used to discriminate between halo-tolerant and halo-sensitive proteins.


Affiliation: Bioinformatics Research Group, Green Research Center, Qom University, Qom, Iran. mebrahimi14@yahoo.com.

ABSTRACT
Halophilic proteins can tolerate high salt concentrations. Understanding the features underlying halophilicity is the first step toward engineering halostable crops. To this end, we examined protein features contributing to the halo-tolerance of halophilic organisms. We compared more than 850 features of halophilic and non-halophilic proteins using various screening, clustering, decision tree, and generalized rule induction models to search for patterns associated with halo-tolerance. Up to 251 protein attributes were selected by various attribute weighting algorithms as important features contributing to halo-stability; of these, 14 attributes were selected by 90% of models, and the count of hydrogen received the highest weight (1.0) in 70% of attribute weighting models, showing the importance of this attribute in feature selection modeling. The other attributes were mostly dipeptide frequencies. No changes were found in the numbers of groups when K-Means and TwoStep clustering modeling were performed on datasets with or without feature selection filtering. Although the induced trees were shallow, their accuracies were higher than 94%, and the frequency of hydrophobic residues was identified as the most important feature for building the trees. The performance evaluation of the decision tree models gave the same values, and the best correctness percentage was recorded with the Exhaustive CHAID and CHAID models. We did not find any significant difference in the percent of correctness, performance evaluation, and mean correctness of various decision tree models with or without feature selection. For the first time, we analyzed the performance of different screening, clustering, and decision tree algorithms for discriminating halophilic and non-halophilic proteins, and the results showed that amino acid composition can be used to discriminate between halo-tolerant and halo-sensitive proteins.
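The dipeptide attributes mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a dipeptide count is the number of overlapping adjacent residue pairs in a one-letter sequence, and that a dipeptide frequency is that count divided by the total number of adjacent pairs. The abstract does not give the exact definitions used in the study, so both the definitions and the function names below are hypothetical.

```python
from collections import Counter


def dipeptide_counts(seq: str) -> Counter:
    """Count overlapping adjacent residue pairs in a one-letter protein sequence.

    Assumption: pairs overlap, so "CCC" contains the pair "CC" twice.
    """
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def dipeptide_frequency(seq: str, pair: str) -> float:
    """Fraction of adjacent pairs equal to `pair` (e.g. "QL" for Gln-Leu).

    Assumption: the denominator is the number of adjacent pairs, len(seq) - 1.
    """
    counts = dipeptide_counts(seq)
    total = sum(counts.values())
    return counts[pair.upper()] / total if total else 0.0


# Toy sequence for illustration only, not data from the study.
toy = "MQLDKDK"
print(dipeptide_counts(toy)["DK"])      # count of Asp-Lys pairs
print(dipeptide_frequency(toy, "QL"))   # frequency of Gln-Leu pairs
```

Other normalizations (for example, dividing by sequence length) would give slightly different values, so thresholds reported for the study should not be applied to these numbers without checking the original definitions.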




Figure 1: A decision tree generated by the CHAID modeling method without feature selection filtering comparing halo-tolerant (T) with the halo-sensitive (S) proteins.

Mentions: When the CHAID model was applied to the data with or without feature selection, a tree with a depth of 3 was generated. The frequency of oxygen was the main attribute used to build the tree branches. If the value of this feature was equal to 0, the protein belonged to the halo-tolerant (T) group. If the value was higher than 0 and equal to or less than 0.095 and the frequency of Gln-Leu was equal to 0, the protein fell into the T group; otherwise it fell into the S group. When the frequency of oxygen was higher than 0.095 and the count of Asp-Lys was higher than 1 and equal to or less than 2, the protein belonged to the S group; otherwise it belonged to the T group. The same trees, with the same features and values, were generated when the exhaustive CHAID model was applied to datasets without feature selection filtering. When feature selection filtering was applied to the dataset, a tree with a depth of 3 was again generated, and the frequency of oxygen, with the same values mentioned above, was used to create the tree branches. In addition to the frequency of Gln-Leu, the aliphatic index (value of 87.570) and the frequency of Cys-Cys (with a turning point of 0) were used to create tree sub-branches. Nearly the same results were obtained when the exhaustive CHAID model was applied to the dataset with feature selection filtering (Figure 1).
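The rules of the tree obtained without feature selection filtering can be written out as a short function, which makes the three branches easier to follow. This is a minimal sketch that only transcribes the thresholds stated in the text; the function and parameter names are hypothetical, the inputs are assumed to be precomputed and non-negative, and the tree obtained with feature selection (which also uses the aliphatic index and the frequency of Cys-Cys) is not covered because its full branch logic is not given here.

```python
def classify_chaid(oxygen_freq: float, gln_leu_freq: float, asp_lys_count: int) -> str:
    """Return "T" (halo-tolerant) or "S" (halo-sensitive).

    Transcribes the CHAID rules described in the text for the dataset
    without feature selection filtering. Names are illustrative only.
    """
    if oxygen_freq < 0 or gln_leu_freq < 0 or asp_lys_count < 0:
        raise ValueError("frequencies and counts are assumed to be non-negative")

    # Branch 1: frequency of oxygen equal to 0.
    if oxygen_freq == 0:
        return "T"

    # Branch 2: 0 < frequency of oxygen <= 0.095, split on Gln-Leu frequency.
    if oxygen_freq <= 0.095:
        return "T" if gln_leu_freq == 0 else "S"

    # Branch 3: frequency of oxygen > 0.095, split on Asp-Lys count in (1, 2].
    return "S" if 1 < asp_lys_count <= 2 else "T"


# Illustrative inputs only, not values from the study.
print(classify_chaid(0.05, 0.0, 0))   # middle branch, Gln-Leu frequency of 0
print(classify_chaid(0.20, 0.0, 2))   # upper branch, Asp-Lys count in (1, 2]
```

For an integer count, the condition "higher than 1 and equal to or less than 2" is met only by a count of exactly 2; the sketch keeps the interval form so that it mirrors the wording of the text.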

