Proteome sequence features carry signatures of the environmental niche of prokaryotes.

Smole Z, Nikolic N, Supek F, Šmuc T, Sbalzarini IF, Krisko A - BMC Evol. Biol. (2011)

Bottom Line: Further, on the same set of features we applied two fundamentally different machine learning methods, Support Vector Machines and Random Forests, to successfully classify between bacteria and archaea, halophiles and non-halophiles, as well as mesophiles, thermophiles and mesothermophiles. To our knowledge, this is the first time that three different classification cases of proteome adaptation (domain of life, halophilicity and thermophilicity) have been successfully performed with the same set of 42 features. The characteristic features of a specific adaptation constitute a signature that may help in understanding the mechanisms of adaptation to extreme environments.

Affiliation: Institute for Cell Biology, ETH Zuerich, Schafmattstrasse 18, 8093 Zuerich, Switzerland.

ABSTRACT

Background: Prokaryotic environmental adaptations occur at different levels within cells to ensure the preservation of genome integrity, proper protein folding and function, and membrane fluidity. Although specific compositions and structures of cellular components suited to a variety of extreme conditions have already been postulated, a systematic study describing such adaptations has not yet been performed. We therefore explored whether the environmental niche of a prokaryote could be deduced from the sequence of its proteome. Finally, we aimed to find the precise differences between the proteome sequences of prokaryotes from different environments.

Results: We analyzed the proteomes of 192 prokaryotes from different habitats and collected detailed information about the optimal growth conditions of each microorganism. We selected 42 physico-chemical properties of amino acids and computed their values for each proteome. On this same set of features we applied two fundamentally different machine learning methods, Support Vector Machines (SVM) and Random Forests (RF), and successfully classified bacteria versus archaea, halophiles versus non-halophiles, and mesophiles, thermophiles and mesothermophiles. Finally, we performed feature selection using Random Forests.
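The abstract does not spell out how a per-residue amino acid property becomes a single value per proteome. A minimal sketch, assuming the value is the residue-frequency-weighted mean of a 20-entry amino acid scale over all proteins pooled together (one plausible choice, not necessarily the authors'), is shown below. The Kyte-Doolittle hydropathy scale stands in purely as an illustrative property; the helper name `proteome_feature` and the toy sequences are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index, used here only as an example property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def proteome_feature(proteins, scale):
    """Hypothetical helper: mean scale value over all residues of a proteome.

    Residues absent from the scale (e.g. X, B, Z, U) are skipped.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in scale)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scorable residues in proteome")
    # Weight each residue's scale value by how often it occurs.
    return sum(scale[aa] * n for aa, n in counts.items()) / total


# Toy two-protein "proteome"; a real one would be parsed from a FASTA file.
toy_proteome = ["MKVLA", "DEEKR"]
print(round(proteome_feature(toy_proteome, KYTE_DOOLITTLE), 3))
```

Repeating this for each of the 42 property scales would give one feature vector per organism, which is the kind of input an SVM or RF classifier takes.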

Conclusions: To our knowledge, this is the first time that three different classification cases of proteome adaptation (domain of life, halophilicity and thermophilicity) have been successfully performed with the same set of 42 features. The characteristic features of a specific adaptation constitute a signature that may help in understanding the mechanisms of adaptation to extreme environments.

Figure 2: Three features unique to the halophilicity classification, identified by the Random Forest (RF) feature selection algorithm. For each feature (positive charge, normalized frequency of beta turns, and Phe content), a pair of box-and-whisker plots is shown: non-halophiles on top, halophiles below. Feature values are normalized to the range 0 (left) to 1 (right). Plus signs (+) mark outliers.
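The caption states that feature values are normalized from 0 to 1. A minimal sketch, assuming ordinary min-max scaling of each feature across all organisms (the caption does not say which normalization was used), follows; the Phe fractions are made-up illustrative numbers, not data from the study.

```python
def min_max_normalize(values):
    """Rescale feature values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical Phe fractions for five organisms (illustrative numbers only).
phe_content = [0.030, 0.045, 0.040, 0.025, 0.035]
print([round(x, 2) for x in min_max_normalize(phe_content)])
```

Because the scaling is linear, it preserves the ordering and relative spread within each feature, which lets box plots of differently scaled features share one 0-to-1 axis.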

Mentions: Among the dominant features distinguishing halophiles from non-halophiles were the frequency of acidic amino acid residues and the proteome charge. Among the features unique to the halophilicity classification (Figure 2) is a decreased Phe content, a property of halophilic proteomes. Positive charge and the normalized frequency of beta turns also ranked as highly important features, with wider distributions in halophiles. Among the other features contributing to this classification (Additional file 4), halophiles appear to have almost twice as many acidic amino acids (especially Glu) as non-halophiles and, as a consequence, higher polarity and higher proteome charge. Asp content is also increased in halophiles, in accord with the general increase in polarity of halophilic proteomes.
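The composition-level quantities discussed above can be computed directly from sequences. A minimal sketch follows, assuming "acidic fraction" means (D + E) divided by all residues and approximating net charge per residue as ((K + R) - (D + E)) divided by all residues, i.e. ignoring His, chain termini and pKa effects (a simplification not taken from the paper). The helper name `composition_summary` and the toy sequences are hypothetical.

```python
from collections import Counter


def composition_summary(proteins):
    """Hypothetical helper: acidic, Glu, Asp and Phe fractions plus a crude charge proxy."""
    counts = Counter()
    for seq in proteins:
        counts.update(seq.upper())
    total = sum(counts.values())
    acidic = counts["D"] + counts["E"]
    basic = counts["K"] + counts["R"]
    return {
        "acidic_fraction": acidic / total,
        "glu_fraction": counts["E"] / total,
        "asp_fraction": counts["D"] / total,
        "phe_fraction": counts["F"] / total,
        # Positive values mean net basic; negative values mean net acidic.
        "net_charge_per_residue": (basic - acidic) / total,
    }


# Toy sequences chosen to mimic an acid-rich versus a more neutral composition.
halophile_like = ["DEEVDEAGKE", "EEDLTEAVRD"]
non_halophile_like = ["MKFLAEGKVF", "RFDLSKAVGL"]
for label, proteome in [("halophile-like", halophile_like),
                        ("non-halophile-like", non_halophile_like)]:
    summary = composition_summary(proteome)
    print(label, {k: round(v, 2) for k, v in summary.items()})
```

On these toy inputs the acid-rich set has a higher acidic fraction, a strongly negative charge proxy and no Phe, in the same direction as the trends reported above; the numbers themselves are illustrative and not data from the study.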

