Global Shifts in Genome and Proteome Composition Are Very Tightly Coupled.

Brbić M, Warnecke T, Kriško A, Supek F - Genome Biol Evol (2015)

Bottom Line: We disentangle these effects by systematically evaluating the correspondence between intergenic nucleotide composition, where protein-level selection is absent, the AAC, and ecological parameters of 909 prokaryotes. Moreover, highly expressed genes do not exhibit more prominent environment-related AAC signatures than lowly expressed genes, despite contributing more to the effective proteome. We discuss these results in light of contravening evidence from biophysical data and further reading frame-specific analyses that suggest that adaptation takes place at the protein level.


Affiliation: Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia; Molecular Basis of Ageing, Mediterranean Institute for Life Sciences (MedILS), Split, Croatia.


Fig. 3. Accuracy in classifying prokaryotes by environmental preference from the AAC of proteomes and from oligonucleotide frequencies in noncoding DNA. (A, B) Distributions of AACs (given as relative frequencies of each amino acid) across proteomes, as well as the residuals of the amino acid composition in SVM regression. Asterisks mark significance in two-tailed Mann–Whitney tests applied to the distributions of residuals: *FDR < 25%; **FDR < 10%; ***FDR < 1%. ROC curves for discriminating thermophiles from mesophiles (C) and strict anaerobes from aerotolerant organisms (D). Orange curves show predictions from AAC in proteomes, green curves from noncoding DNA (G + C content, di- and trinucleotide frequencies) and phylogenetic descriptors (clade memberships), and blue curves from AAC after a normalization for oligonucleotide frequencies in noncoding DNA and for phylogenetic relatedness (residuals from regression of AAC on these features). AUROC scores are given in plot legends, where 1.0 indicates perfect performance and 0.5 random guessing (shown as the diagonal line). Predictions in the ROC curves are from an SVM classifier, in 10-fold cross-validation. TPR, true positive rate; FPR, false positive rate. More environments are shown in supplementary figure S4, Supplementary Material online.
© Copyright Policy - creative-commons

Mentions: Previous work demonstrates that AAC can separate thermophilic from mesophilic organisms with very high accuracy (Zeldovich et al. 2007; Smole et al. 2011), a finding replicated by our SVM classifier when considering the area under the receiver operating characteristic (ROC) curve (AUROC) as a measure of classification accuracy (fig. 3C; AUROC = 0.990). The AUROC expresses the probability that, in a randomly drawn thermophile–mesophile pair of microbes, the thermophile will be correctly recognized, with a value of 0.5 indicating random guessing. In contrast to the very high classification accuracy obtained when considering AAC prior to nucleotide normalization, we find that AAC residuals could accomplish the thermophile recognition task with much lower success (AUROC = 0.738; fig. 3C). This suggests that a substantial component of the thermal AAC signature is grounded in oligonucleotide content, as becomes evident when comparing the distributions of the AAC residuals of thermophiles and mesophiles, alongside the raw AAC of both groups (fig. 3A). We obtain similar results when we try to discriminate halophiles from nonhalophiles (supplementary fig. S4A, Supplementary Material online; AAC AUROC = 0.968, AAC residual AUROC = 0.678), or aerotolerant from obligately anaerobic organisms (fig. 3D; 0.958 vs. 0.715), or similarly for obligately aerobic, host-associated, soil-dwelling, psychrophilic or radioresistant microbes (supplementary fig. S4B–F, Supplementary Material online). Consistently, the environment can be predicted from genomic oligonucleotide frequencies of intergenic DNA nearly as accurately as it can be from the AAC of the proteomes (fig. 3 and supplementary fig. S4, Supplementary Material online). This suggests that the contribution to raw AAC signatures made by variation that exclusively pertains to the amino acid level is often limited, at least for the ecological parameters considered here.
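The pairwise reading of the AUROC given above can be made concrete with a small Python sketch. It counts, over all thermophile–mesophile pairs, how often the thermophile receives the higher classifier score, with ties counted as half. The function name `pairwise_auroc` and the scores are hypothetical illustrations, not data or code from the study.

```python
def pairwise_auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (assumed for illustration only).
thermophile_scores = [0.9, 0.8, 0.4]
mesophile_scores = [0.7, 0.3, 0.2, 0.1]

auroc = pairwise_auroc(thermophile_scores, mesophile_scores)
print(f"AUROC = {auroc:.3f}")
```

A classifier that assigns every organism the same score yields 0.5 under this definition, which matches the random-guessing baseline mentioned in the text.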
Of note, although the classification from AAC residuals was severely compromised in comparison to the actual AAC, the AUROC scores were still significantly above the baseline of 0.5 (P < 0.001 for all environments; fig. 3 and supplementary fig. S4, Supplementary Material online). Therefore, this analysis does not exclude selection on AAC in different environments, but implies that its signal is subtle when compared against the backdrop of the AAC changes dependent on oligonucleotide composition.
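The idea of an "AAC residual" can be sketched in simplified form. The study obtains residuals from an SVM regression of AAC on oligonucleotide frequencies in noncoding DNA and on phylogenetic descriptors; the Python sketch below swaps in ordinary least squares on a single predictor (intergenic G + C content) purely to show what remains of an amino acid frequency after the nucleotide-driven trend is removed. The function name `ols_residuals` and all values are hypothetical assumptions, not the authors' method or data.

```python
def ols_residuals(x, y):
    """Residuals of y after a one-predictor least-squares fit on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # What is left of y once the linear dependence on x is subtracted.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical values: intergenic G + C content and the relative
# frequency of one amino acid in five genomes (illustration only).
gc_content = [0.30, 0.40, 0.50, 0.60, 0.70]
lys_freq = [0.080, 0.070, 0.062, 0.050, 0.041]

residuals = ols_residuals(gc_content, lys_freq)
print([round(r, 4) for r in residuals])
```

In this simplified setting, a classifier trained on such residuals can only use variation that the nucleotide predictor does not explain, which is the logic behind comparing the raw-AAC and residual-AAC AUROC scores.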

