Global Shifts in Genome and Proteome Composition Are Very Tightly Coupled.
Bottom Line: Qualitatively similar results were obtained for 49 fungal genomes, where 80% of the variability in AAC could be explained by the composition of introns and intergenic regions. Moreover, highly expressed genes do not exhibit more prominent environment-related AAC signatures than lowly expressed genes, despite contributing more to the effective proteome. Thus, evolutionary shifts in overall AAC appear to occur almost exclusively through factors shaping the global oligonucleotide content of the genome.
Affiliation: Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia; Molecular Basis of Ageing, Mediterranean Institute for Life Sciences (MedILS), Split, Croatia.
Mentions: Previous work demonstrates that AAC can separate thermophilic from mesophilic organisms with very high accuracy (Zeldovich et al. 2007; Smole et al. 2011), a finding replicated by our SVM classifier when considering the area under the receiver operating characteristic (ROC) curve (AUROC) as a measure of classification accuracy (fig. 3C; AUROC = 0.990). The AUROC expresses the probability that, in a randomly drawn thermophile–mesophile pair of microbes, the thermophile will be correctly recognized, with a value of 0.5 indicating random guessing. In contrast to the very high classification accuracy obtained when considering AAC prior to nucleotide normalization, we find that AAC residuals accomplish the thermophile recognition task with much lower success (AUROC = 0.738; fig. 3C). This suggests that a substantial component of the thermal AAC signature is grounded in oligonucleotide content, as becomes evident when comparing the distributions of the AAC residuals of thermophiles and mesophiles, alongside the raw AAC of both groups (fig. 3A). We obtain similar results when we try to discriminate halophiles from nonhalophiles (supplementary fig. S4A, Supplementary Material online; AAC AUROC = 0.968, AAC residual AUROC = 0.678), or aerotolerant from obligately anaerobic organisms (fig. 3D; 0.958 vs. 0.715), and similarly for obligately aerobic, host-associated, soil-dwelling, psychrophilic or radioresistant microbes (supplementary fig. S4B–F, Supplementary Material online). Consistently, the environment can be predicted from genomic oligonucleotide frequencies of intergenic DNA nearly as accurately as it can be from the AAC of the proteomes (fig. 3 and supplementary fig. S4, Supplementary Material online). This suggests that the contribution to raw AAC signatures made by variation that exclusively pertains to the amino acid level is often limited, at least for the ecological parameters considered here.
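The pairwise reading of the AUROC given above can be illustrated with a minimal Python sketch. The function name `pairwise_auroc` and the scores are hypothetical and are not taken from the study; the sketch only shows that counting correctly ordered thermophile–mesophile pairs (with ties counted as half) yields the AUROC.

```python
from itertools import product


def pairwise_auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (illustrative only, not data from the paper).
thermophile_scores = [0.9, 0.8, 0.4]
mesophile_scores = [0.7, 0.3, 0.2, 0.1]

# 11 of the 12 possible pairs are ordered correctly.
auroc = pairwise_auroc(thermophile_scores, mesophile_scores)
print(f"AUROC = {auroc:.3f}")
```

Under this definition, a classifier that assigns the same score to every organism gets exactly 0.5, which matches the random-guessing baseline mentioned in the text.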
Of note, although the classification from AAC residuals was severely compromised in comparison to the actual AAC, the AUROC scores were still significantly above the baseline of 0.5 (P < 0.001 for all environments; fig. 3 and supplementary fig. S4, Supplementary Material online). Therefore, this analysis does not exclude selection on AAC in different environments, but implies that its signal is subtle when compared against the backdrop of the AAC changes dependent on oligonucleotide composition.
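This excerpt does not state which test produced the reported P values. One common way to check whether an AUROC exceeds the 0.5 baseline is a label permutation test, sketched below in Python as an assumption rather than as the authors' procedure. The helper names (`auroc_from_labels`, `permutation_p_value`) and the toy scores are hypothetical.

```python
import random


def auroc_from_labels(scores, labels):
    """AUROC as the fraction of correctly ordered (positive, negative)
    pairs; labels are 1 for positives and 0 for negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """One-sided permutation test: how often does shuffling the labels
    give an AUROC at least as high as the observed one?"""
    rng = random.Random(seed)
    observed = auroc_from_labels(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between score and label
        if auroc_from_labels(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (illustrative only): 10 positives score above 10 negatives.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [0] * 10 + [1] * 10
observed_auroc, p_value = permutation_p_value(toy_scores, toy_labels)
print(observed_auroc, p_value)
```

With `n_perm` permutations the smallest attainable P value is 1 / (`n_perm` + 1), so reporting P < 0.001 with this kind of test would require at least 1,000 permutations.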