Global Shifts in Genome and Proteome Composition Are Very Tightly Coupled.
Bottom Line: Qualitatively similar results were obtained for 49 fungal genomes, where 80% of the variability in AAC could be explained by the composition of introns and intergenic regions. Moreover, highly expressed genes do not exhibit more prominent environment-related AAC signatures than lowly expressed genes, despite contributing more to the effective proteome. Thus, evolutionary shifts in overall AAC appear to occur almost exclusively through factors shaping the global oligonucleotide content of the genome.
Affiliation: Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia; Molecular Basis of Ageing, Mediterranean Institute for Life Sciences (MedILS), Split, Croatia.
Mentions: Thus far, we have shown that intergenic oligonucleotide composition is an excellent predictor of AAC and that controlling for nucleotide composition leads to a substantial drop-off in classifier performance. Intuitively, this might imply that a given ecological signal primarily emanates from the nucleotide level and that the AAC is, to a greater or lesser extent, an epiphenomenon that passively tracks nucleotide composition. To further assess the relative contributions of nucleotide versus amino acid level selection, we examined the predictive capacity of the AAC in light of gene expression levels. Selection at the amino acid level should be stronger in highly expressed genes, increasing its relative contribution to the composite AAC signature that reflects both nucleotide and amino acid level processes. Consequently, AAC should be harder to predict from intergenic DNA for highly expressed genes than for lowly expressed genes. Expression levels of proteins in conditions favorable to growth can be approximated from codon biases in protein-coding genes (Ikemura 1985). To this end, we use previous data for 911 prokaryotic genomes (Krisko et al. 2014), where a statistical test was used to assign a binary high/low expression label to genes (Supek et al. 2010). Predicting the AAC of highly and lowly expressed genes separately from oligonucleotide composition, we find no significant difference in prediction accuracy (fig. 5A; mean difference of root-mean-square error [RMSE] over 20 amino acids = 0.002%, 95% CI: [−0.016%, 0.020%]). This suggests that higher expression does not lead to a greater preponderance of amino acid-related signatures in the AAC signal. We test this explicitly by examining how well AAC residuals derived from highly expressed genes predict organismal ecology, and find that they are, overall, as poorly predictive as residuals derived from the remainder of the proteome, in contrast to the original AAC (fig. 5B).
When examining individual environments separately, we again find no significant differences between the highly expressed genes and the rest of the proteome (at FDR < 10%; supplementary fig. S5, Supplementary Material online). This analysis is not affected by the phylogenetic relatedness of the points (organisms) in our regression data (supplementary fig. S10, Supplementary Material online).
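The comparison of prediction accuracy between highly and lowly expressed genes can be illustrated with a minimal sketch. It computes a per-amino-acid RMSE for each gene set and then the mean paired difference over the 20 amino acids with a 95% confidence interval. Everything here is an assumption made for illustration: the data are synthetic stand-ins (not the 911-genome data set), the function names `per_amino_acid_rmse` and `mean_difference_ci` are hypothetical, and the percentile bootstrap is one plausible way to obtain the interval; the text does not state which procedure was actually used.

```python
import numpy as np


def per_amino_acid_rmse(observed, predicted):
    """RMSE for each amino acid (columns), computed across genomes (rows)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((observed - predicted) ** 2, axis=0))


def mean_difference_ci(rmse_a, rmse_b, n_boot=10000, alpha=0.05, seed=0):
    """Mean paired difference (a - b) with a percentile bootstrap CI.

    Assumption: amino acids are resampled with replacement; this is an
    illustrative choice, not necessarily the procedure used in the study.
    """
    diffs = np.asarray(rmse_a, dtype=float) - np.asarray(rmse_b, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diffs.mean(), lo, hi


# Synthetic stand-in data: AAC in percent for 911 genomes x 20 amino acids.
rng = np.random.default_rng(42)
n_genomes, n_aa = 911, 20
aac_high = rng.dirichlet(np.full(n_aa, 50.0), size=n_genomes) * 100
aac_low = rng.dirichlet(np.full(n_aa, 50.0), size=n_genomes) * 100

# Hypothetical model predictions: truth plus the same noise level for both sets.
pred_high = aac_high + rng.normal(0.0, 0.3, size=aac_high.shape)
pred_low = aac_low + rng.normal(0.0, 0.3, size=aac_low.shape)

rmse_high = per_amino_acid_rmse(aac_high, pred_high)
rmse_low = per_amino_acid_rmse(aac_low, pred_low)

mean_diff, ci_lo, ci_hi = mean_difference_ci(rmse_high, rmse_low)
print(f"mean RMSE difference = {mean_diff:.3f}%, 95% CI: [{ci_lo:.3f}%, {ci_hi:.3f}%]")
```

Under this reading, a confidence interval that spans zero corresponds to the "no significant difference" conclusion reported for the real data.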