Global Shifts in Genome and Proteome Composition Are Very Tightly Coupled.

Brbić M, Warnecke T, Kriško A, Supek F - Genome Biol Evol (2015)

Bottom Line: Qualitatively similar results were obtained for 49 fungal genomes, where 80% of the variability in AAC could be explained by the composition of introns and intergenic regions. Moreover, highly expressed genes do not exhibit more prominent environment-related AAC signatures than lowly expressed genes, despite contributing more to the effective proteome. Thus, evolutionary shifts in overall AAC appear to occur almost exclusively through factors shaping the global oligonucleotide content of the genome.


Affiliation: Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia; Molecular Basis of Ageing, Mediterranean Institute for Life Sciences (MedILS), Split, Croatia.

Fig. 5. Lack of a particular environment-associated signal in the AAC of highly expressed proteins. (A) The RMSEs in predicting the frequencies of each amino acid from the composition of noncoding DNA (G + C, di- and trinucleotide content) and phylogenetic relatedness (clade membership) of organisms. RMSEs are compared for lowly versus highly expressed genes across all organisms. (B) Binned and pooled ROC curves for classifying the organisms by various environmental preferences from AAC, after having factored out the composition of noncoding DNA and phylogeny. ROC curves are shown separately for classification only from highly expressed or only from lowly expressed genes. Full ROC curves for individual environments are shown in supplementary figure S5, Supplementary Material online. Average and 95% CI of AUROC scores are inlaid on plots.
© Copyright Policy - creative-commons

Mentions: Thus far, we have shown that intergenic oligonucleotide composition is an excellent predictor of AAC and that controlling for nucleotide composition leads to a substantial drop-off in classifier performance. Intuitively, this might imply that a given ecological signal primarily emanates from the nucleotide level and that the AAC is, to a greater or lesser extent, an epiphenomenon that passively tracks nucleotide composition. To further consider the relative contributions of nucleotide versus amino acid level selection, we consider the predictive capacity of the AAC in light of gene expression levels. Selection at the amino acid level should be stronger in highly expressed genes, increasing its relative contribution to the composite AAC signature that reflects both nucleotide and amino acid level processes. Consequently, AAC should be harder to predict from intergenic DNA for highly expressed genes compared with lowly expressed genes. Expression levels of proteins in conditions favorable to growth can be approximated from codon biases in protein-coding genes (Ikemura 1985). To this end, we use previous data for 911 prokaryotic genomes (Krisko et al. 2014), where a statistical test was used to assign a binary high/low expression label to genes (Supek et al. 2010). Using highly and lowly expressed genes separately to predict AAC from oligonucleotide composition, we find no significant difference in prediction accuracy (fig. 5A; mean difference of root-mean-square error [RMSE] over 20 amino acids = 0.002%, 95% CI: [−0.016%, 0.020%]). This suggests that higher expression does not lead to a greater preponderance of amino acid-related signatures in the AAC signal. We explicitly test this by examining the predictive power of AAC residuals derived from highly expressed genes for the organismal ecology and find that they are, overall, as poorly predictive as residuals derived from the remainder of the proteome, in contrast to the original AAC (fig. 5B).
When examining individual environments separately, we again find no significant differences between the highly expressed genes and the rest of the proteome (at FDR < 10%; supplementary fig. S5, Supplementary Material online). This analysis is not affected by the phylogenetic relatedness of the points (organisms) in our regression data (supplementary fig. S10, Supplementary Material online).
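
The RMSE comparison between highly and lowly expressed genes can be illustrated with a minimal sketch. It is not the authors' pipeline: the ordinary least-squares fit, the in-sample evaluation, the t-based confidence interval, the synthetic data, and all names (`per_amino_acid_rmse`, `paired_mean_diff_ci`, `aac_high`, `aac_low`) are assumptions made for illustration only. It shows one way to obtain a per-amino-acid RMSE for each expression class and then summarize the paired difference over the 20 amino acids as a mean with a 95% CI.

```python
import numpy as np

# Two-sided 95% Student t critical value for 19 degrees of freedom,
# i.e. valid only for 20 paired values (one per amino acid).
T_CRIT_DF19 = 2.093


def per_amino_acid_rmse(X, Y):
    """Fit one least-squares linear model per amino acid; return in-sample RMSEs.

    X: (n_organisms, n_features) noncoding composition features (hypothetical).
    Y: (n_organisms, 20) amino acid frequencies for one expression class.
    """
    X1 = np.column_stack([np.ones(len(X)), X])     # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # solves all 20 targets at once
    resid = Y - X1 @ coef
    return np.sqrt(np.mean(resid ** 2, axis=0))    # one RMSE per amino acid


def paired_mean_diff_ci(a, b, t_crit=T_CRIT_DF19):
    """Mean of the paired differences a - b with a t-based confidence interval."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    mean = d.mean()
    half_width = t_crit * d.std(ddof=1) / np.sqrt(len(d))
    return mean, (mean - half_width, mean + half_width)


# Synthetic stand-in data: both expression classes follow the same
# composition-driven model, so no systematic RMSE difference is built in.
rng = np.random.default_rng(0)
n_org, n_feat, n_aa = 200, 5, 20
X = rng.normal(size=(n_org, n_feat))
W = rng.normal(size=(n_feat, n_aa))
aac_high = X @ W + rng.normal(scale=0.1, size=(n_org, n_aa))
aac_low = X @ W + rng.normal(scale=0.1, size=(n_org, n_aa))

rmse_high = per_amino_acid_rmse(X, aac_high)
rmse_low = per_amino_acid_rmse(X, aac_low)
mean_diff, (ci_lo, ci_hi) = paired_mean_diff_ci(rmse_high, rmse_low)
print(f"mean RMSE difference = {mean_diff:.4f}, 95% CI [{ci_lo:.4f}, {ci_hi:.4f}]")
```

A confidence interval that spans zero, as reported for fig. 5A, is what one would read as no detectable difference in prediction accuracy between the two expression classes. A held-out or cross-validated RMSE would be a more conservative choice than the in-sample error used in this sketch.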