Global Shifts in Genome and Proteome Composition Are Very Tightly Coupled.

Brbić M, Warnecke T, Kriško A, Supek F - Genome Biol Evol (2015)

Bottom Line: We disentangle these effects by systematically evaluating the correspondence between intergenic nucleotide composition (where protein-level selection is absent), the amino acid composition (AAC), and ecological parameters of 909 prokaryotes. Moreover, highly expressed genes do not exhibit more prominent environment-related AAC signatures than lowly expressed genes, despite contributing more to the effective proteome. We discuss these results in light of contravening evidence from biophysical data and further reading frame-specific analyses that suggest that adaptation takes place at the protein level.


Affiliation: Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia; Molecular Basis of Ageing, Mediterranean Institute for Life Sciences (MedILS), Split, Croatia.


Fig. 1. The oligonucleotide frequencies in the noncoding DNA of prokaryotes are highly predictive of their proteome compositions. (A) Explained variance (as squared Pearson correlation coefficient, R²) in the amino acid usage of proteomes in a multiple regression against different sets of features: by considering only the G + C content (blue bars), and by progressively including also the dinucleotide frequencies (red), the trinucleotides (teal), and phylogenetic groups (purple). Error bars are standard deviations from ten runs of cross-validation. (B, C) The median variance explained using the same sets of features over all 20 amino acids (B) or only over the seven G + C balanced amino acids (THEVDQC) (C). The “bias estimate” is from bootstrapping (Materials and Methods).


Mentions: Consistent with previous work (Singer and Hickey 2000; Lightfield et al. 2011), we find that G + C content alone can explain some of the AAC variation between genomes (fig. 1; median R² over amino acids = 0.555) but leaves a substantial fraction of variance unexplained. This is not surprising as G + C variation has a single degree of freedom, insufficient to capture the diversity in AAC (and ecological preferences) among microbes, as illustrated by the seven amino acids with balanced G + C across codons (THEVDQC): AAC for this subset of amino acids is poorly predictable from G + C alone (fig. 1; median R² = 0.115). In more general terms, we estimate that our data set has at least 6 and 7 degrees of freedom for the AAC and ecological preference, respectively (supplementary fig. S1, Supplementary Material online). This is important to note because in cases where AAC correlates with ecological parameters but G + C does not, such as for thermophilicity (Hurst and Merchant 2001; Zeldovich et al. 2007) and halophilicity (Paul et al. 2008), this should not be taken as sufficient evidence for adaptation at the amino acid level. Rather, absence of a clear association might reflect intrinsic limitations of G + C content as a predictor.
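
The point about a single degree of freedom can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: it uses synthetic data, a single-predictor least-squares fit rather than the multiple regression described in the figure legend, and hypothetical names throughout (`fit_line`, `cross_validated_r2`, `other_factor`, and the simulated frequencies are all assumptions made for illustration). It simulates one amino acid whose frequency tracks G + C and one "balanced" amino acid driven by an unrelated factor, then reports the squared Pearson correlation between held-out predictions and observed values under tenfold cross-validation.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def squared_pearson(u, v):
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv * suv / (suu * svv)


def cross_validated_r2(x, y, folds=10):
    """R^2 between held-out predictions and observations (k-fold CV)."""
    n = len(x)
    predicted = [0.0] * n
    for k in range(folds):
        held_out = set(range(k, n, folds))  # every folds-th genome
        train = [i for i in range(n) if i not in held_out]
        intercept, slope = fit_line([x[i] for i in train],
                                    [y[i] for i in train])
        for i in held_out:
            predicted[i] = intercept + slope * x[i]
    return squared_pearson(predicted, y)


# Synthetic genomes (assumed values, not data from the paper).
rng = random.Random(0)
n_genomes = 300
gc = [rng.uniform(0.25, 0.75) for _ in range(n_genomes)]
# A second, G + C-independent axis of variation (e.g. an ecological trait).
other_factor = [rng.gauss(0.0, 1.0) for _ in range(n_genomes)]

# Frequency of an amino acid whose codons are G + C rich.
aa_gc_driven = [0.05 + 0.08 * g + rng.gauss(0.0, 0.005) for g in gc]
# Frequency of a G + C balanced amino acid, driven by the other factor only.
aa_gc_balanced = [0.05 + 0.01 * f + rng.gauss(0.0, 0.002)
                  for f in other_factor]

r2_gc_driven = cross_validated_r2(gc, aa_gc_driven)
r2_gc_balanced = cross_validated_r2(gc, aa_gc_balanced)
print(f"G + C driven amino acid:   R^2 = {r2_gc_driven:.3f}")
print(f"G + C balanced amino acid: R^2 = {r2_gc_balanced:.3f}")
```

In this toy setting the G + C balanced amino acid is poorly predicted even though its frequency varies systematically between genomes, which is the sense in which a weak association with G + C need not indicate selection at the amino acid level. Adding further predictors (for example dinucleotide frequencies) would require a multiple regression, which this sketch does not attempt.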

