Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes.

Antezana MA, Jordan IK - PLoS ONE (2008)

Bottom Line: The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.

Affiliation: Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America. marcos.antezana@gmail.com

ABSTRACT
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation of which central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones. 
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that either mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.

Figure 35 (pone-0002145-g035): The expected incidence of amino-acid replacements due to one-base codon mutations in human coding regions, under 64×4 relative to 4×4 intronic mutation. Results from top to bottom are for 0.34, 0.42, 0.50, and 0.57 GC123 (i.e., points 2, 8, 14, and 17 from the left in Fig. 34). A positive value on the left indicates that the 64×4 mutability is (value +1.0)-fold higher than its 4×4 counterpart (1.0 was subtracted to obtain 0.0 when the two rates are identical); and a negative value indicates that the 4×4 value is abs(value −1.0)-fold higher than its 64×4 counterpart. On the right are the differences between each plain replacement mutability under 64×4 mutation and its 4×4 counterpart, to highlight the replacements dominating the patterns in Figure 34. Values on the right were rescaled to make the largest positive difference equal to 5.0 (i.e., the 245.7 Arg→Gln rate). Therefore, values above the 0.0-plane, both left and right, indicate an advantage for NBDM (i.e., a 64×4 mutability lower than its 4×4 counterpart). Numeric values are shown in Figure 36.
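The two encodings described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `signed_fold` and `rescaled_differences` are hypothetical, the right-hand difference is assumed to be the 64×4 rate minus the 4×4 rate, and the toy rates in the usage lines are invented for the example.

```python
def signed_fold(rate_64x4, rate_4x4):
    """Left-panel index: 0.0 when the two rates are equal,
    (ratio - 1) when the 64x4 rate is higher, and
    -(ratio - 1) when the 4x4 rate is higher."""
    if rate_64x4 <= 0 or rate_4x4 <= 0:
        raise ValueError("rates must be positive")
    if rate_64x4 >= rate_4x4:
        return rate_64x4 / rate_4x4 - 1.0
    return -(rate_4x4 / rate_64x4 - 1.0)


def rescaled_differences(rates_64x4, rates_4x4, top=5.0):
    """Right-panel values: plain differences (assumed 64x4 minus 4x4),
    rescaled so that the largest positive difference equals `top`."""
    diffs = {key: rates_64x4[key] - rates_4x4[key] for key in rates_64x4}
    largest = max(diffs.values())
    if largest <= 0:
        raise ValueError("no positive difference to rescale against")
    return {key: diff * top / largest for key, diff in diffs.items()}


# Toy rates, invented for illustration only.
toy_64x4 = {"X>Y": 4.0, "Y>Z": 1.0}
toy_4x4 = {"X>Y": 2.0, "Y>Z": 2.0}
left = {key: signed_fold(toy_64x4[key], toy_4x4[key]) for key in toy_64x4}
right = rescaled_differences(toy_64x4, toy_4x4)
```

In this toy case a 64×4 rate twice its 4×4 counterpart gives an index of 1.0 and the reverse gives −1.0, so the index is symmetric around 0.0 in a way that a plain ratio is not.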
Mentions: In Figure 35 we contrast the expected occurrence under 64×4 vs. 4×4 mutation of the individual amino-acid changes in the 20×20 matrix that are due to single-base changes, for four of the 18 groups of human coding regions and pairs of matrices used to generate Figure 34. The values plotted in Figure 35 are shown as numbers in Figure 36, and the reader is advised to consult both. The four groups represent the GC content closest to 0.5 and those at which 64×4 NBDM delivers the largest overall plain mutability advantage relative to 4×4 mutation (0.55), the largest disadvantage (0.42), and the second-largest low-GC advantage (0.34). The trend in both the left and the right plots is that more of the 20×20 replacements would happen more frequently under 4×4 mutation than under 64×4 mutation. In the upper left plot, e.g., 99 values are smaller than 0.0 and 69 are larger, and 67 are below −0.1 and 47 above 0.1; in the top right plot the numbers are 99 to 69 (necessarily, given the numbers for the left plot) and 71 to 48, respectively. This trend is also found in the lower plots, although it is less marked in the second-row plots (from the top), consistent with these being the two plots for the largest 64×4 disadvantage; the alternative explanation, that a few large effects determine the overall mutability differences, should not be forgotten. When the ±boundary is larger, the numbers above and below the boundaries decrease, and the trend becomes erratic and switches polarity for the most extreme effects, which of course involve the very mutable CG dinucleotide and are erased under 4×4 mutation. All in all, however, the left-side plots indicate that several replacements are moderately less likely under 64×4 mutation than under 4×4 mutation, instead of a few being markedly less likely, but the plots on the right show that several mutability differences favoring 64×4 mutation are nonetheless quite large. 
This makes it harder to point out important replacements that natural selection may be suppressing or tolerating to the greatest extent than it would have been had both the left- and right-side plots shown the same performers delivering similarly high notes. We will not discuss further the relationship between the differences in expected replacement generation in Figures 35 and 36 and the underlying differences between individual 64×4 and 4×4 base-mutation rates. This would require examining 64×64 matrices, and would make more sense if one knew the actual 64×4 matrices that generated the native intronic patterns.
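The tallies quoted above (e.g., 99 values below 0.0 and 69 above, then 67 below −0.1 and 47 above 0.1) amount to counting plotted values on either side of a ±boundary. A minimal Python sketch of such a tally follows; the function name `count_beyond` is hypothetical and the toy values are invented, so it illustrates the counting rule only and does not reproduce the paper's numbers.

```python
def count_beyond(values, boundary=0.0):
    """Return (n_below, n_above): how many values lie strictly below
    -boundary and strictly above +boundary. Values inside the band,
    or exactly on a boundary, are not counted."""
    if boundary < 0:
        raise ValueError("boundary must be non-negative")
    n_below = sum(1 for value in values if value < -boundary)
    n_above = sum(1 for value in values if value > boundary)
    return n_below, n_above


# Toy values, invented for illustration only.
toy_values = [-0.5, -0.05, 0.0, 0.2, 0.3]
at_zero = count_beyond(toy_values, 0.0)   # sign tally
at_tenth = count_beyond(toy_values, 0.1)  # tally beyond +/-0.1
```

Widening the boundary can only shrink both counts, which is why the text notes that the numbers above and below the boundaries decrease as the ±boundary grows.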

