Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes.

Antezana MA, Jordan IK - PLoS ONE (2008)

Bottom Line: The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they drastically alter missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.


Affiliation: Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America. marcos.antezana@gmail.com

ABSTRACT
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation whose central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R²s reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones.
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that each mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they drastically alter missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
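
To make the idea of a 64×4 NBDM matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each of the 64 trinucleotide contexts (preceding base, mutating base, following base) carries a per-sweep probability of its central base changing to each of the four bases. The function names (`uniform_matrix`, `symmetrize`, `mutate_once`) and all rate values are hypothetical, and `symmetrize` is only one plausible way to obtain a regime "lacking strand effects": it averages each rate with that of the reverse-complement event.

```python
import random
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def uniform_matrix(rate):
    """Context-independent baseline: a 64x4 table in which every central
    base changes to each of the three other bases with probability `rate`."""
    return {
        "".join(t): {b: (0.0 if b == t[1] else rate) for b in BASES}
        for t in product(BASES, repeat=3)
    }


def reverse_complement(motif):
    return "".join(COMPLEMENT[b] for b in reversed(motif))


def symmetrize(matrix):
    """Remove strand effects (assumed definition): average the rate of each
    event with the rate of the same event read on the opposite strand."""
    return {
        ctx: {
            b: 0.5 * (rates[b] + matrix[reverse_complement(ctx)][COMPLEMENT[b]])
            for b in BASES
        }
        for ctx, rates in matrix.items()
    }


def mutate_once(seq, matrix, rng):
    """One sweep over the interior sites. Contexts are read from the
    sequence as it was before the sweep, so sites do not affect each other
    within a sweep. Rates in each row must sum to at most 1."""
    out = list(seq)
    for i in range(1, len(seq) - 1):
        rates = matrix[seq[i - 1:i + 2]]
        u = rng.random()
        acc = 0.0
        for b in BASES:
            acc += rates[b]
            if u < acc:
                out[i] = b
                break
    return "".join(out)


# Illustrative use: a hypothetical regime in which C followed by G mutates
# to T more often than the background rate, with strand effects removed.
rng = random.Random(1)
regime = uniform_matrix(0.001)
for left in BASES:
    regime[left + "CG"]["T"] = 0.02  # made-up value, for illustration only
regime = symmetrize(regime)
seq = "".join(rng.choice(BASES) for _ in range(2000))
for _ in range(200):
    seq = mutate_once(seq, regime, rng)
```

Running many sweeps and then counting motifs in `seq` is the kind of comparison against native sequences that the abstract describes; the equilibrium reached depends entirely on the rates supplied, which here are placeholders.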


pone-0002145-g028: Trinucleotide occurrences in native and simulated intronic DNA vs. base-composition expectations, as a function of GC content. R²s and slopes (left, right; vertical axis) of the correlations between native or simulated intronic trinucleotide occurrences (top, middle) and their base-composition expectations, as a function of increasing GC content (horizontal axis; see also Methods). The simulated occurrences come from sequences generated by the intron- and coding-region-derived 64×4 matrices used for the previous figures (thicker and thinner lines, respectively). At the bottom are the R²s and slopes between native intronic values and the ones generated by 64×4 matrices. Thickest lines in the bottom plots indicate results with intronic 64×4s lacking strand effects.

Mentions: The native and simulated patterns presented above indicate clearly that NBDM shapes the primary structure of non-genic, intronic, and coding DNA. However, one could argue that the primary-structural foundations of the patterns and relationships shown so far are only subtle departures from what one would expect under a regime of context-independent mutation that delivers the base composition. This is not the case. In Figure 28 we show, for increasing GC content, the R²s and slopes of the correlation between trinucleotide occurrences in native intronic DNA and those in random sequences whose base composition is identical to that of the various GC-sorted groups of native intronic sequences. The R²s range from quite high (at most 90%) at low and high GCs, to 0% at intermediate GCs, i.e., they are lowest exactly when the native non-genic and intronic preference-occurrence R²s reach extraordinarily high 95%+ values (Figs. 22 and 25). Figure 28 also shows that trinucleotide occurrences in simulated DNA at equilibrium under the 64×4 matrices estimated from mouse-rat/Homo coding DNA and under 64×4s derived from Homo-chimp/macaque intronic DNA relate to the corresponding base-composition expectations in much the same way as do native intronic occurrences. Finally, the figure shows that the native and simulated occurrences are strongly correlated with each other, especially the all-motifs, 4fold, and 6fold groups. The slopes, moreover, range between 1.0 and 2.0, indicating that the primary-structural features due to 64×4 NBDM are of the same order of magnitude as those observed in native intron DNA. Therefore, NBDM-simulated and native intronic patterns are correlated because both sets of occurrences are very similarly structured, while the occurrences expected given the base composition differ strongly from the native occurrences, especially when GC content is intermediate.
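
The comparison against base-composition expectations can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the expected frequency of a trinucleotide is taken as the product of the frequencies of its three bases, observed frequencies come from overlapping windows, and the slope is that of observed regressed on expected (the direction is an assumption). All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"
TRINUCS = ["".join(p) for p in product(BASES, repeat=3)]


def trinucleotide_frequencies(seq):
    """Observed frequency of each of the 64 overlapping trinucleotides."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRINUCS)
    return {t: counts[t] / total for t in TRINUCS}


def composition_expectations(seq):
    """Frequency expected from base composition alone:
    the product of the frequencies of the three bases of each motif."""
    base_freq = {b: seq.count(b) / len(seq) for b in BASES}
    return {t: base_freq[t[0]] * base_freq[t[1]] * base_freq[t[2]]
            for t in TRINUCS}


def slope_and_r2(x, y):
    """Least-squares slope of y on x and the squared correlation.
    Returns (0.0, 0.0) when either variable has no variance, because a
    constant cannot explain any variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy / sxx, sxy * sxy / (sxx * syy)


def compare_to_composition(seq):
    """Slope and R² of observed trinucleotide frequencies vs. expectations."""
    observed = trinucleotide_frequencies(seq)
    expected = composition_expectations(seq)
    return slope_and_r2([expected[t] for t in TRINUCS],
                        [observed[t] for t in TRINUCS])


# Illustrative use on an AT-rich random sequence (not real intronic DNA).
rng = random.Random(0)
demo = "".join(rng.choices(BASES, weights=[4, 1, 1, 4], k=5000))
slope, r2 = compare_to_composition(demo)
print(f"slope={slope:.3f}  R2={r2:.3f}")
```

Note that when all four bases are equally frequent every expectation equals 1/64, so the expectations carry no variance and cannot correlate with anything; the sketch returns an R² of zero in that case.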

