Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes.

Antezana MA, Jordan IK - PLoS ONE (2008)

Bottom Line: The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.


Affiliation: Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America. marcos.antezana@gmail.com

ABSTRACT
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation of which central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones.
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that either mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.

pone-0002145-g033: Dinucleotide preferences in native vs. simulated DNA as a function of GC content. The R² values and slopes (left, right, vertical axis) of the correlation of dinucleotide preferences in native coding, intronic, and non-genic DNA (black, grey, and segmented grey, respectively) vs. those in DNA simulated using 64×4 matrices estimated from non-genic, coding-region, and intronic substitutions (thin segmented line; solid grey and black thin lines; and all thicker lines, respectively), as a function of increasing GC total (horizontal axis). The thin black dotted lines, however, are for 3∥1 dinucleotides simulated using intronic 64×4s with erased strand effects; the fit to intronic and non-genic dinucleotides with intronic 64×4s lacking strand effects was almost identical to that with full strand effects. Coding-region preferences for 3∥1 dinucleotides were estimated from sequences simulated under Granthamian amino-acid selection.
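The R² and slope plotted in this figure summarize how closely simulated motif preferences track native ones. As a rough illustration only, the Python sketch below computes both quantities by ordinary least squares for two equal-length lists of preference values. The function name `slope_and_r2`, the choice of native values as the predictor, and the example numbers are assumptions made for illustration; the source does not state which variable sits on which axis, nor does this sketch reproduce the authors' preference index.

```python
def slope_and_r2(native, simulated):
    """Least-squares slope and R^2 of simulated values regressed on native ones.

    Hypothetical helper: both arguments are equal-length sequences of
    motif-preference values, one entry per motif.
    """
    n = len(native)
    if n != len(simulated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_x = sum(native) / n
    mean_y = sum(simulated) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in native)
    syy = sum((y - mean_y) ** 2 for y in simulated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(native, simulated))
    if sxx == 0 or syy == 0:
        raise ValueError("values must vary for slope and R^2 to be defined")

    slope = sxy / sxx                 # regression slope of simulated on native
    r2 = (sxy * sxy) / (sxx * syy)    # squared Pearson correlation
    return slope, r2


# Made-up illustrative values, not data from the paper.
slope, r2 = slope_and_r2([0.8, 1.0, 1.2, 1.4], [0.9, 1.0, 1.1, 1.2])
```

A slope near 1.0 together with a high R² would mean the simulated preferences match the native ones in magnitude as well as in rank, which is how the two statistics are read in the text.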

Mentions: For the simulation work above we used 64×4 matrices estimated from real substitutions and their immediate one-site upstream and one-site downstream contexts. Native preferences for trinucleotide motifs, however, are certainly influenced by a wider context, so we tried to ascertain to what extent the 64×4 resolution of our simulations can reproduce the various native motif preferences. In Figures 32 and 33 we show the correlation between preferences for trinucleotide and dinucleotide motifs estimated from native baboon non-genic DNA, human intronic DNA, and Rattus coding regions and the corresponding preferences generated by the 64×4 matrices that were also used above. The figures show that the preferences estimated from simulated intronic and coding-region DNA are highly correlated with the native preferences, with R² values ranging from 60 to 90% and slopes close to 1.0; the only exception is the two 2-fold groups, which tend to have lower R² values and slopes more distant from 1.0. The preferences from simulated non-genic DNA are those most cleanly correlated with the native ones, but the two 2-fold groups are not fit very well here either. These high correlations and slopes close to 1.0 indicate that the 64×4 matrices estimated from coding-region substitutions are capturing most of the neighbor-base-dependence of mutation in and close to genes. The fact that the fit is not complete, however, leaves room for substantial 1024×4 (and wider) mutation effects as well as for biasing effects due to selection that make the neighbor-base-dependence of coding-region substitutions not fully identical to that of mutation. On the other hand, in view of these results it is hard to entertain the possibility that the true pattern of mutation is one of a main 4×4 matrix with only one or very few neighbor-base-dependent effects.
Note, finally, that erasing strand effects from the intronic 64×4s does not worsen the correlations (shown only for 3∥1 dinucleotides), indicating again that the native preferences have been shaped by strand effects which alternate between the two strands over neutral-evolutionary time, at least in the germline.
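To make the 64×4 layout concrete, the following Python sketch tallies substitutions by their one-site upstream and one-site downstream context and then removes strand effects. It is a minimal illustration under stated assumptions, not the authors' pipeline: the names `count_substitutions` and `erase_strand_effects` are hypothetical, the input is assumed to be an already inferred ancestral/derived pair of aligned sequences, and averaging each entry with its reverse-complement counterpart is only one plausible way to erase strand effects.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CONTEXTS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 trinucleotides


def empty_matrix():
    """64x4 table: rows are (upstream, site, downstream) trinucleotides,
    columns are the base found at the middle site after the change."""
    return {ctx: {b: 0.0 for b in BASES} for ctx in CONTEXTS}


def count_substitutions(ancestral, derived):
    """Tally ancestral trinucleotide context -> derived middle base.

    Assumption: both strings are aligned and of equal length. Unchanged
    sites are counted too, so rows would still need normalizing to rates.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = empty_matrix()
    for i in range(1, len(ancestral) - 1):
        ctx = ancestral[i - 1:i + 2]
        new_base = derived[i]
        # Skip gaps and ambiguity codes rather than guessing at them.
        if ctx in counts and new_base in BASES:
            counts[ctx][new_base] += 1
    return counts


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def erase_strand_effects(matrix):
    """Average each entry with the same change read off the opposite strand
    (an assumed symmetrization, chosen for illustration)."""
    symmetric = empty_matrix()
    for ctx in CONTEXTS:
        rc_ctx = reverse_complement(ctx)
        for b in BASES:
            symmetric[ctx][b] = 0.5 * (matrix[ctx][b] + matrix[rc_ctx][COMPLEMENT[b]])
    return symmetric


# Toy example: a single C->T change in the context A_G.
counts = count_substitutions("ACGTA", "ATGTA")
symmetric = erase_strand_effects(counts)
```

In the toy example the single ACG→ATG change is shared equally with its opposite-strand reading, CGT→CAT, so a strand-symmetric matrix cannot distinguish on which strand a change arose. A 1024×4 matrix of the kind mentioned above would follow the same pattern with two flanking sites on each side.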

