Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes.

Antezana MA, Jordan IK - PLoS ONE (2008)

Bottom Line: The best fit, however, is obtained with neighbor-base-dependent mutation (NBDM) regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline, as one would expect for effects due to loosely contained background transcription. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM, which should be under direct natural selection because they drastically alter missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.

Affiliation: Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America. marcos.antezana@gmail.com

ABSTRACT
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms, and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences", even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a mutation regime whose central features have been maintained by natural selection. Indeed, we find that these preferences are conserved in vertebrates even more rigidly than codon occurrences, and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases at the sites immediately preceding and following each mutating site: when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones.
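
The dinucleotide over- and under-representations mentioned above are classically measured with the relative-abundance ratio ρ_XY = f_XY / (f_X f_Y), which divides each dinucleotide's frequency by the product of its two mononucleotide frequencies. The following is a minimal sketch of that standard index only. It is not the preference index used in this study, which works at the trinucleotide level and is additionally insulated from codon usage and protein-sequence encoding. The function name and the toy sequence are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_rho(seq: str) -> dict[str, float]:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides.

    rho > 1 marks an over-represented dinucleotide and rho < 1 an
    under-represented one, relative to what mononucleotide composition
    alone predicts. Characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("need at least one A/C/G/T dinucleotide")
    rho = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        # Undefined (NaN) when one of the two bases is absent from the sequence.
        rho[x + y] = (di[x + y] / n_di) / (fx * fy) if fx and fy else math.nan
    return rho


if __name__ == "__main__":
    # Hypothetical toy sequence that happens to contain no CG dinucleotide.
    toy = "ATGCCAGGTTCAGCTGGAACCTTGCAGGATCCAAGG" * 3
    for dn, value in sorted(dinucleotide_rho(toy).items()):
        print(dn, round(value, 2))
```

Because ρ divides out the mononucleotide frequencies, a value far from 1 signals a neighbor effect rather than a mere compositional bias, which is the property that makes such indices useful for studying context-dependent processes.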
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline, as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation, and that this entails marked differences between the spectra of amino-acid mutations that each mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM, which should be under direct natural selection because they drastically alter missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
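
The comparison between NBDM and matched context-independent mutation can be sketched as follows. A 64×4 regime is stored here as a mapping from each trinucleotide to the rates at which its middle base mutates to each of the four bases. The matching rule (averaging each central base's rates over its 16 neighbor contexts with equal weights), the random rates, and all function names are assumptions for illustration, not the study's estimates or procedure. Sequence ends, which lack a neighbor on one side, are simply skipped.

```python
import random
from itertools import product

BASES = "ACGT"


def random_nbdm_matrix(seed: int = 0) -> dict[str, dict[str, float]]:
    """Hypothetical 64x4 regime: rate at which the middle base of each
    trinucleotide mutates to each base (the self-rate is fixed at 0)."""
    rng = random.Random(seed)
    return {
        "".join(t): {b: 0.0 if b == t[1] else rng.uniform(0.1, 2.0) for b in BASES}
        for t in product(BASES, repeat=3)
    }


def matched_context_independent(matrix):
    """Assumed matching: average each central base's outgoing rates over
    its 16 (left, right) neighbor contexts, with equal weights."""
    ci = {}
    for mid in BASES:
        contexts = [matrix[left + mid + right] for left in BASES for right in BASES]
        ci[mid] = {b: sum(c[b] for c in contexts) / len(contexts) for b in BASES}
    return ci


def mutability_nbdm(seq: str, matrix) -> float:
    """Summed outgoing rate of all interior sites under the 64x4 regime."""
    return sum(sum(matrix[seq[i - 1:i + 2]].values()) for i in range(1, len(seq) - 1))


def mutability_independent(seq: str, ci) -> float:
    """Same sum, but each site's rate depends only on its own base."""
    return sum(sum(ci[seq[i]].values()) for i in range(1, len(seq) - 1))


if __name__ == "__main__":
    m = random_nbdm_matrix(seed=1)
    ci = matched_context_independent(m)
    seq = "ATGGCCAAGCTGTTCGAGAACCTG"  # arbitrary toy sequence
    ratio = mutability_nbdm(seq, m) / mutability_independent(seq, ci)
    print(f"NBDM / context-independent mutability: {ratio:.3f}")
```

A ratio below 1 means that the sequence is less mutable under the context-dependent regime than under its matched context-independent counterpart, which is the direction reported above for human coding regions under the estimated regimes. With random rates, as in this sketch, the ratio carries no biological meaning.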

Figure 26. Simulated occurrence-preference slopes as a function of GC content. On the vertical axis is the slope of the correlation between average trinucleotide preferences and total trinucleotide occurrences as a function of increasing GC content (horizontal axis). The native patterns are on the right and those from simulated data are on the left. The data sets are the same as in the previous figure and are labelled identically. In plots with a log vertical axis, the missing stretches of curves are due to negative slopes.

Mentions: Remarkable similarity was also observed between the native intronic pattern and the patterns generated by the intron-derived 64×4s with full strand effects and by the 64×4s estimated from mouse-rat/human coding-region alignments. The shared features include i) higher R²s below 0.5 GC for the 6folds, 4folds and all-motifs than for the two 2fold groups; ii) higher R²s below 0.5 GC for the 6fold and the two 2fold groups than for the 4fold and all-motifs groups; and iii) lower maximum R²s for the two 2folds than for the other three groups. However, in the pattern generated by the intron-derived 64×4s with full strand effects, the peak R²s of the two 2folds are shifted towards higher GC, a shift observed neither in the native pattern, nor in the pattern generated by the no-strand-effect intronic 64×4s, nor in the pattern generated by the matrices derived from substitutions to mouse or rat. This indicates that the strand effects in these 64×4s are typical neither of the long-term mutation regime that generated the native intronic patterns nor of the mouse/rat coding-region substitutions used to estimate the other set of 64×4s; in other words, the strand effects inferable from substitutions in human and chimp introns are not typical of the long-term mutational regime that has shaped intron primary structure. It is also conceivable, however, that crucial NBDM effects are missing which, if estimated accurately, would allow, say, a 1024×4 matrix with full strand effects to reproduce the native intronic pattern (which would turn the remarkable fit obtained with the no-strand-effect intronic 64×4s into an (un)felicitous coincidence). The patterning of the simulated slopes, shown in Figure 26, is much less similar to the native pattern than that of the simulated R²s, although the slopes obtained from the mouse/rat coding-region 64×4s do show some agreement with the native slopes.
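
The distinction drawn above between 64×4 regimes with and without strand effects can be made concrete. A regime has no strand effects when the rate at which the middle base of a context mutates to base b equals the rate at which the middle base of the reverse-complement context mutates to the complement of b. The sketch below, whose function names are hypothetical, removes strand effects by averaging each such pair of rates; it works unchanged for any odd-length context, for example the 4^5 = 1024 pentanucleotide contexts of a 1024×4 matrix. Averaging is one simple way to build a strand-symmetric regime and is not necessarily how the study derived its no-strand-effect matrices.

```python
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_strand_effects(matrix):
    """Average each context's rates with the complementary rates of its
    reverse-complement context (whose middle base is the complement)."""
    return {
        ctx: {
            b: 0.5 * (rates[b] + matrix[revcomp(ctx)][b.translate(_COMPLEMENT)])
            for b in BASES
        }
        for ctx, rates in matrix.items()
    }


def has_strand_effects(matrix, tol: float = 1e-12) -> bool:
    """True if any rate differs from its reverse-complement counterpart."""
    return any(
        abs(rates[b] - matrix[revcomp(ctx)][b.translate(_COMPLEMENT)]) > tol
        for ctx, rates in matrix.items()
        for b in BASES
    )


if __name__ == "__main__":
    # Hypothetical 64x4 regime with deliberately asymmetric rates.
    demo = {
        "".join(t): {b: 0.0 if b == t[1] else float(i % 7 + 1) for b in BASES}
        for i, t in enumerate(product(BASES, repeat=3))
    }
    print(has_strand_effects(demo), has_strand_effects(remove_strand_effects(demo)))
```

Because the middle base of a reverse-complemented context is always the complement of the original middle base, zero self-rates stay zero after averaging.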
On a related note, the simulated occurrence-preference R²s at 0.5 GC are very likely lower than the native ones because, as already mentioned in the Materials and Methods, the 64×4 matrices were estimated from a mix of substitutions generated by possibly heterogeneous 64×4 regimes (the alignments were sorted by GC content rather than by the unknown true rates of the 64×4 regimes that generated the harvested substitutions). This makes the sometimes striking similarities between the simulated and native patterns presented above, as well as those presented below, even more remarkable.
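
The point about mixed regimes can be illustrated with simple arithmetic. If the rate of one context-specific substitution type is estimated as substitutions divided by opportunities (occurrences of the context), then pooling alignments governed by different regimes yields an opportunity-weighted average that may match none of the underlying regimes. The counts below are invented for illustration, the function names are hypothetical, and the counts-over-opportunities estimator is an assumed simplification of the study's estimation procedure.

```python
from fractions import Fraction


def rate_estimate(substitutions: int, opportunities: int) -> Fraction:
    """Naive per-context rate estimate: substitutions per opportunity."""
    return Fraction(substitutions, opportunities)


def pooled_estimate(groups: list[tuple[int, int]]) -> Fraction:
    """Estimate obtained after pooling (substitutions, opportunities) counts
    from several groups, e.g. alignments governed by different regimes."""
    return rate_estimate(sum(s for s, _ in groups), sum(n for _, n in groups))


if __name__ == "__main__":
    regime_a = (10, 1000)   # hypothetical group with rate 1/100
    regime_b = (150, 3000)  # hypothetical group with rate 1/20
    print(rate_estimate(*regime_a), rate_estimate(*regime_b))
    print(pooled_estimate([regime_a, regime_b]))
```

The pooled value, 1/25 (0.04), lies between the two group rates and equals neither, so a simulation driven by it need not reproduce the pattern generated by either group.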

