Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes.

Antezana MA, Jordan IK - PLoS ONE (2008)

Bottom Line: The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.


Affiliation: Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America. marcos.antezana@gmail.com

ABSTRACT
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation of which central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones.
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that either mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
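To make the idea of neighbor-base-dependent mutation concrete, the following is a minimal sketch of how such a regime could be simulated. It is not the authors' simulation: the rate table, the function names (`make_rates`, `mutate_once`, `simulate`, `dinucleotide_count`), the circular sequence, and the synchronous update are all assumptions made for illustration. The only context effect encoded is an elevated C→T rate when the next base is G, together with its reverse-complement change (G→A when the preceding base is C), so the regime has no strand effect.

```python
import random

BASES = "ACGT"

def make_rates(base_rate=0.001, cpg_boost=10.0):
    """Hypothetical rate table (an assumption, not estimated from data).

    rates[(left, mid, right)][new] is the per-step probability that `mid`
    mutates to `new` given its two neighbors.
    """
    rates = {}
    for left in BASES:
        for mid in BASES:
            for right in BASES:
                ctx = {}
                for new in BASES:
                    if new == mid:
                        continue
                    p = base_rate
                    # Illustrative context effect, applied symmetrically on
                    # both strands: C->T before G, and G->A after C.
                    if (mid == "C" and right == "G" and new == "T") or \
                       (mid == "G" and left == "C" and new == "A"):
                        p *= cpg_boost
                    ctx[new] = p
                rates[(left, mid, right)] = ctx
    return rates

def mutate_once(seq, rates, rng):
    """One synchronous pass; neighbors are read from the input sequence."""
    n = len(seq)
    out = []
    for i in range(n):
        # Treat the sequence as circular so every site has two neighbors.
        left, mid, right = seq[i - 1], seq[i], seq[(i + 1) % n]
        new_base = mid
        u = rng.random()
        acc = 0.0
        for new, p in rates[(left, mid, right)].items():
            acc += p
            if u < acc:
                new_base = new
                break
        out.append(new_base)
    return "".join(out)

def simulate(seq, rates, steps, seed=0):
    """Apply `steps` mutation passes with a seeded random generator."""
    rng = random.Random(seed)
    for _ in range(steps):
        seq = mutate_once(seq, rates, rng)
    return seq

def dinucleotide_count(seq, dinuc):
    """Count overlapping occurrences of a dinucleotide."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)

if __name__ == "__main__":
    rng = random.Random(1)
    start = "".join(rng.choice(BASES) for _ in range(2000))
    end = simulate(start, make_rates(cpg_boost=50.0), steps=200)
    print("CG before:", dinucleotide_count(start, "CG"))
    print("CG after: ", dinucleotide_count(end, "CG"))
```

Under these assumed rates the CG dinucleotide becomes depleted relative to the starting sequence, which is the kind of motif under-representation that a context-dependent regime can generate and a context-independent one cannot.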

pone-0002145-g024: Trinucleotide preferences and occurrences in coding, intronic, and non-genic DNA. The right half of the figure shows from top to bottom the occurrence-occurrence R²s for trinucleotides from coding vs. intronic DNA, coding vs. non-genic DNA, and non-genic vs. intronic DNA, as a function of increasing GC content (horizontal axis); flanked to the left by the corresponding preference-preference R²s for off-frame trinucleotides. The figure's left half shows the corresponding slopes. For the top two rows we used human whole-genome coding-region values (and codons were used as “coding-region trinucleotides”); and for the bottom row we used values from the 49%-GC non-genic subgroup.

Mentions: The reaction to GC of the occurrence-preference slopes in the non-coding data is shown in Figure 23. With a minor exception, the slopes are always positive and, surprisingly, the slopes most similar to those from coding regions are the non-genic ones, not the intronic ones. In Figure 24 we present the reaction to GC content of occurrence-occurrence and preference-preference R²s for coding vs. intronic, coding vs. non-genic, and non-genic vs. intronic DNA. In general, preference R²s reach much higher values than occurrence R²s, which is consistent with what was seen above, and the values are highest for the all-motifs group and for the 4-fold and 6-fold motif groups, and lowest for the two 2-fold groups, especially at low GC. Of interest is that when GC is intermediate the preference R²s of the various motif families tend to have quite similar high values, with those of 2-fold motifs making the biggest upward transition and those of 4-fold and 6-fold motifs making slight downward adjustments. Also of possible interest are the “S”-shaped reaction of the 2f-3aas occurrence R²s to higher GC content, the fact that these R²s rise earlier in the coding-vs.-non-genic case, the similar but much weaker reaction of the 6-fold R²s, and the fact that the horizontally paired patterns of occurrences and preferences are often qualitatively similar. One should, however, not forget that unlike codon occurrences and motif preferences estimated via SC randomization (and like 23/1 and 3/12 trinucleotide occurrences and preferences), average motif preferences estimated from multiple intronic or non-genic sequences should be strongly correlated with overall motif occurrences when the preferences are clearly non-random (i.e., whenever an average preference is clearly different from 0.00).
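As a rough illustration of the occurrence-occurrence comparison described above, the sketch below computes overlapping trinucleotide frequencies for two sequences and then the least-squares slope and R² between the two 64-value profiles. The function names are hypothetical, and the paper's preference index (estimated via SC randomization) is not reproduced here; only raw occurrences are compared.

```python
from itertools import product

# All 64 trinucleotides in a fixed order, so profiles are comparable.
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]

def trinucleotide_frequencies(seq):
    """Relative frequencies of overlapping trinucleotides in `seq`.

    Windows containing characters other than A, C, G, T are skipped.
    """
    counts = dict.fromkeys(TRINUCS, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid trinucleotides")
    return [counts[t] / total for t in TRINUCS]

def slope_and_r2(x, y):
    """Least-squares slope of y on x and the squared Pearson correlation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined slope or R^2")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

if __name__ == "__main__":
    # Toy sequences standing in for, e.g., a coding and an intronic sample.
    seq_a = "ATGGCCAAGCTGACCTGCGATGCCAAGGTGCTG" * 20
    seq_b = "GTAAGTCTGCCAGGTTTCCTGACCAAGGCATCTG" * 20
    fa = trinucleotide_frequencies(seq_a)
    fb = trinucleotide_frequencies(seq_b)
    slope, r2 = slope_and_r2(fa, fb)
    print(f"slope = {slope:.3f}, R^2 = {r2:.3f}")
```

Repeating this calculation for groups of sequences binned by GC content would give curves of slope and R² against GC in the spirit of Figure 24, although the grouping scheme and motif families used in the paper are not specified in this excerpt.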

