Highly conserved regimes of neighbor-base-dependent mutation generated the background primary-structural heterogeneities along vertebrate chromosomes.

Antezana MA, Jordan IK - PLoS ONE (2008)

Bottom Line: The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.


Affiliation: Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America. marcos.antezana@gmail.com

ABSTRACT
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a regime of mutation of which central features have been maintained by natural selection. We find indeed that these preferences are conserved in vertebrates even more rigidly than codon occurrences and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with the R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site, because when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones.
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation and that this entails marked differences between the spectra of amino-acid mutations that either mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM which should be under direct natural selection because they alter drastically missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
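To make the idea of neighbor-base-dependent mutation concrete, the following is a minimal sketch, not the authors' simulation code. It assumes a 64×4 table keyed by (preceding base, mutating base, following base) whose values are per-round probabilities of changing the middle base to each target. The function names (`make_context_free_table`, `erase_strand_effects`, `mutate_once`) are hypothetical, and averaging each entry with its reverse-complement counterpart is only one plausible reading of "lacking strand effects".

```python
import random
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def make_context_free_table(rate):
    """Build a 64x4 table in which every context mutates at the same rate.

    Keys are (left, centre, right) triplets; each value maps a target base
    to the probability that the centre base becomes that target in one round.
    """
    table = {}
    for left, centre, right in product(BASES, repeat=3):
        table[(left, centre, right)] = {
            t: (rate / 3 if t != centre else 0.0) for t in BASES
        }
    return table


def erase_strand_effects(table):
    """Average each entry with its reverse-complement counterpart.

    Assumption: a change read on one strand as (l, c, r) -> t is read on the
    other strand as (comp r, comp c, comp l) -> comp t, so a regime without
    strand effects gives both the same probability.
    """
    sym = {}
    for (l, c, r), probs in table.items():
        rc_context = (COMPLEMENT[r], COMPLEMENT[c], COMPLEMENT[l])
        sym[(l, c, r)] = {
            t: 0.5 * (probs[t] + table[rc_context][COMPLEMENT[t]])
            for t in BASES
        }
    return sym


def mutate_once(seq, table, rng):
    """Apply one synchronous round of mutation to the interior sites.

    Contexts are read from the input sequence, so changes made in this round
    do not influence neighbouring sites until the next round.
    """
    out = list(seq)
    for i in range(1, len(seq) - 1):
        probs = table[(seq[i - 1], seq[i], seq[i + 1])]
        draw = rng.random()
        cumulative = 0.0
        for target in BASES:
            cumulative += probs[target]
            if draw < cumulative:
                out[i] = target
                break
    return "".join(out)
```

A context-independent (4×4-like) regime corresponds to a table whose entries ignore the two neighbors, as `make_context_free_table` produces. Raising a single entry, for example the probability for the hypothetical context `("A", "C", "G")` to yield `"T"`, gives a regime in which the rate depends on the flanking bases.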

Figure 29 (pone-0002145-g029): Codon occurrences in native vs. simulated coding DNA, as a function of GC content. The R² values and slopes (top, bottom; vertical axis) of the correlations between codon occurrences in native human genes and in simulated coding DNA generated by 4×4 or 64×4 matrices (left, right) with erased strand effects estimated from human-chimp/macaque intron alignments under Grantham selection of non-synonymous changes, as a function of increasing GC total (horizontal axis). Thinner-line patterns are for mouse genes and full-strand-effects 64×4s derived from mouse-rat/human coding-DNA alignments (leaning right). The fit with intronic 64×4s having full strand effects was worse.

Mentions: The situation for coding regions is shown in Figure 29, which presents, for a range of GC contents, the R² values and slopes of the correlation between the codon occurrences in native human CDSs and the occurrences in DNA simulated under Granthamian selection of non-synonymous changes. The simulated DNA was generated by 4×4 and 64×4 matrices derived from mouse-rat/Homo coding-region alignments or from Homo-chimp/macaque intron alignments with erased strand effects. Note, incidentally, that i) expectations derived by assuming a single base composition per GC group delivered a very poor fit, since a single base composition is poorly suited to fitting the effects of amino-acid selection, and that ii) allowing a different base composition at each of the three codon positions delivers a better fit but is biologically implausible. We therefore show neither set of results. The figure shows that the 4fold and 6fold codon occurrences generated under amino-acid selection by intronic 64×4 matrices correlate with the native occurrences more strongly than do those generated by 4×4 matrices, except at low GC content, where the two perform similarly. Furthermore, the slopes for 4fold and 6fold codons generated by 64×4s under selection fall consistently close to 1.0, while those generated by 4×4s tend to be closer to 0.0 (although 4×4-generated 4fold slopes are close to 1.0 at low GC). The 2fold and 2f-3aas occurrences generated by intronic 4×4 or 64×4 matrices correlate weakly and negatively with the native ones (the negative correlation for the 4×4s is stronger between 0.45 and 0.5 GC). The 64×4s derived from coding-DNA substitutions generate only positive slopes, an improved fit for the two 2folds, and a slightly worse fit for 4folds and 6folds relative to the intronic 64×4s, while the corresponding 4×4s deliver a much worse fit, except for a slightly better fit for the two 2folds (with near-zero slopes).
Therefore, in the presence of Grantham selection on non-synonymous changes, the empirically estimated 64×4 matrices generate more realistic codon occurrences than do the 4×4 matrices.
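The comparison described above rests on two summary statistics per GC group: the R² and the slope relating codon occurrences in native and simulated coding DNA. The sketch below shows one way such numbers could be computed. It is an illustration under stated assumptions, not the authors' pipeline: the helper names `codon_occurrences` and `slope_and_r2` are hypothetical, and putting simulated counts on the x axis and native counts on the y axis is an arbitrary choice, since the text does not say which variable was regressed on which.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so that count vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_occurrences(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    return [counts.get(codon, 0) for codon in CODONS]


def slope_and_r2(x, y):
    """Return the least-squares slope of y on x and the squared Pearson correlation.

    Raises ZeroDivisionError if either vector has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)
```

To mirror the per-class results in the text (4fold, 6fold, 2fold codons), the two count vectors would first be restricted to the codons of one degeneracy class before calling `slope_and_r2`. A slope near 1.0 together with a high R² then indicates that simulated occurrences match native ones in both ranking and magnitude.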

