Same-strand overlapping genes in bacteria: compositional determinants of phase bias.

Sabath N, Graur D, Landan G - Biol. Direct (2008)

Bottom Line: In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. We examined the frequencies of initiation and termination codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases.

Affiliation: Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA. nsabath@uh.edu

ABSTRACT

Background: Same-strand overlapping genes may occur in frameshifts of one (phase 1) or two nucleotides (phase 2). In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. This bias was explained by either genomic location or an unspecified selection advantage. Models that focused on the ability of the two genes to evolve independently did not predict this phase bias. Here, we propose that a purely compositional model explains the phase bias in a more parsimonious manner. Same-strand overlapping genes may arise through either a mutation at the termination codon of the upstream gene or a mutation at the initiation codon of the downstream gene. We hypothesized that, given these two scenarios, the frequencies of initiation and termination codons in the two phases may determine the number of overlapping genes.

Results: We examined the frequencies of initiation and termination codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases. We show that the frequencies of start codons in each of the two phases, and hence the potential for the creation of overlapping genes, are determined by a universal amino-acid frequency and species-specific codon usage, leading to a correlation between long phase-1 overlaps and genomic GC content.

Conclusion: Our model explains the phase bias in same-strand overlapping genes by compositional factors without invoking selection. Therefore, it can be used as a model of neutral evolution to test selection hypotheses concerning the evolution of overlapping genes.
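
The codon counts compared in the Results can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `count_in_phase` and the toy sequence are hypothetical, and it assumes the standard bacterial start codons (ATG, GTG, TTG) and stop codons (TAA, TAG, TGA), with phase k read at an offset of k nucleotides from the annotated reading frame.

```python
# Assumed codon sets (standard bacterial genetic code).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_in_phase(cds, phase, codons):
    """Count how many codons read at offset `phase` (0, 1 or 2) are in `codons`.

    `cds` is assumed to be an in-frame coding sequence, so phase 0 is the
    annotated reading frame and phases 1 and 2 are the shifted frames.
    """
    cds = cds.upper()
    # Step through complete triplets only, starting at the requested offset.
    return sum(cds[i:i + 3] in codons for i in range(phase, len(cds) - 2, 3))


# Hypothetical toy sequence, used only to show the call pattern.
toy = "ATGAATGGTTGA"
for phase in (1, 2):
    starts = count_in_phase(toy, phase, START_CODONS)
    stops = count_in_phase(toy, phase, STOP_CODONS)
    print(f"phase {phase}: {starts} start codon(s), {stops} stop codon(s)")
```

Summing such counts over all annotated genes of a genome, and normalizing by the number of triplets examined, would give per-phase frequencies of the kind the abstract compares.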

Figure 5: Mean frequencies of groups of amino acids in the 167 bacterial genomes, plotted against genomic GC content. The mean frequencies of amino acids encoded by TGN, NAT, NGT, or NTT codons are marked in red, blue, green, and black, respectively. NAT, NGT, and NTT codons may lend a dinucleotide to one of the start codons in phase 1. TGN codons may lend a dinucleotide to one of the start codons in phase 2.

Mentions: The difference in start codon frequencies between phase 1 and phase 2 can be explained by the phase-0 codons that may lend a dinucleotide to a start codon (ATG, GTG, or TTG) in each of the phases. In phase 2, every start codon takes its last two nucleotides from a phase-0 TGN codon, which lends TG. One of these codons, TGA, is a stop codon and therefore cannot be part of a long overlap. The remaining three codons (TGT, TGC, and TGG) encode two amino acids (cysteine and tryptophan), which are among the rarest in protein-coding genes, with a mean frequency of ~1% (Table 2). In contrast, in phase 1, the amino acids encoded by NAT, NGT, and NTT codons, which may lend a dinucleotide to the start codons ATG, GTG, and TTG, respectively, are found at moderate to high frequencies in proteins (Table 2). Interestingly, the abundance of NAT-, NGT-, and NTT-encoded amino acids is inversely correlated with the frequency of start codons (Table 2). Moreover, amino acids encoded by NAT codons, which can form the most common start codon, ATG, appear at lower frequencies than NGT- and NTT-encoded amino acids. For all bacteria and all GC contents, the frequencies of amino acids encoded by TGN codons are lower than those of each of the amino acid groups encoded by NAT, NGT, and NTT (Figure 5; all pairwise two-sample paired Student t-tests, p < 0.001).
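
The codon groups named above can be derived mechanically from the three start codons. The following is a small illustrative sketch rather than code from the article: `donor_codons` and `CODON_TABLE` are hypothetical names, the standard genetic code is assumed, and stop codons are excluded because, as noted for TGA, they cannot lie inside a long overlap.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption;
# '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

START_CODONS = ("ATG", "GTG", "TTG")


def donor_codons(phase):
    """Map each sense phase-0 codon that can lend a dinucleotide to a start
    codon in the given phase (1 or 2) to the amino acid it encodes."""
    if phase == 1:
        # A phase-1 codon begins with the last two bases of a phase-0 codon,
        # so the donor must end in AT, GT or TT (NAT, NGT, NTT).
        wanted = {s[:2] for s in START_CODONS}
        lent = lambda codon: codon[1:]
    elif phase == 2:
        # A phase-2 codon ends with the first two bases of a phase-0 codon,
        # so the donor must begin with TG (TGN).
        wanted = {s[1:] for s in START_CODONS}
        lent = lambda codon: codon[:2]
    else:
        raise ValueError("phase must be 1 or 2")
    return {c: aa for c, aa in CODON_TABLE.items()
            if lent(c) in wanted and aa != "*"}


for phase in (1, 2):
    donors = donor_codons(phase)
    print(f"phase {phase}: codons {sorted(donors)}, "
          f"amino acids {sorted(set(donors.values()))}")
```

Under these assumptions, phase 2 is left with only the cysteine and tryptophan codons, whereas phase 1 draws on twelve codons spread over twelve amino acids, which is the asymmetry the paragraph describes.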

