Same-strand overlapping genes in bacteria: compositional determinants of phase bias.

Sabath N, Graur D, Landan G - Biol. Direct (2008)

Bottom Line: In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. We examined the frequencies of initiation- and termination-codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases.


Affiliation: Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA. nsabath@uh.edu

ABSTRACT

Background: Same-strand overlapping genes may occur in frameshifts of one (phase 1) or two nucleotides (phase 2). In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. This bias was explained by either genomic location or an unspecified selection advantage. Models that focused on the ability of the two genes to evolve independently did not predict this phase bias. Here, we propose that a purely compositional model explains the phase bias in a more parsimonious manner. Same-strand overlapping genes may arise through either a mutation at the termination codon of the upstream gene or a mutation at the initiation codon of the downstream gene. We hypothesized that, given these two scenarios, the frequencies of initiation and termination codons in the two phases may determine the number of overlapping genes.
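
The quantity at the heart of this hypothesis, how often initiation and termination codons appear when a coding sequence is read one or two nucleotides out of frame, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function name `count_in_phase`, the start-codon set (ATG, GTG, TTG), and the convention that phase k begins k nucleotides downstream of a reference codon start are all assumptions made for illustration.

```python
# Assumed codon sets: common bacterial start codons and the three standard stops.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_in_phase(cds, phase):
    """Count start and stop codons when `cds` is read at an offset of
    `phase` nucleotides (1 or 2) from its own reading frame.

    Assumption: phase k means the shifted frame begins k nucleotides
    downstream of a reference-frame codon start.
    """
    if phase not in (1, 2):
        raise ValueError("phase must be 1 or 2")
    cds = cds.upper()
    starts = stops = 0
    # Step through complete triplets of the shifted frame only.
    for i in range(phase, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in START_CODONS:
            starts += 1
        elif codon in STOP_CODONS:
            stops += 1
    return starts, stops


# Toy reference gene: ATG AAT GAC TTG ATA TAA
toy_cds = "ATGAATGACTTGATATAA"
phase1_counts = count_in_phase(toy_cds, 1)
phase2_counts = count_in_phase(toy_cds, 2)
```

Summing such counts over all annotated genes of a genome would give per-phase frequencies of the kind compared in the Results.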

Results: We examined the frequencies of initiation- and termination-codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases. We show that the frequencies of start codons in each of the two phases, and, hence, the potential for the creation of overlapping genes, are determined by a universal amino-acid frequency and species-specific codon usage, leading to a correlation between long phase-1 overlaps and genomic GC content.
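
The idea that codons of adjacent amino acids "combine" to form start codons in a shifted frame can be sketched as an expected-frequency calculation. The example below is a simplified illustration, not the model fitted in the paper. It assumes that adjacent codons are statistically independent, that `codon_freq` already folds amino-acid frequency and codon usage into one per-codon frequency, and that phase k starts k nucleotides into the first codon of a pair. The name `expected_start_freq` and the start-codon set are hypothetical choices.

```python
from itertools import product

# Assumed set of bacterial start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def expected_start_freq(codon_freq, phase):
    """Expected frequency of a start codon in the shifted frame.

    codon_freq: dict mapping in-frame codon -> relative frequency
                (amino-acid frequency times codon usage, summing to 1).
    phase:      1 or 2, the offset of the shifted frame.

    Assumption: adjacent codons are drawn independently.
    """
    if phase not in (1, 2):
        raise ValueError("phase must be 1 or 2")
    total = 0.0
    for (c1, f1), (c2, f2) in product(codon_freq.items(), repeat=2):
        # The shifted triplet spans the end of c1 and the start of c2.
        shifted = (c1 + c2)[phase:phase + 3]
        if shifted in START_CODONS:
            total += f1 * f2
    return total


# Toy usage table with two codons: AAT (Asn) and GCC (Ala).
toy_freq = {"AAT": 0.5, "GCC": 0.5}
p1 = expected_start_freq(toy_freq, 1)  # the pair AAT+GCC yields ATG in phase 1
p2 = expected_start_freq(toy_freq, 2)  # no pair yields a start codon in phase 2
```

With a real genome, `codon_freq` would have 61 sense-codon entries, and the species-specific codon usage would shift the two phase values differently. That is the kind of effect the abstract links to genomic GC content.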

Conclusion: Our model explains the phase bias in same-strand overlapping genes by compositional factors without invoking selection. Therefore, it can be used as a model of neutral evolution to test selection hypotheses concerning the evolution of overlapping genes.


Figure 1: Orientations and phases of gene overlap. Genes can overlap on the same strand and on the opposite strand. The reference gene in a pair of overlapping genes is called phase 0. Same-strand overlaps can be in two phases (1 and 2); opposite-strand overlaps can be in three phases (0, 1, and 2). First and second codon positions, in which ~5% and 0% of the changes are synonymous, are marked in red. Third codon positions, in which ~70% of the changes are synonymous, are marked in blue.

Mentions: Overlapping genes were found in all cellular domains of life, as well as in viruses [1-3]. Overlapping genes are thought to have unique evolutionary constraints [4,5] and regulatory properties [6,7]. Genes can overlap on the same strand (→ →) or on the complementary strand ("tail-to-tail" → ←, or "head-to-head" ← →, Figure 1). Different nomenclatures have been used in the literature to denote "same-strand" ("unidirectional," "codirected," "parallel," and "tandem"), "tail-to-tail" ("convergent," "anti-parallel," and "end-on"), and "head-to-head" ("divergent" and "head-on") overlapping genes [8-11]. Here, we use the self-explanatory terms "same-strand" and "opposite-strand" overlapping genes.

