Complete mitochondrial genome sequence of three Tetrahymena species reveals mutation hot spots and accelerated nonsynonymous substitutions in Ymf genes.

Moradian MM, Beglaryan D, Skozylas JM, Kerikorian V - PLoS ONE (2007)

Bottom Line: We also found distinct features in the Mt genome of T. paravorax despite similar genome organization among these approximately 47 kb long linear genomes. Importantly, nucleotide substitution types and rates suggest possible reasons for not being able to find homologues for Ymf genes. Additionally, comparative genomic analysis of complete Mt genomes is essential in identifying biologically significant motifs such as control regions.

Affiliation: Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, United States of America. mmoradia@ucla.edu

ABSTRACT
The ciliate Tetrahymena, a model organism, contains a divergent mitochondrial (Mt) genome with unusual properties: half of its 44 genes still lack a definitive function. These genes can be categorized into two major groups, KPC (known protein coding) and Ymf (genes without an identified function). To gain insights into the mechanisms underlying gene divergence and molecular evolution of Tetrahymena (T.) Mt genomes, we sequenced the Mt genomes of three species: T. paravorax, T. pigmentosa, and T. malaccensis. These genomes were aligned and analyzed using several programs that calculate distance, nucleotide substitution rates (dn, ds), and their ratio (omega), both at individual codon sites and via a sliding window approach. Comparative genomic analysis indicated a conserved putative transcription control sequence, a GC box, in a region where transcription and replication presumably initiate. We also found distinct features in the Mt genome of T. paravorax despite similar genome organization among these approximately 47 kb long linear genomes. Another significant finding was the presence of one or more highly variable regions in Ymf genes, where the majority of substitutions were concentrated. These regions were mutation hotspots in which elevated distances and dn/ds ratios were primarily due to an increase in the number of nonsynonymous substitutions, suggesting relaxed selective constraint. However, in a few Ymf genes, accelerated rates of nonsynonymous substitutions may be due to positive selection. Similarly, at the protein level the majority of amino acid replacements occurred in these regions. Ymf genes comprise half of the genes in Tetrahymena Mt genomes, so understanding why they have not been assigned definitive functions is an important aspect of molecular evolution. Importantly, nucleotide substitution types and rates suggest possible reasons for not being able to find homologues for Ymf genes. Additionally, comparative genomic analysis of complete Mt genomes is essential in identifying biologically significant motifs such as control regions.



Figure 2 (pone-0000650-g002): Nucleotide conservation in the control region. The arrow denotes the conserved putative transcription control element, a GC box, which appears as a single elevated peak in the control region. Nucleotide change and G+C content are calculated in a Hamming window of size 100. X-axis: gene length; Y-axis: arbitrary units.
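The caption's windowed profiles can be illustrated with a minimal Python sketch. It is an assumption-laden reading, not the authors' pipeline: the source does not say which software produced the figure or how "nucleotide change" was scored, so here a Hamming window is taken to mean a weighted moving average, and a column counts as changed when the aligned sequences are not all identical. The function names and the toy alignment are hypothetical; only the 27 bp consensus sequence comes from the text, and a window of 8 is used instead of 100 because the example sequence is short.

```python
import math

def hamming_weights(size):
    """Hamming window coefficients, normalised so they sum to 1."""
    if size == 1:
        return [1.0]
    raw = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))
           for n in range(size)]
    total = sum(raw)
    return [w / total for w in raw]

def smooth(values, size):
    """Hamming-weighted moving average over every full window (no padding)."""
    weights = hamming_weights(size)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + size]))
        for i in range(len(values) - size + 1)
    ]

def gc_indicator(seq):
    """1.0 at G or C positions, 0.0 elsewhere."""
    return [1.0 if base in "GC" else 0.0 for base in seq.upper()]

def change_indicator(aligned):
    """1.0 where an alignment column is not identical in all sequences
    (an assumed definition of 'nucleotide change')."""
    return [0.0 if len(set(col)) == 1 else 1.0 for col in zip(*aligned)]

# 27 bp consensus sequence quoted in the text; window of 8 for illustration.
consensus = "AATAGCCGCACCTAAAAGAAAAAAATC"
gc_profile = smooth(gc_indicator(consensus), size=8)

# Hypothetical three-sequence alignment, differing only in the first column.
toy_alignment = ["AATAGCCGCACC", "TATAGCCGCACC", "AATAGCCGCACC"]
change_profile = smooth(change_indicator(toy_alignment), size=8)
```

On real data the same two functions would be run over the full nad9 to cob alignment with `size=100`, and a conserved G+C-rich element would show up as a high G+C value coinciding with a low change value.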

Mentions: Engberg and Nielsen mapped the origin of replication of the T. thermophila rDNA to a position close to the middle of the molecule [20]. Similarly, the promoter region of the rDNA genes was shown to lie in the same region, together with a few conserved repeat sequences that bind topoisomerase I [21]. The presence of the origins of replication and transcription in the middle of the rDNA molecule persuaded us to search for such elements in the longest intergenic region, at the middle of the Tetrahymena Mt DNA. The cob-Ymf77 intergenic region was suspected to contain a control region because bidirectional transcription initiates there, which prompted us to search it for a transcription control element (GC box) by comparative genomic analysis. Sequence alignment of this region across the five Tetrahymena Mt genomes showed a 94 bp conserved block within this otherwise variable intergenic region (Figure S2). This conserved block contained a 27 bp highly conserved consensus sequence (AATAGCCGCACCTAAAAGAAAAAAATC). Across the five species, only 3 of the 27 positions deviated from the consensus. In this region, which is 88% A+T, it is highly improbable that the putative GC control box (GCCGCACC) would occur by chance (Figure S2). If the probability of an A or T at each position is about 0.88, the probability that exactly seven of eight nucleotides are G or C is $(0.12)^{7} \times 0.88 \times 8 \approx 2.5 \times 10^{-6}$. The alignment of this intergenic region was littered with gaps, except for the conserved region containing the GC box. Such conservation within a highly variable region suggests strong selective pressure, which is common among functional elements in genomes. To further show the nucleotide conservation of this presumptive control region, we plotted the G+C content and conservation at each nucleotide position (Figure 2). From left to right, the plot covers nad9, Ymf77, the intergenic region, and the cob gene. 
The cob gene sequence is highly conserved and has a high, though somewhat variable, G+C content. The nad9 gene is also conserved but has a lower G+C content than even Ymf77. Most of Ymf77, a highly variable gene, is less conserved than either cob or nad9, but it has a relatively high G+C content in the carboxyl-terminal region (to the left). The intergenic region is less conserved than the cob gene and is similar to Ymf77 in this respect. The presumptive control region stands out by its conservation and high G+C content (Figure 2). A GC box in general contains the sequence GCCGCCC and is recognized by the factor SP1 [22]. SP1 presumably interacts with other transcription factors (TFs) to initiate transcription. The fact that the genes flanking this sequence are transcribed in opposite orientations increases the confidence that it contains a control region. This region is the site from which bidirectional transcription of most of the Mt genes is initiated. It is likely that DNA replication also originates in this region. Although initiation of DNA replication in bacterial chromosomes and plasmids occurs at a single unique site (e.g., OriC), a consensus sequence for the origin of replication in mitochondria has not yet been established. In mammals and amphibians, some signals for initiation of H-strand replication and for transcription of both the H- and L-strands are located within the AT-rich, variable control region [23]. In the Mt genome of P. aurelia, the origins of replication and transcription were likewise suspected to lie in the same region [2]. Thus it is quite possible that the conserved 27-nucleotide block described above also has a role in replication of Tetrahymena Mt genomes.
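The chance estimate for the GC box given earlier in this passage is a binomial probability, and a short Python sketch can reproduce it. The sketch assumes, as the passage's arithmetic implicitly does, that bases are independent and that each is G or C with probability 0.12. The function name is hypothetical, and the "at least seven" variant is an added illustration rather than a figure reported in the source.

```python
from math import comb

def prob_exact_gc(n, k, p_gc):
    """Binomial probability that exactly k of n independent bases are G or C."""
    return comb(n, k) * p_gc ** k * (1 - p_gc) ** (n - k)

p_at = 0.88          # A+T fraction of the intergenic region, from the text
p_gc = 1 - p_at      # assumed per-base probability of G or C

# GCCGCACC has seven G/C bases and one A, so n = 8 and k = 7.
exactly_seven = prob_exact_gc(8, 7, p_gc)

# Illustration only: probability of seven or eight G/C bases in eight.
at_least_seven = exactly_seven + prob_exact_gc(8, 8, p_gc)

print(f"exactly 7 of 8: {exactly_seven:.2e}")
print(f"at least 7 of 8: {at_least_seven:.2e}")
```

The factor of 8 in the passage's expression is the binomial coefficient, the number of positions the single non-G/C base can occupy. The result applies to one given 8 bp window and is not corrected for the number of windows in the intergenic region.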

