Complete mitochondrial genome sequence of three Tetrahymena species reveals mutation hot spots and accelerated nonsynonymous substitutions in Ymf genes.

Moradian MM, Beglaryan D, Skozylas JM, Kerikorian V - PLoS ONE (2007)

Bottom Line: We also found distinct features in Mt genome of T. paravorax despite similar genome organization among these approximately 47 kb long linear genomes. Importantly, nucleotide substitution types and rates suggest possible reasons for not being able to find homologues for Ymf genes. Additionally, comparative genomic analysis of complete Mt genomes is essential in identifying biologically significant motifs such as control regions.


Affiliation: Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, United States of America. mmoradia@ucla.edu

ABSTRACT
The ciliate Tetrahymena, a model organism, contains a divergent mitochondrial (Mt) genome with unusual properties, in which half of the 44 genes still lack a definitive function. These genes can be categorized into two major groups: KPC (known protein coding) and Ymf (genes without an identified function). To gain insight into the mechanisms underlying gene divergence and molecular evolution of Tetrahymena (T.) Mt genomes, we sequenced the Mt genomes of three species: T. paravorax, T. pigmentosa, and T. malaccensis. These genomes were aligned, and the analyses were carried out using several programs that calculate distance, nucleotide substitution rates (dn/ds), and their rate ratios (omega) at individual codon sites and via a sliding window approach. Comparative genomic analysis indicated a conserved putative transcription control sequence, a GC box, in a region where transcription and replication presumably initiate. We also found distinct features in the Mt genome of T. paravorax despite similar genome organization among these approximately 47 kb long linear genomes. Another significant finding was the presence of one or more highly variable regions in Ymf genes, where the majority of substitutions were concentrated. These regions were mutation hotspots where elevated distances and dn/ds ratios were primarily due to an increase in the number of nonsynonymous substitutions, suggesting relaxed selective constraint. However, in a few Ymf genes, accelerated rates of nonsynonymous substitutions may be due to positive selection. Similarly, at the protein level the majority of amino acid replacements occurred in these regions. Ymf genes comprise half of the genes in Tetrahymena Mt genomes, so understanding why they have not been assigned definitive functions is an important aspect of molecular evolution. Importantly, nucleotide substitution types and rates suggest possible reasons for not being able to find homologues for Ymf genes. Additionally, comparative genomic analysis of complete Mt genomes is essential in identifying biologically significant motifs such as control regions.
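
For reference, the rate ratio (omega) mentioned in the abstract is conventionally defined as below, where d_N is the number of nonsynonymous substitutions per nonsynonymous site and d_S is the number of synonymous substitutions per synonymous site. The three cases are the standard textbook interpretation, not thresholds taken from this paper.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive selection}
\end{cases}
```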


pone-0000650-g006: dn/ds rate ratio variation in Ymf genes. The Y-axis represents values for dn/ds rate ratios generated by SWAPSC. These ratios are from a codon-based alignment of all five Tetrahymena species. Comparisons use a maximum window size of 20 amino acids. The X-axis indicates the type of variation for each codon. Plots G and H show variable regions throughout the Ymf 67 and Ymf 77 genes. Abbreviations: HS, mutation hot spots; S, saturation of synonymous substitutions; NS, negative selection; AdN, accelerated rate of nonsynonymous substitutions; PS, positive selection.

Mentions: A previous study suggested that in Mt genomes of C. elegans the dn/ds ratio increased more than fivefold when the effects of natural selection were minimized [27]. Our analysis, which showed on average an almost threefold increase in the Ymf dn/ds ratios, seems to support such a conclusion. However, the increased dn/ds ratios in C. elegans Mt genomes occurred throughout the entire Mt genome with little spatial preference. If minimized natural selection were the reason for the increased dn/ds ratios in Tetrahymena Mt genomes, then these ratios should have increased in most if not all of the KPC and Ymf genes. Yet our analysis of dn/ds ratios using a sliding window program, which revealed substitution variation in these genes in detail, did not quite support minimized natural selection throughout the Tetrahymena Mt genome. The rapid divergence of Ymf genes and the elevated dn/ds ratios were primarily due to the presence of regions with accelerated rates of nonsynonymous mutations (AdN), which were detected by SWAPSC. In addition to identifying regions with AdN, we were able to locate regions under positive selection, mutation hot spots, regions with saturated synonymous substitutions, and regions under negative selection at specific codon positions. The significant results of the SWAPSC output, which identified the variable regions in some Ymf genes that seemed to contain mutation hotspots with accelerated nonsynonymous substitutions, are shown in Figures 5 and 6. SWAPSC also uses a sliding window approach, but the program itself determines the most appropriate window size, with a maximum of 20 amino acids per window. Hence there are some differences in dn/ds values between Figure 3 and Figures 5 and 6, which are due to the use of smaller window sizes and ω in SWAPSC. There were a few small regions in some Ymf genes that suggested positive selection, yet they could not be considered a major cause of such extensive variation.
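The sliding window idea described above can be illustrated with a minimal Python sketch. It is not SWAPSC and does not compute dn/ds: it only counts raw synonymous and nonsynonymous codon differences between two aligned coding sequences in fixed windows, without normalizing by the number of synonymous and nonsynonymous sites or correcting for multiple hits. The function names are hypothetical, the fixed 20-codon default only mirrors the maximum window size mentioned in the text, and the use of NCBI translation table 4 (in which TGA encodes tryptophan) is an assumption about the appropriate genetic code.

```python
from itertools import product

BASES = "TCAG"
# Assumption: NCBI translation table 4 (protozoan mitochondrial code).
# It matches the standard code except that TGA encodes Trp (W).
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_pairs(seq_a, seq_b):
    """Label each aligned codon pair as 'same', 'syn', 'nonsyn', or 'skip'."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    labels = []
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            labels.append("skip")      # gap or ambiguous base
        elif codon_a == codon_b:
            labels.append("same")
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            labels.append("syn")       # codons differ, amino acid unchanged
        else:
            labels.append("nonsyn")    # codons differ, amino acid changed
    return labels


def sliding_window_counts(labels, window=20, step=1):
    """Return (start_codon, n_nonsyn, n_syn) for each window of codons."""
    results = []
    last_start = max(len(labels) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = labels[start:start + window]
        results.append((start, chunk.count("nonsyn"), chunk.count("syn")))
    return results
```

Windows in which the nonsynonymous count is much higher than in the rest of the gene would be candidates for closer inspection with a proper codon-model method; the counts alone cannot distinguish relaxed constraint from positive selection.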
Hence we reconsidered the possibility that these variable regions of Ymf genes could be under positive selection. To confirm our argument, we determined the dn/ds rate ratios for each codon site in genes with accelerated nonsynonymous substitutions, using software from the PAML package (see Materials and Methods). The codonml software in this package revealed likelihood ratios of positive selection or relaxed selective constraints along lineages, based on the dn/ds rate ratios (ω) per individual codon for Ymf genes from all five genomes. Significant increases in ω were observed in some Ymf genes and in Nad5 (Figure S3). In sum, results from four different software packages, which determined dn/ds, ω, mutation hotspots, accelerated rates of nonsynonymous mutations, and positive selection, indicated that the primary reason for the presence of variable regions in Ymf genes was accelerated rates of nonsynonymous mutations. However, small sites under positive selection were present in parts of Ymf 57, 60, 61, 64, 67, 68, 71, 74, 76, and 77, where dn/ds was elevated and ω>1. We also calculated Tajima's D values for KPC and Ymf genes to detect selection. We found negative D values in all KPC and Ymf genes, but the significantly negative D values in the variable regions of Ymf 57, 60, 61, 64, 67, 68, 71, 74, 76, and 77 stood out (Figure S4). These results were consistent with the results for regions presumably under positive selective pressure based on dn/ds and ω. We also found positive selection in the 5′ region of the Nad5 gene, with significant dn/ds, ω, and Tajima's D values. Such variable regions with AdN, along with more substitutions in Ymf genes, could be the cause of our inability to find homologues for them.
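As a rough illustration of the Tajima's D statistic mentioned above, the following Python sketch computes D from a small nucleotide alignment using the standard published formula. It is not the software used in the study; the function name `tajimas_d` is hypothetical, columns containing gaps or ambiguous bases are simply dropped (an assumption), and no significance test is included.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned nucleotide sequences (toy sketch)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero for n = 3)")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # Keep only columns where every sequence has an unambiguous base.
    columns = [col for col in zip(*(s.upper() for s in seqs))
               if set(col) <= set("ACGT")]

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for col in columns if len(set(col)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: average number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(1 for x, y in combinations(col, 2) if x != y)
             for col in columns) / n_pairs

    # Constants from Tajima's (1989) formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

A negative D arises when the alignment contains an excess of rare variants relative to the neutral expectation, and a positive D when intermediate-frequency variants dominate. For example, `tajimas_d(["ATGCAA", "ATGCAT", "ATGGAA", "ACGCAA"])` is negative because all three variable sites are singletons.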
Thus we conclude that substitution types, numbers, patterns, and fixation rates support the idea that variable regions with AdN cause Ymf genes in Tetrahymena Mt genomes to evolve so rapidly that they could not be assigned definitive functions based on sequence similarity or homology. The presence of sites under positive selection in some Ymf genes also contributed to such rapid evolution. The presence of regions with AdN in a few KPC genes (e.g., Nad5) did not weaken our argument since, unlike in Ymf genes, the remaining regions of these KPC genes were highly conserved and had preserved their ancestral sequence (Figures 3 and S3).

