Nuclear genetic codes with a different meaning of the UAG and the UAA codon

ABSTRACT

Background: Departures from the standard genetic code in eukaryotic nuclear genomes are known for only a handful of lineages, and only a few genetic code variants seem to exist outside the ciliates, the most creative group in this regard. The most frequent code modifications entail reassignment of the UAG and UAA codons, with evidence for at least 13 independent cases of a coordinated change in the meaning of both codons. However, no change affecting each of the two codons separately has been documented, suggesting the existence of underlying evolutionary or mechanistic constraints.

Results: Here, we present the discovery of two new variants of the nuclear genetic code, in which UAG is translated as an amino acid while UAA is kept as a termination codon (along with UGA). The first variant occurs in an organism noticed in a (meta)transcriptome from the heteropteran Lygus hesperus and demonstrated to be a novel insect-dwelling member of Rhizaria (specifically Sainouroidea). This first documented case of a rhizarian with a non-canonical genetic code employs UAG to encode leucine and represents an unprecedented change among nuclear codon reassignments. The second code variant was found in the recently described anaerobic flagellate Iotanema spirale (Metamonada: Fornicata). Analyses of transcriptomic data revealed that I. spirale uses UAG to encode glutamine, similarly to the most common variant of a non-canonical code known from several unrelated eukaryotic groups, including hexamitin diplomonads (also a lineage of fornicates). However, in these organisms, UAA also encodes glutamine, whereas it is the primary termination codon in I. spirale. Along with phylogenetic evidence for a distant relationship between I. spirale and hexamitins, this indicates two independent genetic code changes in fornicates.
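The difference between the code variants described above can be made concrete with a minimal translation sketch. It builds the standard codon table and then overrides only the codons named in the abstract: UAG as leucine for the rhizarian exLh, UAG as glutamine for I. spirale, and both UAG and UAA as glutamine for the hexamitin-type code. The variable names, the `translate` helper, and the short test sequence are illustrative assumptions and do not come from the article.

```python
from itertools import product

BASES = "UCAG"
# Amino acids of the standard code, with codons enumerated in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...). '*' marks a termination codon.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}

# Variant codes as summarized in the abstract (dictionary names are hypothetical).
EXLH = {**STANDARD, "UAG": "L"}                    # UAG = Leu, UAA and UGA = stop
I_SPIRALE = {**STANDARD, "UAG": "Q"}               # UAG = Gln, UAA and UGA = stop
HEXAMITIN = {**STANDARD, "UAG": "Q", "UAA": "Q"}   # UAG and UAA = Gln, UGA = stop


def translate(rna, code):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = code[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up example sequence: AUG UAG GCU UAA
seq = "AUGUAGGCUUAA"
for name, code in [("standard", STANDARD), ("exLh", EXLH),
                   ("I. spirale", I_SPIRALE), ("hexamitin-type", HEXAMITIN)]:
    print(name, translate(seq, code))
```

In this toy sequence the in-frame UAG terminates translation under the standard code, is read through as leucine or glutamine under the two new variants (which still stop at UAA), and is followed by a second glutamine at UAA under the hexamitin-type code.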

Conclusions: Our study documents, for the first time, that evolutionary changes of the meaning of UAG and UAA codons in nuclear genomes can be decoupled and that the interpretation of the two codons by the cytoplasmic translation apparatus is mechanistically separable. The latter conclusion has interesting implications for possibilities of genetic code engineering in eukaryotes. We also present a newly developed generally applicable phylogeny-informed method for inferring the meaning of reassigned codons.
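The general idea of inferring what a reassigned codon encodes can be illustrated with a deliberately simplified sketch. This is not the phylogeny-informed method the authors developed, whose details are not given in the abstract; it is a plain majority vote over conserved alignment columns, shown only to make the inference problem tangible. The function name, the conservation threshold, and the toy columns are all hypothetical.

```python
from collections import Counter


def infer_codon_meaning(columns, min_conservation=0.8):
    """Tally the amino acid that homologs carry at positions where the
    organism of interest has the in-frame codon under study.

    columns: list of strings; each string holds the residues observed in
             homologous sequences at one aligned position.
    Only columns whose most frequent residue reaches `min_conservation`
    contribute a vote, so variable positions do not add noise.
    """
    votes = Counter()
    for col in columns:
        residues = [aa for aa in col if aa not in "-X*"]  # skip gaps/unknowns
        if not residues:
            continue
        aa, n = Counter(residues).most_common(1)[0]
        if n / len(residues) >= min_conservation:
            votes[aa] += 1
    return votes


# Hypothetical toy input: four aligned positions opposite in-frame UAG codons.
toy_columns = ["LLLLI", "LLLL-", "QKER", "LLML"]
votes = infer_codon_meaning(toy_columns)
print(votes.most_common())
```

A real analysis would need many more positions and, as the abstract indicates, would take the phylogenetic relationships of the homologs into account rather than treating each sequence as an independent vote.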

Electronic supplementary material: The online version of this article (doi:10.1186/s12915-017-0353-y) contains supplementary material, which is available to authorized users.


Fig. 1: Phylogenetic position of the organisms studied. (a) Phylogeny of eukaryotes including the rhizarian exLh based on 18S rDNA sequences. The maximum likelihood (ML) tree was inferred with RAxML using the GTRGAMMAI substitution model. The values at branches represent RAxML bootstrap (BS) values followed by PhyloBayes posterior probabilities (GTRCAT model). (b) Phylogeny of Fornicata including I. spirale based on a concatenated data set of 18S rDNA and EF-1α, EF2, HSP70, and HSP90 protein sequences. The ML tree was inferred with RAxML using the substitution models GTRGAMMA (for 18S rDNA) and PROTGAMMALG4X (for the protein sequences). The values at branches represent RAxML BS values followed by PhyloBayes posterior probabilities (CAT Poisson model). Maximal support (100/1) is indicated with black dots. Asterisks indicate support values lower than 50% or 0.5, respectively; dashes mark branches in the ML tree that are absent from the PhyloBayes tree.

Mentions: To further illuminate the identity of the rhizarian exLh, we carried out a maximum likelihood (ML) phylogenetic analysis of a 70-protein supermatrix containing 54 orthologs from this organism (a subset of the 71 genes mentioned above, passing an initially imposed threshold of minimal sequence identity to orthologs from other eukaryotes). The resulting tree showed it branching with maximum support within Rhizaria, specifically within Filosa as a sister lineage to G. vulgaris (Additional file 2: Figure S1). However, only very few representatives of Filosa could be included in the phylogenomic analysis because very few genomes or transcriptomes have been sequenced for this diverse group. Therefore, we sought to determine the phylogenetic position of the rhizarian exLh within Filosa using the 18S ribosomal RNA (rRNA) gene, the most broadly sampled marker for rhizarian phylogeny. Searching the L. hesperus TSA sequences revealed two contigs that proved to be chimeric sequences consisting of artificially merged parts of different origins, including a 3' segment of an 18S rRNA dissimilar to any 18S rRNA sequence in the GenBank database (and very different from the L. hesperus 18S rRNA sequence present in the TSA as another contig; see Methods for details). Using the partial 18S rRNA sequence as a seed and the original RNA-seq reads, we assembled a complete 18S rRNA sequence that fell phylogenetically into Filosa, specifically into the group Sainouroidea (Fig. 1a). This clade comprises several poorly studied free-living or coprophilic flagellates and amoebae, including G. vulgaris [27]. The result of the 18S rRNA analysis is thus concordant with the phylogenomic analysis of protein sequences, supporting the assumption that the assembled 18S rRNA sequence comes from the same organism as the protein-coding transcripts. Furthermore, it specifically indicates that the rhizarian exLh is a previously undetected lineage of Sainouroidea, presumably a separate genus.

