The Recent De Novo Origin of Protein C-Termini.
Bottom Line: Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.
Affiliation: Department of Ecology & Evolutionary Biology, University of Arizona. Present address: Aegis Sciences, Nashville, TN.
Mentions: One extremely long addition (714 amino acids added to YOL058W) appeared to be the result of a 288 bp deletion that removed nine C-terminal amino acids, the ancestral stop codon, and all of the 3′-UTR. The deletion ended in the 5′-UTR of the downstream gene (YOL057W), in-frame with its annotated start codon. Thus, the bulk of the addition consists of the 711-amino-acid ORF of YOL057W. Translation of this gene fusion can occur only if the combined ORF is present on a single long transcript. The complete deletion of any transcription termination signal in the first gene’s 3′-UTR made continuous transcription a very real possibility. Because gene fusions are exceedingly rare in yeast (Durrens et al. 2008), we were surprised by this finding and subjected it to a high level of scrutiny. The putative fusion is found in two closely related sake strains of S. cerevisiae, Y9 and Y12 (Liti et al. 2009). A third sake strain, K11, the next closest relative to Y9 and Y12, has a 33 bp deletion in YOL058W (ARG1) that is a subset of the 288 bp deletion. This smaller deletion creates a premature stop codon, truncating the C-terminus by eight amino acids. To verify the existence of a full-length transcript spanning the deletion region within these strains, we obtained Y12 strain RNA-Seq data from Skelly et al. (2013) and mapped it back to the Y12 S. cerevisiae genome. Upon visual examination, the deletion region had well-aligned reads flanking the deletion but lacked high-quality reads that unambiguously spanned it. We therefore next aligned the Y12 RNA-Seq reads to an alternative version of the Y12 genome assembly into which we reinserted the 288 bp deleted sequence. The new alignment (fig. 3) revealed strong hits to the previously annotated UTR portions (David et al. 2006) of the 288 bp region. Figure 3 is compatible with two distinct genes, with different transcription levels, and is entirely inconsistent with a gene fusion.
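The reading-frame bookkeeping behind the putative fusion can be restated as a short calculation. The Python sketch below is illustrative only and is not the authors' pipeline: the function names and the toy coordinates are hypothetical, and the 9 nt of retained 5′-UTR is inferred from the reported 714 and 711 amino acid figures rather than stated in the text.

```python
CODON = 3  # nucleotides per codon


def in_frame_after_deletion(upstream_start, downstream_start, deletion_len):
    """Return True if the downstream start codon lies in the upstream ORF's
    reading frame once deletion_len bases between the two starts are removed.
    Coordinates are 0-based positions on the same strand (hypothetical helper).
    """
    return (downstream_start - upstream_start - deletion_len) % CODON == 0


def readthrough_extension_aa(retained_utr_nt, downstream_orf_aa):
    """Residues added to the upstream protein if translation reads through the
    retained 5'-UTR segment and then the whole downstream ORF.
    Assumes the retained segment contains no in-frame stop codon.
    """
    if retained_utr_nt % CODON != 0:
        raise ValueError("retained 5'-UTR segment is not a whole number of codons")
    return retained_utr_nt // CODON + downstream_orf_aa


# Figures taken from the text.
deleted_nt = 288
coding_nt_removed = 9 * CODON + CODON          # nine residues plus the stop codon
noncoding_nt_removed = deleted_nt - coding_nt_removed

# A deletion that is a multiple of three keeps the upstream frame intact.
frame_preserved = deleted_nt % CODON == 0

# Assumption: 9 nt of 5'-UTR retained (inferred from 714 - 711 = 3 codons).
extension = readthrough_extension_aa(retained_utr_nt=9, downstream_orf_aa=711)

print(frame_preserved, noncoding_nt_removed, extension)
```

Under these assumptions the arithmetic is internally consistent with the reported extension length, which is why the annotation looked plausible on paper and had to be tested against the RNA-Seq reads.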
The annotation of a 288 bp deletion is clearly an error in both the Y12 and the Y9 genome assemblies. The 288 bp region is flanked by short poly-A sequences, which might be responsible for this replicated error.
Fig. 3.