Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences.

Ivanov IP, Firth AE, Michel AM, Atkins JF, Baranov PV - Nucleic Acids Res. (2011)

Bottom Line: We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.

Affiliation: BioSciences Institute, University College Cork, Cork, Ireland. iivanov@genetics.utah.edu

ABSTRACT
In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5' cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is used, both to increase coding capacity and potentially also to provide novel regulatory mechanisms, remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5' untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.
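The abstract notes that non-AUG initiation typically occurs at codons differing from AUG by a single nucleotide. As a small illustration (a sketch, not part of the authors' pipeline), the Python below enumerates every single-substitution variant of AUG; the helper name `near_cognate` is hypothetical. It shows that there are nine such codons, of which the six named in the abstract are a subset.

```python
# Sketch: enumerate codons that differ from AUG at exactly one position.
# Illustrative only; `near_cognate` is a hypothetical helper, not from the paper.
RNA_BASES = "ACGU"


def near_cognate(codon: str = "AUG") -> list[str]:
    """Return every codon that differs from `codon` at exactly one position."""
    variants = []
    for i, original in enumerate(codon):
        for base in RNA_BASES:
            if base != original:
                variants.append(codon[:i] + base + codon[i + 1:])
    return variants


if __name__ == "__main__":
    codons = sorted(near_cognate())
    print(len(codons), codons)
    # The six codons singled out in the abstract:
    highlighted = {"CUG", "UUG", "GUG", "ACG", "AUA", "AUU"}
    print(highlighted <= set(codons))  # prints True
```

The three remaining single-nucleotide variants (AAG, AGG and AUC) are not among the codons the abstract singles out.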

Figure 1: Five known molecular mechanisms responsible for the initiation of translation upstream of the first 5′ in-frame AUG codon. mRNAs are shown as horizontal lines. Dark grey boxes represent annotated CDS regions. Light grey boxes represent extensions of CDSs upstream of annotated AUG codons up to the closest in-frame stop codon. Black boxes denoted as P5EC represent upstream regions where codons in-frame with annotated CDSs evolve under purifying selection. Diagonal stripes are used to denote alternatively spliced exons.
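The caption defines the light grey extension as the region upstream of the annotated AUG up to the closest in-frame stop codon. A minimal Python sketch of that definition follows, assuming a 0-based index `start` for the first nucleotide of the annotated start codon and an mRNA given as a plain string; the helper names `upstream_extension` and `candidate_starts` are hypothetical. It also lists in-frame near-cognate codons inside the extension, i.e. positions where a non-AUG-initiated extension could begin. It does not assess conservation, which is what the P5EC criterion adds.

```python
# Sketch: find the in-frame region upstream of an annotated start codon that
# extends to the closest in-frame stop codon, and list near-cognate codons in it.
# Assumptions: 0-based `start` index of the annotated AUG; DNA or RNA letters.
STOP_CODONS = {"UAA", "UAG", "UGA"}
# All nine codons that differ from AUG at exactly one position.
NEAR_COGNATE = {
    "AUG"[:i] + b + "AUG"[i + 1:]
    for i in range(3)
    for b in "ACGU"
    if b != "AUG"[i]
}


def upstream_extension(mrna: str, start: int) -> tuple[int, bool]:
    """Return (ext_start, bounded_by_stop).

    mrna[ext_start:start] is the longest stop-free region upstream of, and in
    frame with, the codon at `start`. bounded_by_stop is False when the scan
    reached the 5' end of the sequence without meeting an in-frame stop.
    """
    seq = mrna.upper().replace("T", "U")
    pos = start - 3
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos + 3, True
        pos -= 3
    # pos is now -1, -2 or -3, so pos + 3 is the first complete in-frame codon.
    return pos + 3, False


def candidate_starts(mrna: str, start: int) -> list[tuple[int, str]]:
    """In-frame near-cognate codons (position, codon) inside the extension."""
    seq = mrna.upper().replace("T", "U")
    ext_start, _ = upstream_extension(seq, start)
    return [
        (p, seq[p:p + 3])
        for p in range(ext_start, start, 3)
        if seq[p:p + 3] in NEAR_COGNATE
    ]


if __name__ == "__main__":
    # Toy mRNA: GG UAA CUG GCC ACG AAA AUG GCU (annotated AUG at index 17).
    mrna = "GGUAACUGGCCACGAAAAUGGCU"
    print(upstream_extension(mrna, 17))  # (5, True)
    print(candidate_starts(mrna, 17))    # [(5, 'CUG'), (11, 'ACG')]
```

The most 5′ candidate would give the longest possible extension; which candidate, if any, is actually used cannot be decided from sequence alone.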

Mentions: Most protein-coding sequences evolve under purifying selection, and this feature can be used to detect protein-coding regions in nucleotide sequences (45). If, in a particular gene, translation initiates upstream of the annotated start codon, then the sequence between the actual (upstream) start codon and the annotated start codon should also evolve under the constraints of purifying selection. For brevity, we refer to such regions as P5ECs. We used the existence of a P5EC as an initial indicator that an alternative initiator codon is used. In principle, however, the presence of a P5EC does not guarantee that an alternative in-frame start codon is used for initiation. Figure 1 illustrates five possibilities, each with known examples, in which the sequence upstream of, and in frame with, an annotated start codon would evolve under purifying selection: (i) Initiation at an upstream in-frame non-AUG codon (ATIS), the subject of this study. (ii) Programmed ribosomal frameshifting (PRF), where initiation occurs at the start codon of a uORF and ribosomes then enter the main protein-coding ORF (pORF) by shifting reading frame at a specific location. The most prominent human examples are the three antizyme paralogs (46). (iii) Stop codon readthrough (SCR). This is similar to PRF, except that the uORF and the pORF are in the same frame and are separated by a single stop codon. No chromosomal genes are known to use this mechanism in humans, but in flies three examples have been identified experimentally (47) and more than 100 additional candidates have been identified by comparative sequence analysis (48). (iv) RNA editing. A start codon is generated post-transcriptionally by insertion of a U between an A and a G, as has been suggested for the genes encoding linker histone H1F0 and HMGN1 (49). (v) Alternative splicing. An exon containing a start codon in one transcript variant could be skipped in another transcript variant, which would then use an initiator codon located downstream of the start codon of the first variant. However, the region between the 3′ end of the alternatively spliced exon (ASE) and the start codon used in the second transcript would still evolve under protein-coding constraints, because this region is translated in the first transcript variant. This fifth class of P5EC-containing mRNAs is the largest. Therefore, during our analysis we paid particular attention to distinguishing P5ECs that arise from alternative splicing from those that result from non-annotated translation events. The pipeline for the initial computational analysis of human RefSeq mRNAs is outlined in Figure 2 and is described in detail in the ‘Materials and Methods’ section.
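The P5EC criterion rests on the observation that coding regions under purifying selection accumulate fewer amino-acid-changing (nonsynonymous) substitutions per site than silent (synonymous) ones. The paper's own statistical procedure is described in its ‘Materials and Methods’ section and is not reproduced here. As a swapped-in illustration of the general idea only, the Python below computes simplified Nei-Gojobori-style proportions pN and pS for two aligned, in-frame ortholog sequences. The helper names (`synonymous_sites`, `pn_ps`) are hypothetical, and the simplifications are listed in the code comments.

```python
# Sketch: simplified Nei-Gojobori (1986) counts of synonymous and
# nonsynonymous differences between two aligned, in-frame coding sequences.
# Simplifications (assumptions, not the paper's method):
#   * codons that differ at more than one position are skipped;
#   * codons containing gaps/ambiguous bases or encoding stops are skipped;
#   * mutations to a stop codon count as nonsynonymous sites;
#   * no Jukes-Cantor (multiple-hit) correction is applied.
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in a codon (0..3); each of the nine possible
    single-base changes contributes 1/3 of a site if it is synonymous."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def pn_ps(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Return (pN, pS): nonsynonymous and synonymous differences per site."""
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if ca not in CODE or cb not in CODE:
            continue  # gap or ambiguous base
        if CODE[ca] == "*" or CODE[cb] == "*":
            continue  # stop codon
        n_diff = sum(x != y for x, y in zip(ca, cb))
        if n_diff > 1:
            continue  # multi-difference codons skipped in this simplification
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODE[ca] == CODE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    p_s = syn_diffs / syn_sites if syn_sites else 0.0
    return p_n, p_s


if __name__ == "__main__":
    # One synonymous difference (GCT -> GCC, both Ala) in three codons.
    print(pn_ps("GCTGCTGCT", "GCCGCTGCT"))
```

A pN/pS well below 1 in the region between an in-frame stop codon and an annotated AUG, seen consistently across many ortholog pairs, would be consistent with a P5EC. A single pairwise value on a short region is very noisy, and, as class (v) above shows, alternative splicing can produce the same signal.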