Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences.

Ivanov IP, Firth AE, Michel AM, Atkins JF, Baranov PV - Nucleic Acids Res. (2011)

Bottom Line: We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.
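One widely used evolutionary signature of protein coding is that, in an alignment of orthologous sequences, substitutions accumulate preferentially at third codon positions, where many changes are synonymous. The sketch below counts mismatches by codon position for two aligned sequences read in a chosen frame. It is a deliberately crude illustration of the idea, not the authors' method, which rests on phylogenetic analysis across many orthologs; the function names and example sequences are hypothetical, and the alignment is assumed to be ungapped.

```python
def codon_position_mismatches(seq_a: str, seq_b: str, frame: int = 0) -> list[int]:
    """Count mismatches at codon positions 1, 2 and 3 between two aligned orthologs.

    Assumes an ungapped alignment of equal-length sequences; `frame` (0, 1 or 2)
    is the index of the first nucleotide of the first complete codon.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    counts = [0, 0, 0]
    for i in range(frame, len(seq_a)):
        if seq_a[i] != seq_b[i]:
            # (i - frame) % 3 is 0, 1 or 2 for codon positions 1, 2, 3
            counts[(i - frame) % 3] += 1
    return counts


def third_position_fraction(counts: list[int]) -> float:
    """Share of all mismatches that fall at third codon positions (0.0 if none)."""
    total = sum(counts)
    return counts[2] / total if total else 0.0


# Hypothetical 12-nt upstream segment from two species, read in frame 0.
human = "CUGGCCACGAUG"
other = "CUAGCUACAAUG"
profile = codon_position_mismatches(human, other)
print(profile, third_position_fraction(profile))  # [0, 0, 3] 1.0
```

A region whose excess of third-position changes shows up only in the frame of the annotated coding sequence is consistent with its being translated as part of that protein; real analyses need many species and a proper substitution model to make that call.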


Affiliation: BioSciences Institute, University College Cork, Cork, Ireland. iivanov@genetics.utah.edu

ABSTRACT
In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5' cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized--both for increased coding capacity and potentially also for novel regulatory mechanisms--remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5' untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.
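Before any phylogenetic test, a candidate N-terminal extension has a simple geometric requirement: the putative start codon must lie upstream of the annotated AUG, in the same reading frame, with no intervening in-frame stop codon. The sketch below enumerates the nine codons that differ from AUG at a single position (the six named above plus AAG, AGG and AUC) and scans a transcript for them under those constraints. It illustrates only this prerequisite, not the authors' pipeline; the function names and the toy transcript are hypothetical, and indices are 0-based.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def near_cognate_codons(reference: str = "AUG") -> list[str]:
    """All codons differing from `reference` at exactly one of its three positions."""
    codons = []
    for pos in range(3):
        for base in "ACGU":
            if base != reference[pos]:
                codons.append(reference[:pos] + base + reference[pos + 1:])
    return codons


def upstream_near_cognates(mrna: str, aug_index: int) -> list[tuple[int, str]]:
    """Near-cognate codons in frame with, and 5' of, the annotated AUG.

    Walks upstream one codon at a time and stops at the first in-frame stop
    codon (or the 5' end): a ribosome starting further upstream would terminate
    at that stop instead of reaching the annotated coding sequence.
    Returns (0-based index, codon) pairs, nearest to the AUG first.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("aug_index must point at an AUG codon")
    wanted = set(near_cognate_codons())
    hits = []
    pos = aug_index - 3
    while pos >= 0:
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in wanted:
            hits.append((pos, codon))
        pos -= 3
    return hits


print(near_cognate_codons())
# ['CUG', 'GUG', 'UUG', 'AAG', 'ACG', 'AGG', 'AUA', 'AUC', 'AUU']
print(upstream_near_cognates("GGUAACUGGCCACGAUGGCC", 14))
# [(11, 'ACG'), (5, 'CUG')]
```

In the toy transcript the in-frame UAA at index 2 bounds the search, so only the ACG and CUG between it and the AUG qualify as possible extension starts.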

Figure 6: Weblogo representation of the region surrounding the known and putative conserved non-AUG initiation sites in humans. Numbering is relative to the first nucleotide of the start codon. (A) Representation for the 42 sequences with newly identified extensions. (B) Representation for the 17 sequences with previously identified and conserved extensions. (C) Representation of all AUG start sites of humans [the frequencies for nucleotide occurrence at each position for the human mRNAs were obtained from the Transterm database (73)].
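A sequence logo such as Figure 6 stacks, at each aligned position, the nucleotide frequencies scaled by that column's information content. The sketch below computes both from windows already aligned on the start codon. It assumes the first nucleotide of the start codon is numbered +1 with no position 0 (the usual Kozak convention; the figure's exact convention is an assumption here) and omits any small-sample correction. Function names and the example windows are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGU"


def relative_position(index: int, start_index: int) -> int:
    """Map a 0-based window index to start-codon-relative numbering.

    Assumes the first start-codon nucleotide is +1 and there is no position 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def logo_columns(windows: list[str], start_index: int) -> list[tuple[int, dict[str, float], float]]:
    """Per-position nucleotide frequencies and information content in bits.

    `windows` are equal-length RNA strings aligned so that the first nucleotide
    of the start codon sits at `start_index`. Information content is log2(4)
    minus the Shannon entropy of the column, without small-sample correction.
    """
    length = len(windows[0])
    for w in windows:
        if len(w) != length or set(w) - set(ALPHABET):
            raise ValueError("windows must be equal-length strings over ACGU")
    columns = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        freqs = {b: counts[b] / len(windows) for b in ALPHABET}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        columns.append((relative_position(i, start_index), freqs, 2.0 - entropy))
    return columns


# Four hypothetical windows: 3 nt upstream, the start codon, 1 nt downstream.
windows = ["GCCAUGG", "ACCAUGG", "GCCCUGA", "ACCACGG"]
for pos, freqs, bits in logo_columns(windows, start_index=3):
    print(f"{pos:+d}", round(bits, 2), max(freqs, key=freqs.get))
```

A fully conserved column scores 2 bits and an evenly split A/G column scores 1 bit; panel (C) of the figure differs in that its frequencies come from the Transterm database rather than from a handful of aligned windows.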

Of the nine possible codons that differ from AUG at a single position, five are used to initiate the extensions of the 17 known cases that passed our qualitative analysis for conservation of the extension. By far the most commonly used is CUG, with 10 occurrences (59%), followed by ACG, AUA and UUG with two occurrences each, and GUG with one. The distribution of the putative initiation codons of the 42 new candidates in humans is not radically different (notwithstanding the potential for observer bias in locating the precise initiation codon in a small number of cases). Seven of the nine possible codons appear to be utilized; only AGG and AAG, which are very inefficiently recognized, are not used. The distribution among these seven is: 15 CUGs (36%), 7 ACGs, 6 GUGs, 5 AUUs, 4 each of UUG and AUA and, finally, 1 AUC. This order correlates with the efficiency of initiation at each non-AUG codon, CUG being the most efficient of them all (16). In addition to the identity of the initiation codon, we examined the initiation context of the 17 previously known examples that passed the screening process and of the 42 newly identified candidates (Figure 6). Once again, the pattern is similar in both sets, though the previously identified cases are closer to the optimal context for mammalian genes.
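The codon counts in this paragraph are small enough to tabulate directly. The sketch below encodes them exactly as given in the text, confirms that they sum to 17 and 42, reproduces the 59% and 36% CUG shares, and lists the near-cognate codons absent from each set. It also includes a simple classifier for initiation context that uses only the two positions usually singled out in Kozak's consensus (a purine at -3 and G at +4); this two-position rule is a common simplification and an assumption here, not the analysis behind Figure 6. All names are hypothetical.

```python
KNOWN_17 = {"CUG": 10, "ACG": 2, "AUA": 2, "UUG": 2, "GUG": 1}
NEW_42 = {"CUG": 15, "ACG": 7, "GUG": 6, "AUU": 5, "UUG": 4, "AUA": 4, "AUC": 1}
NEAR_COGNATES = {"CUG", "GUG", "UUG", "AAG", "ACG", "AGG", "AUA", "AUC", "AUU"}


def percentages(counts: dict[str, int]) -> dict[str, int]:
    """Whole-number percentage share of each codon."""
    total = sum(counts.values())
    return {codon: round(100 * n / total) for codon, n in counts.items()}


def unused_codons(counts: dict[str, int]) -> set[str]:
    """Near-cognate codons that never occur in `counts`."""
    return NEAR_COGNATES - set(counts)


def kozak_class(seq: str, start_index: int) -> str:
    """Classify a start site by the two key positions of the Kozak consensus.

    'strong' = purine (A/G) at -3 and G at +4; 'adequate' = only one of the two;
    'weak' = neither. `start_index` is the 0-based index of the start codon's
    first nucleotide, so -3 is seq[start_index - 3] and +4 is seq[start_index + 3].
    """
    if start_index < 3 or start_index + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the start codon")
    purine_minus3 = seq[start_index - 3] in "AG"
    g_plus4 = seq[start_index + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


assert sum(KNOWN_17.values()) == 17 and sum(NEW_42.values()) == 42
print(percentages(KNOWN_17)["CUG"], percentages(NEW_42)["CUG"])  # 59 36
print(sorted(unused_codons(NEW_42)))  # ['AAG', 'AGG']
print(kozak_class("GCCCUGG", 3))  # strong
```

Applying a classifier like this to each extension start would give one crude way to quantify the observation that the previously known cases sit closer to the optimal mammalian context.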

