Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences.

Ivanov IP, Firth AE, Michel AM, Atkins JF, Baranov PV - Nucleic Acids Res. (2011)

Bottom Line: We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.


Affiliation: BioSciences Institute, University College Cork, Cork, Ireland. iivanov@genetics.utah.edu

ABSTRACT
In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5′ cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized, both for increased coding capacity and potentially also for novel regulatory mechanisms, remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5′ untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.
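
As a small illustration of what "codons differing from AUG by a single nucleotide" covers, the following sketch enumerates all nine such codons. The function name `near_cognate_codons` is a label chosen here for illustration and is not taken from the paper; the six codons singled out above are a subset of its result.

```python
from itertools import product


def near_cognate_codons(start="AUG", alphabet="ACGU"):
    """List every codon that differs from `start` at exactly one position."""
    codons = []
    for pos, base in product(range(len(start)), alphabet):
        if base != start[pos]:
            # Substitute a single base and keep the other two unchanged.
            codons.append(start[:pos] + base + start[pos + 1:])
    return codons


if __name__ == "__main__":
    print(near_cognate_codons())
```

Three positions with three alternative bases each give nine candidates; which of them actually support initiation is an experimental question that this enumeration does not address.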

© Copyright Policy - creative-commons

Figure 3: Histogram of Ka/Ks values for mRNA sequences with known 5′ extensions. White bars represent mRNAs for which alternative transcripts with extended CDSs are known and therefore corresponding extensions are known to be translated in alternative transcripts. Sequences of these extensions are expected to evolve as protein coding sequences and were used as an internal control in this study. Black bars represent the remaining mRNAs for which it is not known whether alternative mRNA isoforms exist. Curves indicate the number of genes (y-axis) with Ka/Ks below a particular value (x-axis).
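
The cumulative curves described in the caption can be sketched as a count of sequences whose Ka/Ks lies strictly below each threshold. This is only a minimal illustration: the function name and the input values are hypothetical and are not data from the paper.

```python
from bisect import bisect_left


def cumulative_counts(values, thresholds):
    """For each threshold, count how many values are strictly below it."""
    ordered = sorted(values)
    # bisect_left returns the number of elements smaller than the threshold.
    return [bisect_left(ordered, t) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical Ka/Ks values, used only to exercise the function.
    toy_ka_ks = [0.05, 0.10, 0.15, 0.30, 0.80]
    print(cumulative_counts(toy_ka_ks, [0.2, 0.5, 1.0]))
```

Plotting these counts against the thresholds would give a curve of the kind the caption describes, with one curve per group of mRNAs.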

Mentions: After obtaining Ka/Ks values for P5EC regions, we generated a set of sequences in which the presence of a P5EC is due to the existence of alternative splice variants (see ‘Materials and Methods’ section). We compared the distribution of Ka/Ks values for these P5ECs with the Ka/Ks values for the remaining P5ECs. The distributions are shown in Figure 3 (see Supplementary Data for the actual values for each sequence). The distribution of Ka/Ks values for mRNAs with known alternative transcript variants (containing CDS 5′ extensions) is significantly sharper than the distribution for the rest of the P5ECs, with the great majority of values falling under 0.2. Figure 3 therefore illustrates that Ka/Ks can be used as a predictor of bona fide CDS 5′ extensions.

Figure 4 shows scatter plots of Ka/Ks ratios for sequences from both datasets in relation to the length of the P5EC (upper panels) and the level of identity between mouse and human orthologs at the protein level. While P5ECs from both datasets have highly variable length, it is clear that those resulting from alternative transcript variants have, on average, higher identity at the protein level as well as lower Ka/Ks values. While a low Ka/Ks ratio and high protein identity are good indicators of translated P5ECs, a high Ka/Ks ratio and low protein identity do not necessarily mean that a P5EC is not translated. This is because, at this stage of the analysis, the statistics were calculated for the entire region between the annotated AUG codon and the nearest in-frame stop codon in the 5′-UTR. If alternative initiation occurs closer to the 3′-end of a P5EC, then the region of the P5EC upstream of the alternative translation initiation site (ATIS) would not evolve under the constraints of protein coding evolution, and the cumulative values of Ka/Ks and of protein identity for the entire region would be intermediate between values typical of coding and non-coding sequences. Therefore, we used a relatively relaxed Ka/Ks ratio threshold for selecting the candidates for further detailed analysis (see ‘Materials and Methods’ section).
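
Two points from the text above can be sketched in code: how the region between the annotated AUG and the nearest in-frame upstream stop codon might be delimited, and why a region that is only partly translated yields an intermediate ratio. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy sequence and the length-weighted average are all illustrative choices, and the sequence is written in the DNA alphabet (T in place of U).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def p5ec_start(mrna, aug_index):
    """Return the index of the first codon after the nearest in-frame
    stop codon upstream of the annotated start, or None if the frame
    stays open up to the 5' end of the sequence."""
    pos = aug_index - 3
    while pos >= 0:
        if mrna[pos:pos + 3] in STOP_CODONS:
            return pos + 3
        pos -= 3  # step back one codon, staying in frame with the AUG
    return None


def blended_ratio(coding_ratio, noncoding_ratio, coding_fraction):
    """Crude length-weighted average (an assumption made for illustration):
    only `coding_fraction` of the region evolves under coding constraints."""
    return coding_fraction * coding_ratio + (1.0 - coding_fraction) * noncoding_ratio


if __name__ == "__main__":
    # Toy sequence: 2 nt, in-frame stop (TAA), 6 nt open region, ATG, 1 codon.
    mrna = "CC" + "TAA" + "CTGGCC" + "ATG" + "GGG"
    aug_index = 11  # position of the annotated ATG in the toy sequence
    start = p5ec_start(mrna, aug_index)
    print(mrna[start:aug_index])
    # Hypothetical ratios: 0.1 for the translated part, 1.0 for the rest.
    print(blended_ratio(0.1, 1.0, 0.5))
```

Under this simplified model, a region whose translated part has a low ratio still scores well above that value once untranslated sequence is averaged in, which is consistent with the decision to use a relaxed threshold at this stage.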


