Genome-wide functional analysis of human 5' untranslated region introns.

Cenik C, Derti A, Mellor JC, Berriz GF, Roth FP - Genome Biol. (2010)

Affiliation: Harvard Medical School, Department of Biological Chemistry and Molecular Pharmacology, 250 Longwood Avenue, SGMB-322, Boston, MA 02115, USA. cancenik@fas.harvard.edu.

ABSTRACT

Background: Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored.

Results: We performed a genome-scale computational analysis of 5'UTR introns in humans. We discovered that the most highly expressed genes tended to have short 5'UTR introns rather than having long 5'UTR introns or lacking 5'UTR introns entirely. Although we found no correlation in 5'UTR intron presence or length with variance in expression across tissues, which might have indicated a broad role in expression regulation, we observed an uneven distribution of 5'UTR introns amongst genes in specific functional categories. In particular, genes with regulatory roles were surprisingly enriched for 5'UTR introns. Finally, we analyzed the evolution of 5'UTR introns in non-receptor protein tyrosine kinases (NRTKs), and identified a conserved DNA motif enriched within the 5'UTR introns of human NRTKs.

Conclusions: Our results suggest that human 5'UTR introns enhance the expression of some genes in a length-dependent manner. While many 5'UTR introns are likely to be evolving neutrally, their relationship with gene expression and overrepresentation among regulatory genes, taken together, suggest that complex evolutionary forces are acting on this distinct class of introns.
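The abstract reports that genes with regulatory roles are enriched for 5'UTR introns, but this excerpt does not say which statistical test was used. A one-sided hypergeometric test is one common way to assess over-representation of a feature within a functional category; the minimal sketch below computes its upper-tail p-value with exact integer arithmetic. It is an illustration, not the authors' method, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_genes: int, n_with_5ui: int,
                         n_in_category: int, n_category_with_5ui: int) -> float:
    """Return P(X >= n_category_with_5ui) for a hypergeometric X.

    X counts genes with 5'UTR introns among n_in_category genes drawn
    without replacement from n_genes, of which n_with_5ui have 5'UTR introns.
    (Hypothetical helper for illustration.)
    """
    N, K, n, k = n_genes, n_with_5ui, n_in_category, n_category_with_5ui
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("inconsistent counts")
    # Sum exact probabilities of observing k, k+1, ..., min(K, n) genes with
    # 5'UTR introns in the category; comb() returns 0 for impossible draws.
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    # Python ints are arbitrary precision, so only the final division rounds.
    return numerator / comb(N, n)
```

Because one such p-value would be computed per functional category, a multiple-testing correction (for example, Bonferroni or false discovery rate control) would normally be applied before calling any category enriched.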

Figure 1: Characterization of fundamental properties of 5'UTR introns. (a) Histogram of total 5'UTR intron length. A well-annotated set of RefSeq transcripts is used in this analysis; the histogram shows the distribution of log10 of the total number of intronic nucleotides in each transcript's 5'UTR. (b) Distribution of the number of introns in the 5'UTR, shown as log10 of the number of transcripts with a given number of 5'UTR introns. The number of transcripts decreases exponentially as the number of 5'UTR introns increases. (c) Heat map of the relationship between the total lengths of 5'UTR introns and 5'UTR exons. (d) Heat map of the relationship between the total lengths of 5'UTR introns and non-5'UTR introns. In both heat maps, darker shades of gray indicate more transcripts.
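Panels (a) and (b) rest on two per-transcript quantities: the number of introns in the 5'UTR and their total length. The sketch below shows one way to derive both from a transcript model. It assumes UCSC genePred-style coordinates (0-based, half-open exon intervals plus the genomic CDS start and end), which is an assumption of this example rather than the authors' stated pipeline, and the function names are hypothetical. The point that is easy to get wrong is strand: on the minus strand the 5' end lies at the higher genomic coordinate.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) genomic interval


def five_prime_utr_introns(exons: List[Interval], cds_start: int,
                           cds_end: int, strand: str) -> List[Interval]:
    """Return the introns lying entirely within one transcript's 5'UTR.

    Assumes genePred-style input: exon intervals on the genome and the
    genomic CDS bounds [cds_start, cds_end). (Hypothetical helper.)
    """
    exons = sorted(exons)
    # Introns are the gaps between consecutive exons.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    if strand == "+":
        # 5' end at low coordinates: keep introns ending at or before the CDS start.
        return [iv for iv in introns if iv[1] <= cds_start]
    if strand == "-":
        # 5' end at high coordinates: keep introns starting at or after the CDS end.
        return [iv for iv in introns if iv[0] >= cds_end]
    raise ValueError(f"unknown strand: {strand!r}")


def utr5_intron_summary(exons: List[Interval], cds_start: int,
                        cds_end: int, strand: str) -> Tuple[int, int]:
    """Return (number of 5'UTR introns, total 5'UTR intronic nucleotides)."""
    introns = five_prime_utr_introns(exons, cds_start, cds_end, strand)
    return len(introns), sum(end - start for start, end in introns)


# A plus-strand transcript with three exons and the CDS starting in the last one:
# both introns precede the CDS start, so both are 5'UTR introns.
print(utr5_intron_summary([(100, 200), (500, 600), (900, 1200)], 950, 1100, "+"))
```

The example call prints `(2, 600)`. A transcript without 5'UTR introns yields `(0, 0)`, so such transcripts would have to be excluded before taking log10 of the total length for a histogram like panel (a).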

Mentions: To investigate the functional properties of human 5'UTR introns (5UIs), we used NCBI's Reference Sequence (RefSeq) collection. These are curated, full-length sequences with annotated UTR boundaries, and expression data are available for many of them. Because 5'UTRs lack a translational reading frame, computational prediction of splice sites within them is inherently more difficult [37], which made such a validated set necessary. In humans, approximately 8.5k (35%) of 24.5k RefSeq mRNAs contained at least one intron in their 5'UTR (Additional file 1). Previous estimates of the percentage of human genes with 5UIs ranged from 22-26% [18] to 38% [19], suggesting that the RefSeq collection has no major bias in the presence or absence of 5UIs compared with previously used datasets. The distribution of total 5'UTR intronic length per gene in our dataset was also similar to that observed previously (Figure 1a). The interquartile range of the total length of 5UIs within each gene was approximately 1.3-16 kb. Some 5UIs were extremely long: 16% were longer than 27 kb, the length of the average protein-coding gene in the human genome [38], and 5% were longer than 76 kb (Figure 1a). As previously reported [18,19], most genes had few 5UIs: more than 90% had a single intron, and the percentage of genes with two or more introns decreased exponentially (Figure 1b).
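The summary statistics quoted in this paragraph (the fraction of transcripts with at least one 5UI, the interquartile range of total 5UI length, the fractions above 27 kb and 76 kb, and the count distribution behind Figure 1b) can all be computed from per-transcript (number of 5UIs, total 5UI length) pairs. The following is a minimal sketch under stated assumptions: the input format and function name are hypothetical, the length thresholds are applied to per-transcript totals, and quartiles use the default (exclusive) method of Python's `statistics.quantiles`, which may differ from the interpolation the authors used.

```python
import statistics
from collections import Counter
from typing import Dict, Sequence, Tuple


def summarize_5ui(per_transcript: Sequence[Tuple[int, int]]) -> Dict[str, object]:
    """Summarize (n_5ui, total_5ui_nt) pairs, one pair per transcript.

    Hypothetical helper; needs at least two transcripts with 5UIs,
    because statistics.quantiles requires two or more data points.
    """
    with_5ui = [(n, total) for n, total in per_transcript if n > 0]
    totals = [total for _, total in with_5ui]
    # Quartiles of total 5UI length among transcripts that have any 5UI.
    q1, _, q3 = statistics.quantiles(totals, n=4)
    return {
        "fraction_with_5ui": len(with_5ui) / len(per_transcript),
        "iqr_nt": (q1, q3),
        "fraction_over_27kb": sum(t > 27_000 for t in totals) / len(totals),
        "fraction_over_76kb": sum(t > 76_000 for t in totals) / len(totals),
        # Number of transcripts having exactly 1, 2, ... 5UIs (cf. Figure 1b).
        "transcripts_by_5ui_count": dict(sorted(Counter(n for n, _ in with_5ui).items())),
    }
```

Transcripts without 5UIs count toward the denominator of `fraction_with_5ui` but are excluded from the length statistics, mirroring how the paragraph reports the 35% figure separately from the length distribution.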

