Genome-wide functional analysis of human 5' untranslated region introns.

Cenik C, Derti A, Mellor JC, Berriz GF, Roth FP - Genome Biol. (2010)

Affiliation: Harvard Medical School, Department of Biological Chemistry and Molecular Pharmacology, 250 Longwood Avenue, SGMB-322, Boston, MA 02115, USA. cancenik@fas.harvard.edu.

ABSTRACT

Background: Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored.

Results: We performed a genome-scale computational analysis of 5'UTR introns in humans. We discovered that the most highly expressed genes tended to have short 5'UTR introns rather than long 5'UTR introns or none at all. Although we found no correlation between 5'UTR intron presence or length and variance in expression across tissues, which might have indicated a broad role in expression regulation, we observed an uneven distribution of 5'UTR introns among genes in specific functional categories. In particular, genes with regulatory roles were surprisingly enriched for 5'UTR introns. Finally, we analyzed the evolution of 5'UTR introns in non-receptor protein tyrosine kinases (NRTKs) and identified a conserved DNA motif enriched within the 5'UTR introns of human NRTKs.

Conclusions: Our results suggest that human 5'UTR introns enhance the expression of some genes in a length-dependent manner. While many 5'UTR introns are likely to be evolving neutrally, their relationship with gene expression and overrepresentation among regulatory genes, taken together, suggest that complex evolutionary forces are acting on this distinct class of introns.

Figure 3: Analysis of variability in expression across tissues as a function of the total 5'UTR intron length. (a) Transcripts with low mean expression have higher normalized expression variability. A standardized measure of the variability in gene expression across tissues was calculated and plotted against the natural logarithm of the mean expression level. The black vertical line marks the 25th percentile of mean expression. Because transcripts with low mean expression tend to show artificially high variability, those below this line were removed from further analysis. (b) Boxplot of the coefficient of variation (ratio of standard deviation to mean) of genes grouped by total 5'UTR intron length. The width of each box reflects the relative number of data points in that group. There are no apparent differences among the three groups. (c) Boxplot of log10 total 5'UTR intron length for genes grouped by across-tissue variability. Genes are divided into six categories according to their coefficient of variation. Error bars correspond to the standard deviation of the mean. Expression variability shows no obvious dependence on total 5'UTR intron length, except that the most highly variable genes tend to have slightly shorter 5'UTR introns. (d) Boxplot of log10 total 5'UTR intron length for gene groups defined by the number of tissues in which each gene's expression was detected. A gene was considered to have detectable expression in a given tissue if its expression there exceeded the 25th percentile of mean expression across all genes. Total 5'UTR intron length did not differ among these groups. (e) Histogram of the number of genes, divided by the presence of 5'UTR introns and by the number of tissues in which expression was detected. The number of tissues with detectable expression was independent of the presence of 5'UTR introns.
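Panels (a) and (d) both rely on a single expression cutoff, the 25th percentile of per-gene mean expression. Below is a minimal Python sketch of how that cutoff might be computed and applied. It assumes a hypothetical layout in which `expr` maps each gene to its list of per-tissue expression values. The function names are illustrative, not taken from the paper. Treating a value exactly at the cutoff as undetected (strict `>`) is also an assumption, because the caption does not address ties.

```python
import statistics

# Hypothetical layout: expr maps gene ID -> list of per-tissue expression values.


def expression_threshold(expr):
    """25th percentile of per-gene mean expression (linear interpolation)."""
    means = [statistics.fmean(values) for values in expr.values()]
    return statistics.quantiles(means, n=4, method="inclusive")[0]


def tissues_with_detectable_expression(expr, threshold):
    """Count, for each gene, the tissues whose expression exceeds the threshold (panel d)."""
    return {gene: sum(x > threshold for x in values) for gene, values in expr.items()}


def drop_low_expression(expr, threshold):
    """Keep only genes whose mean expression exceeds the threshold (panel a filter).

    Strict '>' is an assumption; the caption does not say how ties are handled.
    """
    return {gene: values for gene, values in expr.items()
            if statistics.fmean(values) > threshold}


expr = {
    "A": [1.0, 1.0, 1.0],
    "B": [2.0, 2.0, 2.0],
    "C": [3.0, 3.0, 3.0],
    "D": [0.0, 5.0, 10.0],
    "E": [5.0, 5.0, 5.0],
}
cutoff = expression_threshold(expr)
print(cutoff, tissues_with_detectable_expression(expr, cutoff))
# → 2.0 {'A': 0, 'B': 0, 'C': 3, 'D': 2, 'E': 3}
print(sorted(drop_low_expression(expr, cutoff)))
# → ['C', 'D', 'E']
```

Gene D shows why the two uses differ. Its mean passes the filter, but it counts as detected in only two of its three tissues.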

Mentions: where $\mathrm{CV}_x$ is the CV of expression of gene $x$ across all tissues, $y_x$ is the vector of CV values for the 201 genes in a window centered on gene $x$, and $\mu_{1/2}$ and MAD denote the median and median absolute deviation, respectively. As expected, genes with low expression tended to vary much more across tissues (Figure 3a). Based on the observed trend, the genes in the lowest quartile of mean expression were removed from further analysis (Figure 3a). The remaining genes were sorted into three categories by total 5'UTR intron length, as before (short, 0th to 25th percentile; intermediate, 25th to 75th percentile; long, 75th to 100th percentile). We found no significant differences among these groups in inter-tissue variability as measured by the coefficient of variation (Figure 3b; Kruskal-Wallis rank sum test, df = 2, P = 0.23). We then examined 5'UTR intron (5UI) length as a function of expression variability (Figure 3c). The genes with the highest 5% across-tissue variability did not differ from the other genes in 5UI length (Wilcoxon rank sum test, P = 0.07, 95% confidence interval -0.008 to 0.25). However, the genes with the highest 1% across-tissue variability tended to have slightly shorter 5UIs (Wilcoxon rank sum test, P = 0.006, 95% confidence interval -0.67 to -0.11). Genes with short 5UIs were also overrepresented in the top 1% across-tissue variability category (Fisher's exact test, P = 0.005, odds ratio = 2.7). These results suggest that 5UI length is not a major determinant of across-tissue variability, although the most variable genes show a preference for shorter 5UIs.
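The excerpt defines the terms of the standardized variability measure, but the formula itself is missing. One natural reading of those definitions is a robust z-score, $(\mathrm{CV}_x - \mu_{1/2}(y_x)) / \mathrm{MAD}(y_x)$, and the Python sketch below implements that assumed form. Several further choices are also assumptions:

- Genes are ordered by mean expression to build the 201-gene window.
- The window is shifted inward at the ends of that ordering.
- CV uses the sample standard deviation.
- MAD is unscaled. R's `mad()` multiplies by 1.4826 by default; this sketch exposes that factor as `mad_scale`.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.fmean(values)


def standardized_cv(expr, window=201, mad_scale=1.0):
    """Robust z-score of each gene's CV against genes of similar mean expression.

    expr maps gene ID -> per-tissue expression values (hypothetical layout).
    Assumed form: (CV_x - median(y_x)) / (mad_scale * MAD(y_x)), where y_x holds
    the CVs of the `window` genes centred on x after sorting by mean expression.
    """
    genes = sorted(expr, key=lambda g: statistics.fmean(expr[g]))
    cvs = [coefficient_of_variation(expr[g]) for g in genes]
    n, half = len(genes), window // 2
    if n < window:
        raise ValueError("need at least `window` genes")
    scores = {}
    for i, gene in enumerate(genes):
        # Near either end, shift the window inward so it still holds `window` genes.
        start = min(max(i - half, 0), n - window)
        y = cvs[start:start + window]
        med = statistics.median(y)
        mad = mad_scale * statistics.median(abs(v - med) for v in y)
        # A window whose CVs are mostly identical has MAD 0; report NaN, not an error.
        scores[gene] = math.nan if mad == 0 else (cvs[i] - med) / mad
    return scores


# Toy example with a 3-gene window instead of 201.
toy = {
    "g1": [7, 10, 13],    # mean 10, CV 0.3
    "g2": [16, 20, 24],   # mean 20, CV 0.2
    "g3": [27, 30, 33],   # mean 30, CV 0.1
    "g4": [20, 40, 60],   # mean 40, CV 0.5
    "g5": [40, 50, 60],   # mean 50, CV 0.2
}
print({g: round(z, 3) for g, z in standardized_cv(toy, window=3).items()})
# → {'g1': 1.0, 'g2': 0.0, 'g3': -1.0, 'g4': 3.0, 'g5': 0.0}
```

A median/MAD window is less sensitive to a few extreme genes than a mean/standard-deviation window would be. It does not confirm the authors' exact computation.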
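The three-way length split and the Kruskal-Wallis comparison behind Figure 3b can be sketched with the Python standard library alone. With exactly three groups (df = 2), the chi-square tail probability has the closed form $e^{-H/2}$, so no statistics package is needed. The quartile boundaries (`<= Q1` is short, `> Q3` is long) and the helper names are assumptions for illustration. In practice, each gene's label would decide which list its CV is appended to before calling the test.

```python
import math
import statistics
from collections import Counter


def length_category(lengths):
    """Label genes short / intermediate / long by quartiles of total 5'UTR intron length.

    Assumption: values at or below Q1 are 'short' and values above Q3 are 'long'.
    """
    q1, _, q3 = statistics.quantiles(lengths.values(), n=4, method="inclusive")
    return {gene: "short" if length <= q1 else "long" if length > q3 else "intermediate"
            for gene, length in lengths.items()}


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected Kruskal-Wallis H for three groups and its chi-square (df = 2) P value."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied")
    h /= correction
    # For df = 2 the chi-square survival function is exactly exp(-H / 2).
    return h, math.exp(-h / 2)


print(length_category({"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}))
# → {'a': 'short', 'b': 'short', 'c': 'intermediate', 'd': 'intermediate', 'e': 'long'}
h, p = kruskal_wallis_3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), round(p, 4))
# → 7.2 0.0273
```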
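The top-5% and top-1% comparisons use the Wilcoxon rank sum test. The sketch below computes three quantities:

- the Mann-Whitney U statistic;
- a two-sided P value from the normal approximation, with tie and continuity corrections;
- the Hodges-Lehmann shift estimate (the median of all pairwise differences), the location estimate that rank-based confidence intervals of this kind are usually built around.

It is an illustration only. The excerpt does not state which software or options produced the reported P values and intervals, so this sketch does not attempt to reproduce them. The pairwise loops also cost O(n1 * n2) time.

```python
import math
import statistics
from collections import Counter


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney) test, normal approximation.

    Returns (U, P, shift), where U counts pairs with x_i > y_j (ties count 1/2)
    and shift is the Hodges-Lehmann estimate median(x_i - y_j).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    # Variance of U under the null, reduced for tied values in the pooled sample.
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        raise ValueError("all values are tied")
    # Continuity correction: shrink |U - E[U]| by 0.5 before standardizing.
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(var)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    shift = statistics.median(xi - yj for xi in x for yj in y)
    return u, p, shift


u, p, shift = rank_sum_test([1, 2, 3], [4, 5, 6])
print(u, round(p, 3), shift)
# → 0.0 0.081 -3
```

For example, `x` could hold log10 total 5'UTR intron lengths of the most variable genes and `y` those of all other genes. A negative shift would then indicate shorter introns in the variable group.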
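Fisher's exact test was used to assess the overrepresentation of short 5'UTR introns among the top 1% most variable genes. A standard-library sketch of the two-sided test follows. The table orientation and the toy counts are hypothetical, not data from the paper. The odds ratio returned is the simple cross-product ratio. Some tools, for example R's `fisher.test`, report a conditional maximum-likelihood estimate instead, so the two can differ even on identical counts.

```python
import math


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (odds_ratio, p): the cross-product odds ratio a*d / (b*c) and the sum of
    probabilities of all tables with the same margins that are no more likely
    than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # The small relative tolerance keeps floating-point ties from being dropped.
    p = sum(q for q in map(prob, support) if q <= p_obs * (1 + 1e-7))
    if b * c:
        odds = a * d / (b * c)
    else:
        odds = math.inf if a * d else math.nan
    return odds, min(p, 1.0)


# Rows: short 5'UTR intron yes / no; columns: top 1% variability yes / no (toy counts).
odds, p = fisher_exact_2x2(3, 1, 1, 3)
print(odds, round(p, 4))
# → 9.0 0.4857
```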

