Characterization of human pseudogene-derived non-coding RNAs for functional potential.

Guo X, Lin M, Rockowitz S, Lachman HM, Zheng D - PLoS ONE (2014)



Affiliation: The Saul R. Korey Department of Neurology, Albert Einstein College of Medicine, New York, New York, United States of America.

ABSTRACT
Thousands of pseudogenes exist in the human genome and many are transcribed, but their functional potential remains elusive and understudied. To explore these issues systematically, we first developed a computational pipeline to identify transcribed pseudogenes from RNA-Seq data. Applying the pipeline to datasets from 16 distinct normal human tissues identified ∼3,000 pseudogenes that could produce non-coding RNAs (ncRNAs) at low abundance but with high tissue specificity under normal physiological conditions. Cross-tissue comparison revealed that the transcriptional profiles of pseudogenes and their parent genes were mostly positively correlated, suggesting that pseudogene transcription could have a positive effect on the expression of parent genes, perhaps by functioning as competing endogenous RNAs (ceRNAs), as previously suggested and demonstrated for the PTEN pseudogene, PTENP1. Our analysis of ENCODE project data also found many transcriptionally active pseudogenes in the GM12878 and K562 cell lines; moreover, it showed that many human pseudogenes produced small RNAs (sRNAs), and some pseudogene-derived sRNAs, especially those from antisense strands, showed evidence of interfering with gene expression. Further integrated analysis of transcriptomic and epigenomic data, however, demonstrated that trimethylation of histone H3 at lysine 9 (H3K9me3), a posttranslational modification typically associated with gene repression and heterochromatin, was enriched at many transcribed pseudogenes in the two cell lines, in a manner dependent on transcription level. The H3K9me3 enrichment was more prominent at the loci and adjacent regions of pseudogenes that produced sRNAs, an observation further supported by the co-enrichment of SETDB1 (an H3K9 methyltransferase), suggesting that pseudogene sRNAs may have a role in regional chromatin repression.
Taken together, our comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression at both the epigenetic and post-transcriptional levels.


Figure 3 (pone-0093972-g003). Transcriptional correlations (ρpg:g) between pseudogenes and their parents. A) Heatmap of the distribution of ρpg:g, with processed and duplicated pseudogenes each further separated into two groups based on the presence of a coding gene within 20 kb. Coefficients between transcribed pseudogenes and randomly chosen coding genes (top) were used as a control for p-value estimation. Colors represent the relative number of pseudogenes in each ρpg:g range (Z-score transformed). B) Pseudogenes transcribed in the sense direction (S) exhibited higher ρpg:g than those transcribed in the antisense direction (A). C) The transcriptional correlation between pseudogenes and their parents (ρpg:g) is inversely correlated with the transcriptional correlation between miRNAs and their putative targets (ρmiRNA:g). Genes were binned by their ρmiRNA:g values (x-axis), and the mean and standard deviation of ρpg:g (y-axis) were plotted for each bin. D) Expression of miRNA-targeted parental genes was less affected by miRNA knockdown (KD) than that of target genes without pseudogenes. Only genes responding to KD (upregulated >1.3-fold) were analyzed. The y-axis shows the fold change of KD over control. The miRNA targets were experimentally determined by CLASH analysis [49]. In the boxplots, the middle line marks the median and the box edges mark the first and third quartiles (same for boxplots below).
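Panels C and D rest on two simple data-handling steps: binning genes by ρmiRNA:g and summarizing ρpg:g within each bin, and keeping only genes upregulated more than 1.3-fold after miRNA knockdown. The sketch below illustrates both under stated assumptions; it is not the authors' code. The input layouts, the bin edges and the function names `binned_summary` and `kd_responsive` are hypothetical, the bins are assumed to be half-open intervals (the caption does not say how bin boundaries were handled), and the spread is the sample standard deviation.

```python
from statistics import mean, stdev


def binned_summary(pairs, edges):
    """Summarize rho_pg:g within bins of rho_miRNA:g (as in panel C).

    pairs: iterable of (rho_mirna_g, rho_pg_g) tuples, one per gene
    (hypothetical layout). edges: sorted bin edges; bins are
    [edges[i], edges[i+1]), and the last bin also includes its upper edge.
    Returns (low, high, n, mean, sd) for every non-empty bin.
    """
    pairs = list(pairs)
    rows = []
    last_index = len(edges) - 2
    for i in range(len(edges) - 1):
        low, high = edges[i], edges[i + 1]
        vals = [
            pg
            for mi, pg in pairs
            if low <= mi < high or (i == last_index and mi == high)
        ]
        if vals:
            # the sample standard deviation needs at least two values
            sd = stdev(vals) if len(vals) > 1 else 0.0
            rows.append((low, high, len(vals), mean(vals), sd))
    return rows


def kd_responsive(fold_changes, threshold=1.3):
    """Keep genes whose KD/control ratio exceeds `threshold` (panel D filter)."""
    return {gene: fc for gene, fc in fold_changes.items() if fc > threshold}


# Toy values, for illustration only.
rows = binned_summary(
    [(-0.5, 0.6), (-0.4, 0.4), (0.1, 0.2), (0.5, 0.0), (1.0, -0.2)],
    edges=[-1.0, 0.0, 1.0],
)
for low, high, n, m, sd in rows:
    print(f"[{low}, {high}] n={n} mean={m:.2f} sd={sd:.2f}")
print(kd_responsive({"GENE_A": 1.5, "GENE_B": 1.2, "GENE_C": 1.31}))
```

Half-open bins guarantee that a gene sitting exactly on an interior edge is counted once; whether the original analysis used the same convention is not stated.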

The evidence of pervasive pseudogene transcription is compelling, but the more important question is what biological functions pseudogene ncRNAs might have. Note that the term “biological function” is used loosely in this report; “biochemical activity” may arguably be more appropriate, given the source of our experimental data and the computational nature of our work. The first obvious question is how pseudogene and parent gene transcription are related, as this may shed light on how pseudogenes could regulate their most plausible targets. To this end, we computed the Spearman rank correlation of transcription levels across the 16 tissues for each of the 1,270 pseudogene-parent pairs (ρpg:g). For both processed and duplicated pseudogenes, the resulting correlation coefficients deviated from the theoretical normal distribution (p = 0.05, Kolmogorov-Smirnov (KS) test) and were biased towards positive values (median ρpg:g = 0.42 and 0.12 for duplicated and processed pseudogenes, respectively; Fig. 3A). This skew was statistically significant when compared with the distribution of ρ between transcribed pseudogenes and randomly selected coding genes (Fig. 3A and Fig. S3). In addition, 128 and 95 of the positive ρpg:g values for processed and duplicated pseudogenes, respectively, were statistically significant (p<0.05). Because some pseudogenes lie close to their parents on the chromosome (e.g., those from tandem duplications) and adjacent genes tend to be co-regulated [41], we computed the chromosomal distance from each transcribed pseudogene to the nearest coding gene and used it to separate transcribed pseudogenes within 20 kb of a gene (“group t1”; n = 712 and 236 for processed and duplicated, respectively) from the rest (“group t2”; n = 167 and 78 for processed and duplicated, respectively).
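The correlation and grouping steps described above can be sketched in standard-library Python. This is a minimal illustration, not the authors' pipeline. It assumes each expression profile is a list of transcription levels in a fixed tissue order (16 entries in the study) and handles ties with average ranks, as in the usual definition of Spearman's ρ. All function names are hypothetical. `random_gene_control` shows one plausible way to build the control of pseudogenes paired with randomly selected coding genes; the study's exact sampling scheme is not described here. `proximity_group` applies the 20 kb rule that defines groups t1 and t2.

```python
import math
import random
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length profiles with at least 2 tissues")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant profile has no defined rank correlation
    return cov / math.sqrt(sxx * syy)


def random_gene_control(pg_profiles, gene_profiles, seed=0):
    """Null rhos: pair each pseudogene profile with a randomly drawn coding gene."""
    rng = random.Random(seed)
    genes = list(gene_profiles.values())
    return [spearman_rho(p, rng.choice(genes)) for p in pg_profiles.values()]


def proximity_group(distance_bp, cutoff_bp=20_000):
    """'t1' if the nearest coding gene lies within cutoff_bp, otherwise 't2'."""
    return "t1" if distance_bp <= cutoff_bp else "t2"


# Toy five-tissue profiles (the study used 16 tissues).
pg = [0.0, 1.2, 3.5, 0.4, 2.2]
parent = [10.0, 25.0, 60.0, 12.0, 30.0]
print(spearman_rho(pg, parent))  # identical rank order -> 1.0
print(proximity_group(5_000), proximity_group(25_000))  # t1 t2
```

Returning NaN for a constant profile (for example, a pseudogene with no reads in any tissue) keeps such pairs from silently contributing a spurious coefficient; how the study treated them is not stated here.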
We found that ρpg:g values for group t2 remained skewed towards positive values for both processed and duplicated pseudogenes (median ρpg:g = 0.42 and 0.41 for t1 and t2 duplicated pseudogenes, and 0.08 and 0.25 for t1 and t2 processed pseudogenes; Fig. 3A). Interestingly, this breakdown revealed that group t2 processed pseudogenes showed even larger correlations with their parents than group t1 (Wilcoxon test, p<0.002). These results suggest that the positive ρpg:g values observed for most pseudogenes did not arise from co-regulation of pseudogenes and their parents due to close chromosomal proximity. The difference between t1 and t2 processed pseudogenes remained significant when longer distance cutoffs were applied (p<0.002, 0.008 and 0.02 for 20 kb, 50 kb and 100 kb, respectively). In summary, our results indicate that pseudogene transcription is positively correlated with the expression of the parent genes.
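The t1 versus t2 comparison and its distance-cutoff sensitivity check can be sketched as follows. The text reports a "Wilcoxon test" between two independent groups; we assume this is the two-sample rank-sum (Mann-Whitney U) form. The version below uses the large-sample normal approximation with a tie correction, which is reasonable for group sizes in the hundreds but can differ slightly from exact or continuity-corrected p-values produced by statistical packages. The record layout and the names `rank_sum_test` and `compare_by_cutoff` are hypothetical; this is not the authors' code.

```python
import math
from collections import Counter


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U, p), where U counts pairs in which a value from `a` exceeds a
    value from `b` (ties count 1/2). The variance is corrected for ties; no
    continuity correction is applied.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(list(a) + list(b)).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # every value tied: no evidence of a shift
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def compare_by_cutoff(records, cutoffs_bp=(20_000, 50_000, 100_000)):
    """Test far (t2) against near (t1) rhos at several distance cutoffs.

    records: iterable of (rho_pg_g, distance_to_nearest_coding_gene_bp)
    tuples (hypothetical layout).
    """
    records = list(records)
    results = {}
    for cutoff in cutoffs_bp:
        near = [rho for rho, d in records if d <= cutoff]  # group t1
        far = [rho for rho, d in records if d > cutoff]  # group t2
        results[cutoff] = rank_sum_test(far, near)
    return results
```

With real data, `records` would hold the processed pseudogenes only, since the reported cutoff analysis concerns that class; a U above n1·n2/2 indicates that group t2 tends to have the larger ρpg:g.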

