A human haploid gene trap collection to study lncRNAs with unusual RNA biology.

Kornienko AE, Vlatkovic I, Neesen J, Barlow DP, Pauler FM - RNA Biol (2016)

Bottom Line: De novo assembly from RNA-seq data suggests that LOC100288798 extends 289kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation.


Affiliation: CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, 1090 Vienna, Austria.

ABSTRACT
Many thousands of long non-coding (lnc) RNAs are mapped in the human genome. Time-consuming studies using reverse genetic approaches, based on post-transcriptional knock-down or genetic modification of the locus, have demonstrated diverse biological functions for a few of these transcripts. The Human Gene Trap Mutant Collection in haploid KBM7 cells is a ready-to-use tool for studying protein-coding gene function. As lncRNAs show remarkable differences in RNA biology compared to protein-coding genes, it is unclear whether this gene trap collection is useful for functional analysis of lncRNAs. Here we use the uncharacterized LOC100288798 lncRNA as a model to answer this question. Using public RNA-seq data we show that LOC100288798 is ubiquitously expressed, but inefficiently spliced. The minor spliced LOC100288798 isoforms are exported to the cytoplasm, whereas the major unspliced isoform is nuclear localized. This shows that LOC100288798 RNA biology differs markedly from that of typical mRNAs. De novo assembly from RNA-seq data suggests that LOC100288798 extends 289kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. Three cell lines with independent gene trap insertions in LOC100288798 were available from the KBM7 gene trap collection. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation. Our data show that gene trap collections in human haploid cell lines are useful tools to study lncRNAs, and identify the previously uncharacterized LOC100288798 as a potential gene regulator.


Figure 2: LOC100288798 exon structure assembly from various tissues extends its annotation to over 500kb overlapping SLC38A4. UCSC Genome Browser screenshot of the studied locus (chr12:46,772,500-47,422,500). From top to bottom: chromosome position and scale; RefSeq gene annotation (all annotated isoforms are displayed); spliced human ESTs (12/35 ESTs displayed); transcriptome assembly of the locus obtained in this study (Results, Methods). Note that only selected transcripts are shown (11/167 de novo isoforms of LOC100288798 and 4/43 de novo isoforms of SLC38A4), and that both EST and transcriptome assembly data reveal extension of LOC100288798 to over 500kb in length. RNA-seq tracks from the ENCODE/CSHL UCSC hub, with titles containing cell type name, RNA-seq type and transcriptional orientation, are displayed below. Only total whole cell RNA-seq is displayed. Bottom: normalized RNA-seq signal from wild type human haploid KBM7 cell lines (merged data from 2 wild type clones sequenced in this study, Methods). For all RNA-seq tracks, only the forward strand (Plus Signal) is displayed.
© Copyright Policy - open-access


Mentions: Visual inspection of the RNA-seq data indicated that LOC100288798 transcription extends over the downstream SLC38A4 gene (see continuous RNA-seq signal in Fig. 1D), even though RefSeq annotates the 3’ end of LOC100288798 112kb upstream of SLC38A4 (Fig. 2 top). Interestingly, human spliced ESTs also indicated continuous spliced transcripts overlapping SLC38A4 (Fig. 2). We next aimed to fully annotate LOC100288798 using publicly available RNA-seq data from multiple cell types. We limited this analysis to reads aligned to a 1 megabase (Mb) region (chr12:46,500,000-47,500,000) around LOC100288798. We extracted reads from each of the 46 aligned RNA-seq samples used in Fig. 1B (polyA+ as well as ribosomal depleted total RNA-seq) and performed de novo assembly using the Cufflinks software [59]. We thus obtained 46 assemblies, which we merged using the Cuffmerge software [59] to create an integrative de novo annotation of the investigated region (see Fig. 2 for selected isoforms and Table S1C for all the isoforms annotated in the region). Importantly, we identified exon models that share exons with the LOC100288798 lncRNA and overlap the SLC38A4 protein-coding gene, indicating that LOC100288798 is a 558kb long lncRNA (chr12:46777455-47335067, see CUFF.281.86 in Fig. 2 and Table S1C). Visual inspection of the LOC100288798 RNA-seq signal in cell types ranging from the highest expressing (CD34 cells, RPKM=6.68) to the lowest expressing (MNC Peripheral blood, RPKM=0.56) showed that extended transcription persists independently of expression level (Fig. 2). Therefore the LOC100288798 lncRNA consistently overlaps the SLC38A4 protein-coding gene and should be renamed SLC38A4-AS according to the recently suggested nomenclature [53]. As this nomenclature also appears more intuitive, we have used it for the remainder of this study.
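The coordinates quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the `Region` type and the helpers `parse_region` and `contains` are hypothetical names introduced here for illustration, and the span is taken as a simple end minus start, since the text does not state whether the coordinates are 0- or 1-based.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval as written in the text, e.g. chr12:46777455-47335067."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: span = end - start; an inclusive 1-based
        # convention would add one base, which does not change the kb value.
        return self.end - self.start


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end', tolerating thousands separators."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = match.groups()
    return Region(chrom, int(start.replace(",", "")), int(end.replace(",", "")))


def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


# Coordinates quoted in the text and in the Figure 2 legend.
transcript = parse_region("chr12:46777455-47335067")        # CUFF.281.86
assembly_window = parse_region("chr12:46,500,000-47,500,000")  # 1 Mb analysis region
browser_window = parse_region("chr12:46,772,500-47,422,500")   # Figure 2 view

print(f"transcript span: {transcript.length} bp (~{round(transcript.length / 1000)} kb)")
print("inside 1 Mb assembly window:", contains(assembly_window, transcript))
print("inside Figure 2 browser window:", contains(browser_window, transcript))
```

Under these assumptions the span of the longest assembled isoform rounds to the 558kb stated in the text, and the transcript falls inside both the 1 Mb assembly window and the browser window shown in Fig. 2.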

