HIPSTR and thousands of lncRNAs are heterogeneously expressed in human embryos, primordial germ cells and stable cell lines

View Article: PubMed Central - PubMed

ABSTRACT

Eukaryotic genomes are transcribed into numerous regulatory long non-coding RNAs (lncRNAs). Compared to mRNAs, lncRNAs display higher developmental stage-, tissue-, and cell-subtype-specificity of expression, and are generally less abundant in a population of cells. Despite the progress in single-cell-focused research, the origins of low population-level expression of lncRNAs in homogeneous populations of cells are poorly understood. Here, we identify HIPSTR (Heterogeneously expressed from the Intronic Plus Strand of the TFAP2A-locus RNA), a novel lncRNA gene in the developmentally regulated TFAP2A locus. HIPSTR has evolutionarily conserved expression patterns, its promoter is most active in undifferentiated cells, and depletion of HIPSTR in HEK293 and in pluripotent H1BP cells predominantly affects the genes involved in early organismal development and cell differentiation. Most importantly, we find that HIPSTR is specifically induced and heterogeneously expressed in the 8-cell-stage human embryos during the major wave of embryonic genome activation. We systematically explore the phenomenon of cell-to-cell variation of gene expression and link it to low population-level expression of lncRNAs, showing that, similar to HIPSTR, the expression of thousands of lncRNAs is more highly heterogeneous than the expression of mRNAs in the individual, otherwise indistinguishable cells of totipotent human embryos, primordial germ cells, and stable cell lines.



Figure 5: LncRNAs show higher heterogeneity of expression than mRNAs. (A–E) LncRNAs have higher cell-to-cell variation in expression than mRNAs. The coefficient of variation (CV) across all cells of a given single-cell RNA-seq data set was calculated for each expressed gene (>3 FPKM). Shown are box plots of CV values for highly expressed (>30 FPKM) mRNAs (dark orange) and lncRNAs (dark grey), and for moderately expressed (3–30 FPKM) mRNAs (light orange) and lncRNAs (light grey). The box spans the interquartile range (IQR) from the first to the third quartile, the line inside the box shows the median, and the whiskers encompass the CV values within 1.5 IQR below the first quartile and above the third quartile. Points outside the whiskers are CV outliers. All possible pairwise comparisons show statistically significant differences (Welch’s t-test, p-value < 0.001). (F–J) A higher fraction of lncRNAs than of mRNAs is classified as highly heterogeneously expressed. Plotted are density distributions of the numbers of expressing cells calculated for lncRNAs (black dashed line), mRNAs (red dashed line), lncRNAs and mRNAs together (grey bars), and for modeled populations of genes with high (solid light blue line) or low (solid dark blue line) heterogeneity of expression. Pie charts show the fractions of lncRNAs and mRNAs associated with the population of genes with high (light blue), low (dark blue) or uncertain (grey) heterogeneity of expression. Genes used for this analysis had expression >3 FPKM in at least one cell and <30 FPKM in all cells of the corresponding data set. Genes that contributed to the plots and pie charts in (F–J) were assigned to one of the two modeled populations (high or low expression heterogeneity) when the posterior probability was >0.99, and were otherwise (posterior probability ≤0.99) given the “uncertain heterogeneity” classification (Tables S4, S5, S6, S7, S8). The single-cell RNA-seq data sets re-analyzed here were from ref. 7 (8-cell and morula stage embryos, hESCs), ref. 53 (K562 cells), and ref. 54 (7W hPGCs and 19W hPGCs). The number of individual cells used for each analysis is given in parentheses in each panel heading.
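The per-gene calculation described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the use of the sample standard deviation, and the handling of values exactly at the 3 and 30 FPKM boundaries are assumptions made for illustration only.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV of one gene across cells: sample standard deviation / mean.

    Using the sample (n - 1) standard deviation is an assumption here.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a gene with zero mean expression")
    return statistics.stdev(values) / mean


def expression_class(values, low=3.0, high=30.0):
    """Bin a gene by its maximum FPKM in any single cell (3 and 30 FPKM cut-offs).

    Boundary handling (a peak of exactly 3 or 30 FPKM) is an assumption.
    """
    peak = max(values)
    if peak > high:
        return "high"
    if peak > low:
        return "moderate"
    return "not_expressed"


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy FPKM values for two hypothetical genes across four cells (not real data).
toy_lncrna = [0.0, 0.0, 12.0, 0.0]   # detected in a single cell
toy_mrna = [8.0, 10.0, 12.0, 10.0]   # detected at similar levels in every cell
```

In this toy case both genes fall into the moderately expressed bin, but the gene detected in a single cell has a much larger CV. With real data, the CV values of each gene group would be collected into lists and passed to `welch_t` for the pairwise comparisons.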
© Copyright Policy - open-access


Mentions: To resolve this discrepancy between single-molecule RNA-FISH results and observations from single-cell RNA-seq data, we next systematically explored patterns of cell-to-cell expression variability of lncRNAs and mRNAs in human cells. For this, we used five single-cell RNA-seq data sets: from human totipotent blastomeres (36 cells; ref. 7), from pluripotent hESCs (32 cells; ref. 7), from K562 cells (96 cells; ref. 53), and from hPGCs of 7-week-old (7W; 39 cells; ref. 54) and 19-week-old (19W; 57 cells; ref. 54) male embryos. We considered all expressed genes, defined here as those having max expression >3 FPKM (a 30-fold more stringent threshold than in refs 7, 54; see Methods) in at least one individual cell of a given data set, and compared the coefficient of variation of gene expression across the cells between lncRNAs and mRNAs. For genes with max expression within 3–30 FPKM, we saw a greater difference between non-coding and protein-coding transcripts than for those with max expression >30 FPKM (Fig. 5A–E). For the former group, the distribution of the numbers of expressing cells was a mixture distribution. We fitted it with a finite mixture model with two populations, and used this model to classify lncRNAs and mRNAs as having high, low or uncertain heterogeneity of expression (Fig. 5F–J). For lncRNAs of this group (max expression 3–30 FPKM), only a small fraction showed low or uncertain (posterior probability <0.99) heterogeneity of expression: 6.5%, 7.0%, 4.2%, 4.8%, and 2.3% in human totipotent blastomeres (Fig. 5F), hESCs (Fig. 5G), K562 cells (Fig. 5H), 7W hPGCs (Fig. 5I), and 19W hPGCs (Fig. 5J), respectively (Table S9). For example, in hESCs the known pluripotency regulator TUNAR (ref. 55) was assigned the low heterogeneity flag in our analysis (Table S2).
At the same time, HIPSTR was classified as a transcript with high heterogeneity of expression in 8-cell and morula-stage human embryos, and in K562 cells (Tables S4, S6), as expected. The remarkable heterogeneity of expression of lncRNAs was in stark contrast to the much lower heterogeneity of expression of mRNAs with comparable expression levels (3–30 FPKM), of which 40%, 43%, 19%, 27%, and 20% were associated with low or uncertain heterogeneity of expression in human totipotent blastomeres, hESCs, K562 cells, 7W hPGCs, and 19W hPGCs, respectively (Fig. 5F–J; Table S9). Overall, the lncRNAs analyzed here (max expression 3–30 FPKM) that were assigned the high heterogeneity flag (H) constituted on average 74% of all expressed lncRNAs (>3 FPKM), while for mRNAs this fraction was only 35% (Table S8).
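The classification step can be sketched as follows. The text only states that a finite mixture model with two populations was fitted to the numbers of expressing cells, so everything else here is an assumption made for illustration: binomial components, fitting by expectation-maximization (EM), the arbitrary starting values, the 3 FPKM cut-off for calling a cell "expressing", and the reading that the component with fewer expressing cells corresponds to high heterogeneity. All function names are hypothetical.

```python
import math


def count_expressing(fpkm_per_cell, threshold=3.0):
    """Number of cells in which a gene is detected (threshold is an assumption)."""
    return sum(1 for v in fpkm_per_cell if v > threshold)


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def fit_two_binomials(counts, n_cells, iters=100, eps=1e-6):
    """EM fit of a two-component binomial mixture to per-gene expressing-cell counts."""
    w_few, p_few, p_many = 0.5, 0.1, 0.9  # arbitrary starting guesses
    for _ in range(iters):
        # E-step: posterior weight of the "few expressing cells" component per gene.
        resp = []
        for k in counts:
            a = w_few * binom_pmf(k, n_cells, p_few)
            b = (1 - w_few) * binom_pmf(k, n_cells, p_many)
            resp.append(a / (a + b))
        # M-step: update the mixing weight and both success probabilities,
        # clamped away from 0 and 1 to keep the likelihoods finite and non-zero.
        s = min(max(sum(resp), eps), len(counts) - eps)
        w_few = s / len(counts)
        p_few = sum(r * k for r, k in zip(resp, counts)) / (s * n_cells)
        p_many = sum((1 - r) * k for r, k in zip(resp, counts)) / ((len(counts) - s) * n_cells)
        p_few = min(max(p_few, eps), 1 - eps)
        p_many = min(max(p_many, eps), 1 - eps)
    return w_few, p_few, p_many


def classify(k, n_cells, w_few, p_few, p_many, cutoff=0.99):
    """Assign 'high', 'low' or 'uncertain' heterogeneity from the posterior probability."""
    a = w_few * binom_pmf(k, n_cells, p_few)
    b = (1 - w_few) * binom_pmf(k, n_cells, p_many)
    post_few = a / (a + b)
    if post_few > cutoff:
        return "high"      # confidently in the few-expressing-cells population
    if 1 - post_few > cutoff:
        return "low"       # confidently in the many-expressing-cells population
    return "uncertain"


# Toy expressing-cell counts for 12 hypothetical genes in a 32-cell data set.
toy_counts = [1, 2, 1, 3, 2, 1, 2, 3, 30, 29, 31, 28]
params = fit_two_binomials(toy_counts, n_cells=32)
```

Calling `classify(2, 32, *params)` then places a gene detected in 2 of 32 cells in the high-heterogeneity population, while a gene detected in about half of the cells does not reach the 0.99 posterior cut-off for either population and is labeled uncertain.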

