Comparative transcriptomic analysis of multiple cardiovascular fates from embryonic stem cells predicts novel regulators in human cardiogenesis.

Li Y, Lin B, Yang L - Sci Rep (2015)

Bottom Line: Furthermore, GEPA analysis revealed MCP-specific expression of genes in the ephrin signaling pathway, whose positive role in cardiomyocyte differentiation was further validated experimentally. Using the RNA-seq plus GEPA workflow, we also identified a stage-specific RNA splicing switch and lineage-enriched long non-coding RNAs during human cardiovascular differentiation. Overall, our study used multi-cell-fate transcriptomic comparison to establish a lineage-specific gene expression map for predicting and validating novel regulatory mechanisms underlying early human cardiovascular development.


Affiliation: Department of Developmental Biology, University of Pittsburgh School of Medicine, 530 45th Street, Rangos Research Center, Pittsburgh, PA 15201.

ABSTRACT
Dissecting the gene expression programs that control early-stage cardiovascular development is essential for understanding the molecular mechanisms of human heart development and heart disease. Here, we performed transcriptome sequencing (RNA-seq) of highly purified human Embryonic Stem Cells (hESCs), hESC-derived Multipotential Cardiovascular Progenitors (MCPs) and the three cardiovascular lineages specified from MCPs. A novel algorithm, named Gene Expression Pattern Analyzer (GEPA), was developed to obtain a refined lineage-specificity map of all sequenced genes, which reveals dynamic changes in the transcription factor networks underlying early human cardiovascular development. Moreover, our GEPA predictions captured ~90% of the top-ranked regulatory cardiac genes that were previously predicted based on chromatin signature changes in hESCs, and further defined their cardiovascular lineage specificities, indicating that our multi-fate comparison analysis could predict novel regulatory genes. Furthermore, GEPA analysis revealed MCP-specific expression of genes in the ephrin signaling pathway, whose positive role in cardiomyocyte differentiation was further validated experimentally. Using the RNA-seq plus GEPA workflow, we also identified a stage-specific RNA splicing switch and lineage-enriched long non-coding RNAs during human cardiovascular differentiation. Overall, our study used multi-cell-fate transcriptomic comparison to establish a lineage-specific gene expression map for predicting and validating novel regulatory mechanisms underlying early human cardiovascular development.


Figure 2: GEPA algorithm identified lineage-enriched genes from RNA-seq data. (a) Schematic representation of the “GEPA” algorithm workflow used to identify the lineage-enriched patterns of the genes. (b) Estimation of the false positive and false negative ratios of the GEPA algorithm at different thresholds of FPKM fold change. (c) Distribution of the genes across the expression pattern categories. Lineage-enriched patterns are indicated on the left. Within a row, rectangles filled with blue are at least 2.5-fold higher than those in light gray. The bars indicating the number of genes are color coded: blue for single lineage-enriched groups; green, yellow and orange for two, three and four lineage-enriched groups, respectively; light blue for the “Gradient” group and purple for the “Even” group. (d) qRT-PCR validation of the signature genes for lineage-enriched categories. The gene name and the expression pattern defined by GEPA (in brackets) are shown above each plot.


Mentions: To compare the relative expression level of each gene across the five cell types, we developed a new algorithm named Gene Expression Pattern Analyzer (GEPA) (Fig. 2a). The algorithm recognizes the single- or multiple-lineage enrichment pattern (LEP) of each individual gene across the five cell types, and lineage specificity is called when the fold change in gene expression exceeds an arbitrary threshold (see Methods for details). Using GEPA, all sequenced genes from the five cell types could be grouped by lineage specificity. Genes without a clear lineage-specific enrichment were grouped into the “Gradient” or “Even” categories, which describe genes commonly expressed across all five samples with mild or no lineage specificity, respectively (Fig. 2a). Because the LEP assigned to a gene depends largely on the arbitrary fold-change threshold, we estimated the false positive rate of LEP recognition by running GEPA on 3804 documented human housekeeping genes as negative controls under four fold-change thresholds: 1.5, 2, 2.5 and 3 (ref. 16). Because the 3508 housekeeping genes should not exhibit lineage specificity and should fall into the “Gradient” or “Even” categories, a lower proportion of LEP calls among the 3508 genes indicates more reliable LEP recognition. We found that the false positive rates for these housekeeping genes were much lower at thresholds of 2.5 and 3 (3.5% and 1.9%, respectively) than at thresholds of 1.5 and 2 (36.8% and 10.2%, respectively) (Fig. 2b). Next, we ran GEPA on 49 embryonic stem cell-specific and cardiac-specific genes as positive controls and calculated the false negative rates (Supplementary Table 1). Thresholds of 1.5 and 2 captured all the positive controls with LEG, whereas thresholds of 2.5 and 3 lost LEG in 4.3% and 16.7% of all inputs, respectively.
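The threshold test described above can be illustrated with a small Python sketch. This is not the authors' GEPA implementation, whose exact rule is given in the paper's Methods; the split rule used here (rank the five FPKM values, look for the largest gap between neighbours, and call enrichment when that gap reaches the threshold), the pseudocount, the function names and the toy genes are all assumptions made for illustration only.

```python
from typing import Dict, List

LINEAGES = ["ES", "MCP", "CM", "SM", "EC"]
PSEUDOCOUNT = 1.0  # assumed value; avoids division by zero for unexpressed genes


def classify(fpkm: Dict[str, float], threshold: float) -> str:
    """Assign a gene to a lineage-enriched pattern, "Gradient" or "Even".

    Hypothetical rule: the lineages above the largest fold-change gap are
    called enriched if that gap is at least `threshold`.
    """
    ranked = sorted(LINEAGES, key=lambda name: fpkm[name], reverse=True)
    values = [fpkm[name] + PSEUDOCOUNT for name in ranked]

    best_ratio, best_split = 0.0, 0
    for i in range(1, len(values)):
        ratio = values[i - 1] / values[i]
        if ratio > best_ratio:
            best_ratio, best_split = ratio, i

    if best_ratio >= threshold:
        enriched = set(ranked[:best_split])
        # Report lineages in a fixed order so labels look like "ES&MCP".
        return "&".join(name for name in LINEAGES if name in enriched)
    if values[0] / values[-1] >= threshold:
        return "Gradient"  # overall spread, but no clean split between groups
    return "Even"


def enriched_fraction(genes: List[Dict[str, float]], threshold: float) -> float:
    """Fraction of genes given a lineage-enriched call at this threshold."""
    calls = [classify(gene, threshold) for gene in genes]
    return sum(call not in ("Gradient", "Even") for call in calls) / len(calls)


# Toy stand-ins for housekeeping genes (negative controls); not real data.
flat_gene = {name: 10.0 for name in LINEAGES}
sloped_gene = {"ES": 100.0, "MCP": 50.0, "CM": 25.0, "SM": 12.0, "EC": 6.0}
toy_controls = [flat_gene, sloped_gene]

for t in (1.5, 2.0, 2.5, 3.0):
    print(t, enriched_fraction(toy_controls, t))
```

For negative controls, the printed fraction plays the role of the false positive rate: any enriched call on a housekeeping gene counts as a false positive. For positive controls, one minus the same fraction would correspond to the false negative rate.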
Together, these tests indicated that thresholds of 1.5 and 2 were too loose and a threshold of 3 was too stringent, whereas a threshold of 2.5 retained reliable LEG recognition without introducing too many falsely identified LEG (Fig. 2b). We therefore classified all sequenced genes from the five cell types into 32 categories using a threshold of 2.5 (Fig. 2c). Of all sequenced genes, 79% exhibited no lineage enrichment, with 48% in the “Even” category and 31% in the “Gradient” category. Approximately 21% of sequenced genes showed single- or multiple-lineage enrichment. In addition, the single-lineage-enrichment categories contained a higher average number of genes per category (1680 genes in total across 5 categories) than the multiple-lineage categories (1560 genes in total across 25 categories) (Fig. 2c). The genes showing multiple-lineage enrichment were mainly distributed in the “ES&MCP”, “ES&MCP&CM”, “SM&EC” and “CM&SM&EC” categories. This lineage enrichment analysis was consistent with the PCA and hierarchical clustering results in Fig. 1e and Supplementary Fig. 2, indicating that more closely related cell types during cardiovascular differentiation share more commonly expressed genes (Fig. 2c). The LEG distribution pattern was lost when the sequenced genes of the 5 cell types were shuffled (Supplementary Table 2), indicating that the LEG distribution genuinely reflects the intrinsic gene expression of those cell types. Notably, the LEG patterns from the RNA-seq results were highly consistent with the qRT-PCR validation in Fig. 2d. Therefore, our newly developed GEPA algorithm successfully identified LEG of all the sequenced genes for studying human cardiovascular development.
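The category counts quoted above can be checked with a few lines of Python. The sketch assumes that the 32 categories are the enrichment patterns covering one to four of the five lineages plus “Gradient” and “Even”, and that a pattern covering all five lineages is not counted as enrichment; this reading matches the 5 + 25 + 2 split in the text but is an inference, not a statement from the Methods. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List

LINEAGES = ["ES", "MCP", "CM", "SM", "EC"]


def enumerate_categories(lineages: List[str]) -> Dict[int, List[str]]:
    """Enrichment patterns grouped by how many lineages they cover (1 to n-1)."""
    return {
        size: ["&".join(combo) for combo in combinations(lineages, size)]
        for size in range(1, len(lineages))
    }


by_size = enumerate_categories(LINEAGES)
single = by_size[1]
multi = [name for size in (2, 3, 4) for name in by_size[size]]
all_categories = single + multi + ["Gradient", "Even"]

# Gene totals as reported in the text: 1680 single-lineage, 1560 multi-lineage.
avg_single = 1680 / len(single)
avg_multi = 1560 / len(multi)

print(len(single), len(multi), len(all_categories))
print(avg_single, avg_multi)
```

Under these assumptions the averages work out to 336 genes per single-lineage category against about 62 per multiple-lineage category, which is the comparison the paragraph draws.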

