Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data

View Article: PubMed Central - PubMed

ABSTRACT

Single-cell RNA-Sequencing (scRNA-Seq) is a revolutionary technique for discovering and describing cell types in heterogeneous tissues, yet its measurement of expression often suffers from large systematic bias. A major source of this bias is the cell cycle, which introduces large within-cell-type heterogeneity that can obscure the differences in expression between cell types. The current method for removing the cell-cycle effect is unable to effectively identify this effect and has a high risk of removing other biological components of interest, compromising downstream analysis. We present ccRemover, a new method that reliably identifies the cell-cycle effect and removes it. ccRemover preserves other biological signals of interest in the data and thus can serve as an important pre-processing step for many scRNA-Seq data analyses. The effectiveness of ccRemover is demonstrated using simulation data and three real scRNA-Seq datasets, where it boosts the performance of existing clustering algorithms in distinguishing between cell types.


Figure 3. Dendrogram plots from the hierarchical clustering on the original, ccRemover corrected and scLVM corrected glioblastoma data. Each cell is colored by its tumor of origin: MGH26 (yellow), MGH28 (purple), MGH29 (orange), MGH30 (blue) and MGH31 (red). The clustering assignments are displayed as boxes separating the cells. (a) Original data. There are significant misclassifications within the clusters; in particular, the MGH28, MGH30 and MGH31 clusters contain significant numbers of cells from the other tumors. (b) scLVM corrected data. Clustering accuracy increases relative to the original data; however, the MGH26 and MGH30 cells are now mixed between clusters. (c) ccRemover corrected data. The purity of the clusters improves significantly compared with both the original and the scLVM corrected data. The MGH28 cluster is now much purer and contains only a few cells from the other tumors.

Mentions: Hierarchical clustering was applied to each version of the data (original, scLVM corrected, and ccRemover corrected), splitting the cells into five clusters; each cluster was assigned the class of the majority of the cells it contained. The results are shown in Fig. 3. On the original data, 87.44% of the cells were clustered correctly. The dendrogram (Fig. 3a) shows that the MGH31 (red) cluster contains incorrectly classified cells from all the other tumors, and that the MGH28 (purple) and MGH30 (blue) clusters also display significant impurities. On the scLVM corrected data, 90.00% of the cells were classified correctly, an improvement of over 2.5 percentage points on the original data. On the ccRemover corrected data, 92.32% of the cells were classified correctly, an increase of nearly 5 percentage points on the original data. The purity of the clusters in the dendrogram for the ccRemover corrected data (Fig. 3c) shows marked improvement over the original data, with the MGH28 and MGH31 clusters improving most convincingly. This result is particularly striking given the very low levels of cell-cycle activity within this dataset, and it demonstrates that ccRemover can improve the downstream analysis of scRNA-Seq data even when the cell-cycle effect is not very strong.
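
The scoring rule described above (label each cluster with the majority class of its cells, then count the cells that match their cluster's label) can be sketched in a few lines of Python. This is only an illustration of that rule, not the authors' code: the function name `majority_vote_accuracy` is hypothetical, the toy cluster assignments and tumor labels are invented for the example and are not taken from the glioblastoma data, and the hierarchical clustering step itself is assumed to have been run elsewhere.

```python
from collections import Counter, defaultdict


def majority_vote_accuracy(cluster_ids, true_labels):
    """Score a clustering by majority vote (hypothetical helper).

    Each cluster is assigned the most common true label among its
    members; a cell counts as correct if its own label matches the
    label assigned to its cluster.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one cell is required")

    # Count how many cells of each true label fall in each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1

    # Majority label per cluster (ties go to the label seen first).
    cluster_to_label = {
        cluster: counts.most_common(1)[0][0]
        for cluster, counts in members.items()
    }

    # Cells whose true label equals their cluster's majority label.
    correct = sum(
        counts[cluster_to_label[cluster]] for cluster, counts in members.items()
    )
    return correct / len(true_labels), cluster_to_label


# Toy example (invented data): eight cells in three clusters.
toy_clusters = [0, 0, 0, 1, 1, 1, 2, 2]
toy_tumors = ["MGH26", "MGH26", "MGH28", "MGH28",
              "MGH28", "MGH28", "MGH30", "MGH30"]

accuracy, mapping = majority_vote_accuracy(toy_clusters, toy_tumors)
print(f"{accuracy:.2%} of cells clustered correctly")
print(mapping)
```

In the toy data, one MGH28 cell sits in the cluster whose majority is MGH26, so seven of the eight cells are counted as correct. Note that under this rule two clusters may receive the same majority label; the sketch does not try to enforce a one-to-one mapping between clusters and tumors.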

