Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data

View Article: PubMed Central - PubMed

ABSTRACT

Single-cell RNA-Sequencing (scRNA-Seq) is a revolutionary technique for discovering and describing cell types in heterogeneous tissues, yet its measurement of expression often suffers from large systematic bias. A major source of this bias is the cell cycle, which introduces large within-cell-type heterogeneity that can obscure the differences in expression between cell types. The current method for removing the cell-cycle effect is unable to effectively identify this effect and has a high risk of removing other biological components of interest, compromising downstream analysis. We present ccRemover, a new method that reliably identifies the cell-cycle effect and removes it. ccRemover preserves other biological signals of interest in the data and thus can serve as an important pre-processing step for many scRNA-Seq data analyses. The effectiveness of ccRemover is demonstrated using simulation data and three real scRNA-Seq datasets, where it boosts the performance of existing clustering algorithms in distinguishing between cell types.


Figure 4: Bar plots of the clustering assignments for the lung adenocarcinoma cells. (a) Original data. The LC.PT and LC.PT_RE cells split into two clusters, each containing a roughly equal proportion of cells from each sample, indicating that 4-means failed to separate the cells from these two samples. (b) scLVM corrected data. As with the original data, scLVM fails to split the LC.PT and LC.PT_RE cells into separate clusters. (c) ccRemover corrected data. The separation of the LC.PT and LC.PT_RE cells between the clusters has improved significantly, with one cluster dominated by LC.PT cells and the other by LC.PT_RE cells.

When using 3-means clustering on the original data, the three clusters represent the three cell types perfectly, and thus there is no room for improvement. Instead, we consider using 4-means clustering, in order to see whether the 77 cells of the first cell type can be clustered according to the two sets of technical replicates, LC.PT and LC.PT_RE. Figure 4 shows the results. On both the original data (Fig. 4a) and the scLVM corrected data (Fig. 4b), the LC.PT and LC.PT_RE cells are split into two clusters (clusters 3 and 4), each containing roughly equal proportions of cells from each set, indicating that the technical replicates are non-separable. On the ccRemover corrected data (Fig. 4c), on the other hand, the majority (80%) of cluster 3 are cells from the LC.PT_RE group, while the majority (89%) of cluster 4 are cells from the LC.PT group. This means that cells from different sets of technical replicates are largely separated by the batch effect. This batch effect is present in all three datasets (original, scLVM corrected, and ccRemover corrected), but it has a noticeable influence on the clustering results only for the ccRemover corrected data. The reason could be that the batch effect is masked by the stronger cell-cycle effect in the original data, and it stands out once the cell-cycle effect has been removed by ccRemover. scLVM may not have removed the cell-cycle effect thoroughly enough to make a difference.
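The percentages quoted above are per-cluster compositions: for each cluster, the fraction of its cells that come from each replicate set. A minimal sketch of that tabulation follows. The function names (`cluster_composition`, `dominant_sample`) are hypothetical, and the toy cluster assignments are invented purely for illustration; they are not the paper's data and do not reproduce its cell counts.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_ids, sample_labels):
    """Return {cluster: {sample: fraction of that cluster's cells}}."""
    if len(cluster_ids) != len(sample_labels):
        raise ValueError("cluster_ids and sample_labels must have equal length")
    counts = defaultdict(Counter)
    for cluster, sample in zip(cluster_ids, sample_labels):
        counts[cluster][sample] += 1
    return {
        cluster: {s: n / sum(ctr.values()) for s, n in ctr.items()}
        for cluster, ctr in counts.items()
    }


def dominant_sample(composition):
    """Return {cluster: (sample, fraction)} for the largest group per cluster."""
    return {
        cluster: max(fractions.items(), key=lambda kv: kv[1])
        for cluster, fractions in composition.items()
    }


# Invented toy assignments (NOT the paper's data): 10 cells in cluster 3,
# 9 cells in cluster 4, each labeled with a replicate set.
clusters = [3] * 10 + [4] * 9
samples = (
    ["LC.PT_RE"] * 8 + ["LC.PT"] * 2      # cluster 3
    + ["LC.PT"] * 8 + ["LC.PT_RE"] * 1    # cluster 4
)

comp = cluster_composition(clusters, samples)
for cluster, (sample, frac) in sorted(dominant_sample(comp).items()):
    print(f"cluster {cluster}: {frac:.0%} {sample}")
```

A cluster whose dominant fraction is near 50% corresponds to the non-separable case described for the original and scLVM corrected data, while a fraction well above 50% corresponds to the separation seen after ccRemover.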

