Sensitive detection of rare disease-associated cell subsets via representation learning

View Article: PubMed Central - PubMed

ABSTRACT

Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements. Using CellCnn, we identify paracrine signalling-, AIDS onset- and rare CMV infection-associated cell subsets in peripheral blood, and extremely rare leukaemic blast populations in minimal residual disease-like situations with frequencies as low as 0.01%.


Figure 3: Detection of rare CMV seropositivity-associated cell populations. (a) Visualization of the cell subsets selected by CellCnn and Citrus across 100 Monte Carlo cross-validation (CV) repetitions. Centroids of selected populations are highlighted on a t-SNE map computed from all samples using 20,000 cells per individual (see Methods for details). The cell population most frequently selected by CellCnn (81 out of 100 times) is positively associated with prior CMV infection, whereas the second most frequent cell subset is negatively associated with CMV seropositivity. (b) t-SNE map colour-coded according to abundance of selected markers. The top-left subplot depicts the cell subset most frequently selected by CellCnn, corresponding to cluster 1 in (a) (see Methods for details). This cell subset corresponds to a memory-like (NKG2C+, CD57+) NK (CD56+, CD3−) and NK T (CD56+, CD3+) cell population. (c) Histograms of selected marker abundances for the entire cell population and for the cell subset most frequently selected by CellCnn. (d) Boxplot of the area under the ROC curve (ROC AUC) on the test samples for 100 Monte Carlo CV repetitions. The median test ROC AUC for CellCnn is equal to 1.
© Copyright Policy - open-access

Mentions: We went on to assess CellCnn's capability to detect rare disease-associated cell populations. Specifically, we analysed a mass cytometry data set acquired to characterize human NK cell diversity and to associate NK cell subsets with genetic and environmental factors, namely prior CMV infection (ref. 18). This data set comprises mass cytometry measurements of 36 markers, including 28 NK cell receptors, for PBMC samples of 20 donors with varying serology for CMV (see section Data sets in Methods). Applied to the ungated single-cell data, CellCnn identified two CMV seropositivity-associated cell populations (Fig. 3a and Supplementary Fig. 5). The most predictive cell population is rare (frequency < 1%), is positively associated with previous CMV infection and exhibits a memory-like NKG2C+, CD57+ NK cell phenotype (Fig. 3b,c), as further described in ref. 18. The state-of-the-art cell population classifier Citrus failed to identify this rare, predictive cell population (Fig. 3a,b) and, as a result, exhibited inferior classification performance in comparison to CellCnn (Fig. 3d).
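The evaluation protocol behind Fig. 3d (repeated random train/test splits of donors, scored by test ROC AUC) can be illustrated with a minimal sketch. This is not CellCnn or Citrus: the function names `roc_auc` and `monte_carlo_cv`, the 25% test fraction, the stratified split and the per-donor subset frequencies are all assumptions made up for illustration, and the toy scorer simply ranks held-out donors by the frequency of a hypothetical cell subset.

```python
import random
from statistics import median


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive sample
    scores higher than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def monte_carlo_cv(labels, score_fn, n_repeats=100, test_fraction=0.25, seed=0):
    """Repeat a random stratified train/test split and collect test AUCs.

    score_fn(train_idx, test_idx) must return one score per test index.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    aucs = []
    for _ in range(n_repeats):
        # Hold out a fraction of each class so every test set has both labels.
        test = rng.sample(pos_idx, max(1, round(test_fraction * len(pos_idx))))
        test += rng.sample(neg_idx, max(1, round(test_fraction * len(neg_idx))))
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        scores = score_fn(train, test)
        aucs.append(roc_auc([labels[i] for i in test], scores))
    return aucs


# Toy data (invented, NOT from the paper): 20 donors, 10 per class,
# each described by the frequency of one hypothetical rare subset.
labels = [1] * 10 + [0] * 10
subset_freq = [0.004 + 0.0005 * i for i in range(10)] + \
              [0.0005 + 0.0002 * i for i in range(10)]


def frequency_scorer(train_idx, test_idx):
    # Placeholder "model": no training, score = subset frequency.
    return [subset_freq[i] for i in test_idx]


aucs = monte_carlo_cv(labels, frequency_scorer)
print("median test ROC AUC over", len(aucs), "repetitions:", median(aucs))
```

In a real analysis, `frequency_scorer` would be replaced by a model fitted on the training donors only, so that each test AUC reflects performance on unseen samples.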

