Sensitive detection of rare disease-associated cell subsets via representation learning


ABSTRACT

Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements. Using CellCnn, we identify paracrine signalling-, AIDS onset- and rare CMV infection-associated cell subsets in peripheral blood, and extremely rare leukaemic blast populations in minimal residual disease-like situations with frequencies as low as 0.01%.
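To put the 0.01% figure in perspective: in a sample of 100,000 cells, 0.01% is just 10 blast cells. The sketch below shows one way an in silico spike-in at a chosen frequency could be built. It is a minimal illustration under two assumptions. First, blast cells are drawn uniformly at random. Second, they replace healthy cells, so the total sample size stays fixed. The function name `spike_in` and this mixing scheme are hypothetical and are not taken from the CellCnn code.

```python
import random

def spike_in(healthy_cells, blast_cells, frequency, rng=None):
    """Build a sample in which blast cells make up `frequency` of all cells.

    Hypothetical helper: blasts replace randomly chosen healthy cells, so the
    sample size stays equal to len(healthy_cells). Returns (cells, labels),
    where label 1 marks a spiked-in blast cell.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    n_total = len(healthy_cells)
    n_blast = round(frequency * n_total)
    if n_blast > len(blast_cells):
        raise ValueError("not enough blast cells for the requested frequency")
    blasts = rng.sample(blast_cells, n_blast)
    healthy = rng.sample(healthy_cells, n_total - n_blast)
    cells = healthy + blasts
    labels = [0] * len(healthy) + [1] * len(blasts)
    # Shuffle cells and labels together so blasts are not grouped at the end.
    order = list(range(n_total))
    rng.shuffle(order)
    return [cells[i] for i in order], [labels[i] for i in order]
```

Called as `spike_in(healthy, blasts, 0.0001)` on a 100,000-cell healthy sample, this returns exactly 10 positive labels. Those labels can then serve as ground truth when scoring how well a method recovers the rare cells.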


Figure 5. Benchmark results on the identification of in silico spike-in rare leukaemic blast populations for two AML subclasses. (a) Whole-sample representation learned by CellCnn for various AML blast cell population frequencies. The three classes are well separated (linearly separable) in the CellCnn-based representation space, i.e., when samples are projected onto the two most relevant AML-specific filters. (b) Comparison with baseline methods for whole-sample representation (Citrus [9]; moment-based: multi-cell input summary profiles; denoising autoencoder [26]) for an AML blast population frequency of 0.1%. The three classes are not well separated in the representation spaces learned by these approaches. (c) Comparison with baseline methods for single-cell classification (LR, logistic regression; outlier, distance-based outlier detection [20]; RF, random forests; SVM, support vector machines; Citrus [9]) for an AML blast population frequency of 0.1%. For all methods except Citrus, average precision–recall curves for the recovery of blast cells on the test samples are reported; shaded areas indicate 95% confidence intervals. Because Citrus does not provide a precision–recall series, a single precision–recall point is computed for each test sample. (d) Single-cell classification performance of CellCnn for various low AML blast cell population frequencies. Average precision–recall curves on the test samples are reported, with shaded areas indicating 95% confidence intervals. CBF, core binding factor translocation; CN, cytogenetically normal.
© Copyright Policy - open-access
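The precision–recall curves in panels (c) and (d) are averaged over test samples and shown with 95% confidence intervals. The caption does not state how the averaging or the intervals were computed, so the following is only a sketch under stated assumptions:

- Each test sample's curve is interpolated onto a common recall grid. Precision at recall r is taken as the best precision reached at any recall ≥ r.
- The interval is a normal approximation, mean ± 1.96 standard errors, clipped to [0, 1].

All function names are hypothetical, and ties between cell scores are not handled specially.

```python
import math
from statistics import mean, stdev

def precision_recall_points(scores, labels):
    """Return (recall, precision) after each cell, ranking cells by descending score.

    scores: per-cell blast scores from any classifier (hypothetical input).
    labels: 1 for a true blast cell, 0 otherwise; assumes at least one blast.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points

def interpolated_precision(points, recall_grid):
    # Precision at recall r0 = best precision reached at any recall >= r0.
    return [max((p for r, p in points if r >= r0), default=0.0) for r0 in recall_grid]

def average_pr_curve(per_sample, recall_grid):
    """Average interpolated PR curves over samples; returns (mean, lower, upper) per grid point."""
    curves = [interpolated_precision(precision_recall_points(s, y), recall_grid)
              for s, y in per_sample]
    n = len(curves)
    summary = []
    for j in range(len(recall_grid)):
        values = [curve[j] for curve in curves]
        m = mean(values)
        # Normal-approximation 95% interval (an assumption), clipped to [0, 1].
        half = 1.96 * stdev(values) / math.sqrt(n) if n > 1 else 0.0
        summary.append((m, max(0.0, m - half), min(1.0, m + half)))
    return summary
```

Interpolating onto a shared recall grid is what makes per-sample curves comparable point by point. Without it, each sample's curve would have its own set of recall values.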

Mentions: Because only a limited number of test samples was available, we assessed the ability of CellCnn to predict the phenotype of new samples from the characteristics of the learned representation. A good representation should clearly separate healthy, CN AML and CBF AML samples. To this end, we computed a two-dimensional projection of each mass cytometry sample onto the two most relevant AML-specific filters; we refer to this projection as the CellCnn-based representation. Similarly, we computed a two-dimensional Citrus-based representation by projecting each mass cytometry sample onto the two most relevant AML-specific clusters. Finally, we derived two-dimensional moment-based and autoencoder-based sample representations by projecting the full sample representations onto their first two principal components (for details see Methods). The two-dimensional representations of the training, validation and test samples obtained by the different methods are shown in Fig. 5a,b: the CellCnn-based representation achieves the clearest separation between healthy, CN AML and CBF AML samples.
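The sketch below illustrates what projecting a sample onto two filters means. It assumes a CellCnn-style setup: each filter scores every cell with a linear function of its marker values followed by a ReLU. The per-cell scores are then pooled over the whole sample by averaging the top-scoring fraction of cells.

Several details are assumptions rather than the published method:

- The filter weights are set by hand rather than learned.
- The pooling fraction `top_fraction` is an assumed parameter.
- The helper names are hypothetical.
- The criterion for choosing the "two most relevant" filters is not described in this excerpt, so the two filters are simply passed in.

```python
def filter_response(cell, weights, bias):
    # One filter: linear combination of the cell's marker values, then ReLU.
    return max(0.0, sum(w * x for w, x in zip(weights, cell)) + bias)

def pooled_response(cells, weights, bias, top_fraction=0.01):
    # Pool over the sample: mean of the top `top_fraction` of per-cell responses.
    responses = sorted((filter_response(c, weights, bias) for c in cells), reverse=True)
    k = max(1, round(top_fraction * len(responses)))
    return sum(responses[:k]) / k

def two_filter_projection(sample, filters, top_fraction=0.01):
    # filters: two (weights, bias) pairs, standing in for the two most relevant filters.
    # Returns one 2-D point per sample: the pooled response of each filter.
    return tuple(pooled_response(sample, w, b, top_fraction) for w, b in filters)
```

Suppose a sample has one strongly responding cell among 100. With `filters = [([1.0, 0.0], 0.0), ([0.0, 1.0], -0.5)]`, that sample maps to (2.0, 0.5), while a sample of 100 all-zero cells maps to (0.0, 0.0). Because only the top-scoring cells contribute, a small responsive subset can move the pooled value. A plain mean over all cells would dilute it instead, which is why pooling of this kind suits rare-subset detection.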
