Sensitive detection of rare disease-associated cell subsets via representation learning

View Article: PubMed Central - PubMed

ABSTRACT

Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements. Using CellCnn, we identify paracrine signalling-, AIDS onset- and rare CMV infection-associated cell subsets in peripheral blood, and extremely rare leukaemic blast populations in minimal residual disease-like situations with frequencies as low as 0.01%.



Figure 4: Identification of in silico spike-in rare leukaemic blast populations for two AML subgroups. (a) The spiked-in subset (frequency = 0.1%) of blast cells from a cytogenetically normal (CN) patient is highlighted in red on the left plot (ground truth) and compared with the cells identified by CellCnn, which are marked in red on the right plot. (b) Same setting as (a) for a spiked-in subset of blast cells from a core-binding factor translocation (CBF) patient. (c,d) Same settings as (a,b) for spiked-in subsets of blast cells at an even lower frequency (0.01%). (e) Histograms of selected cell surface markers for the disease-associated cell populations identified by CellCnn. The markers shown highlight the differences in blast cell immunophenotypic profiles between CBF and CN patients. CBF, core-binding factor translocation; CN, cytogenetically normal.
© Copyright Policy - open-access


Mentions: We next assessed the ability of CellCnn to detect extremely rare cell populations associated with minimal residual disease (MRD) in acute myeloid leukaemia (AML). Specifically, we analysed mass cytometry data sets of healthy bone marrow samples with in silico leukaemic blast spike-in subpopulations of decreasing frequency to mimic the MRD phenotype (ref. 19). To objectively compare CellCnn with existing methods for detecting rare phenotype-associated cell populations, we assembled a benchmark data set with clearly defined training/validation and test samples (see Data sets in Methods section). Spike-ins from patients characterized as cytogenetically normal (CN), as well as from patients with core-binding factor translocation [t(8;21) or inv(16)] (CBF), were considered. CellCnn was trained on the three-class problem of stratifying samples as healthy, CN AML or CBF AML, and it correctly identified the leukaemic blast subsets in the test samples (not used for training) at a frequency as low as 0.1% (500/500,000 blast/total cells) (Fig. 4a,b). We found that the predictive subsets for the two AML subgroups shared differentially abundant markers (CD34, CD45, CD44) but also exhibited several differences (Fig. 4e). For instance, CN AML blasts were CD7+, CD38+, CD117+, whereas CBF AML blasts were CD15+, CD38mid. These results are in accordance with the findings presented in the original study (ref. 19).
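The in silico spike-in setup described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `spike_in`, the marker count and the synthetic Gaussian "cells" are assumptions made purely for illustration. It shows only how a sample with a fixed blast frequency (here 0.1%, i.e. 500 of 500,000 cells) might be assembled from a pool of healthy cells and a pool of blast cells while keeping a ground-truth label per cell.

```python
import numpy as np


def spike_in(healthy, blasts, frequency, total_cells, rng):
    """Build one sample of `total_cells` cells in which a fraction
    `frequency` are blasts. Returns (cells, is_blast).

    Hypothetical helper for illustration; not part of CellCnn.
    """
    n_blast = int(round(frequency * total_cells))
    n_healthy = total_cells - n_blast

    # Draw without replacement when the pool is large enough,
    # otherwise fall back to sampling with replacement.
    h_idx = rng.choice(len(healthy), size=n_healthy,
                       replace=len(healthy) < n_healthy)
    b_idx = rng.choice(len(blasts), size=n_blast,
                       replace=len(blasts) < n_blast)

    cells = np.vstack([healthy[h_idx], blasts[b_idx]])
    is_blast = np.concatenate([
        np.zeros(n_healthy, dtype=bool),
        np.ones(n_blast, dtype=bool),
    ])

    # Shuffle so that the spiked-in cells are not grouped at the end.
    perm = rng.permutation(total_cells)
    return cells[perm], is_blast[perm]


rng = np.random.default_rng(0)
n_markers = 8  # assumed panel size, for illustration only

# Synthetic stand-ins for measured marker intensities (assumption).
healthy_pool = rng.normal(0.0, 1.0, size=(20_000, n_markers))
blast_pool = rng.normal(2.0, 1.0, size=(1_000, n_markers))

cells, is_blast = spike_in(healthy_pool, blast_pool,
                           frequency=0.001, total_cells=500_000, rng=rng)
n_spiked = int(is_blast.sum())
observed_frequency = n_spiked / len(cells)
print(n_spiked, observed_frequency)
```

Lowering `frequency` to 0.0001 in the same call would give the 0.01% setting shown in panels (c,d), which corresponds to 50 blast cells per 500,000.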

