Sensitive detection of rare disease-associated cell subsets via representation learning

View Article: PubMed Central - PubMed

ABSTRACT

Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements. Using CellCnn, we identify paracrine signalling-, AIDS onset- and rare CMV infection-associated cell subsets in peripheral blood, and extremely rare leukaemic blast populations in minimal residual disease-like situations with frequencies as low as 0.01%.



Figure 2: CellCnn analysis of immune cell populations associated with AIDS onset in HIV patients. (a) Kaplan–Meier plots for the high- and low-risk patient cohorts according to CellCnn survival prediction (P=3.03e-03, log-rank test, computation time: 1 h, single laptop core) and the state of the art, Citrus (P=2.97e-02, 3 days, 24 Intel Xeon cores). (b) Reconstruction of cell subsets predicting AIDS-free survival in HIV-infected patients. Cells selected by CellCnn filters are highlighted (in red) on the t-SNE map computed from all test samples. Each selected subpopulation occupies a distinct area. Filters 1 and 2 are positively associated with survival, whereas filter 3 is negatively associated. The average frequency of the selected cell subsets in the 10 test patients with the lowest/highest survival times is reported. (c) Histograms of measured marker abundances for the whole-cell population and the selected cell subsets.
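To make "cells selected by a filter" and "frequency of the selected cell subset" concrete, the following is a minimal sketch, not the CellCnn implementation. It assumes that a filter can be treated as a weight vector over the measured markers plus a bias, passed through a ReLU, and that a cell counts as selected when its response exceeds a threshold. The function names, the threshold and the example values are all hypothetical.

```python
from typing import Sequence


def filter_response(cell: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """ReLU of a weighted sum of one cell's marker abundances (assumed filter form)."""
    activation = sum(w * x for w, x in zip(weights, cell)) + bias
    return max(0.0, activation)


def selected_frequency(
    cells: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.0,
) -> float:
    """Fraction of cells in one sample whose filter response exceeds the threshold."""
    if not cells:
        return 0.0
    hits = sum(1 for cell in cells if filter_response(cell, weights, bias) > threshold)
    return hits / len(cells)


# Hypothetical two-marker sample: each row is one cell.
example_cells = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
example_frequency = selected_frequency(example_cells, weights=[1.0, -1.0], bias=-0.5)
```

Computing this frequency per patient and averaging within a patient group would give a quantity comparable in spirit to the per-group frequencies reported in panel (b), under the assumptions stated above.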

We used CellCnn to identify T-cell subsets associated with increased risk of AIDS onset in a cohort of 383 HIV-infected individuals [16]. Flow cytometry measurements of 10 T-cell-related molecular markers from peripheral blood and AIDS-free survival time were available for each individual. Trained on a subcohort of 256 individuals, CellCnn identified cell subsets with either an elevated proliferation marker (Ki67) or a naive T-cell phenotype (Fig. 2b,c). The frequency of these cell subsets has been reported to be associated with AIDS-free survival in previous studies [9, 10, 17]. CellCnn was further used to categorize the remaining 127 test individuals into a low-risk and a high-risk group (see Methods). The Kaplan–Meier curves of these groups are significantly different (P-value=3.03e-03, log-rank test; Fig. 2a). Citrus, a state-of-the-art approach to identifying clinically prognostic cell subsets [9], achieved a less significant separation of the two risk groups on the same training and test data partition (P-value=2.97e-02, Fig. 2a). CellCnn and Citrus identify the same strongly survival-associated cell populations (Supplementary Fig. 4). Furthermore, to assess the robustness of our approach, we reduced the size of the training cohort from 67% to 50% and 33% of the samples. In these more challenging settings, the stratification performance of CellCnn remained at equivalently high levels (Supplementary Fig. 3).
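The comparison of the two risk groups relies on a log-rank test. As an illustration of how such a P-value can be obtained, here is a minimal pure-Python sketch of the standard two-group log-rank test. It is a generic textbook version, not the code used in the study; the function name and the toy survival times are hypothetical. An event flag of 1 marks an observed AIDS onset and 0 marks a censored observation.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times_a: Sequence[float],
    events_a: Sequence[int],
    times_b: Sequence[float],
    events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    # Pool all observations, tagging each with its group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Individuals still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data: group A fails early, group B fails late.
chi2_stat, p_val = logrank_test(
    [1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
    [6, 7, 8, 9, 10], [1, 1, 1, 1, 1],
)
```

A smaller P-value indicates a clearer separation between the Kaplan–Meier curves of the two groups, which is the sense in which the two methods are compared above.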

