Sensitive detection of rare disease-associated cell subsets via representation learning


ABSTRACT

Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements. Using CellCnn, we identify paracrine signalling-, AIDS onset- and rare CMV infection-associated cell subsets in peripheral blood, and extremely rare leukaemic blast populations in minimal residual disease-like situations with frequencies as low as 0.01%.


Figure 1: CellCnn overview and demonstration. (a) CellCnn convolutional neural network architecture. CellCnn takes as input groups of single-cell measurements (multi-cell inputs), where each group is annotated with a phenotype. Node activities in the convolutional layer are defined as weighted sums over single-cell molecular profiles. Nodes in the pooling layer evaluate the presence (max pooling) or frequency (mean pooling) of specific cell subsets. The output of the network estimates the sample-associated phenotype (e.g., disease condition, expected survival). Network training optimizes the weights to match the phenotypes of the training data set. Trained filter weights correspond to molecular profiles of relevant cell subsets and allow individual cells to be assigned to a cell subset (cell-filter response). (b) Illustration of cell-filter response computations for individual cells. For instance, the marker profile of cell 1 matches the weights of filter 1 perfectly and yields a high (red) cell-filter response, whereas the marker profile of cell 3 does not match the weights of filter 2 and yields a low (blue) response. (c) CellCnn classification of GM-CSF (un-)stimulated peripheral blood mononuclear cell populations monitored with mass cytometry. t-SNE (ref. 28) projection based on all cell-type-defining surface markers (not used by CellCnn), coloured by cell-filter response. (d) Density-based clustering of regions with high cell-filter response, using all cell-type-defining surface markers, reveals two distinct cell types, namely monocytes (CD33+) and dendritic cells (CD123+). (e) Histograms of the signalling markers (used by CellCnn) that show the greatest differential abundance, in terms of the Kolmogorov–Smirnov two-sample test, between the whole-cell population and the selected cell subsets. (f) Cell-filter response of individual cells (grouped by manually gated cell types) for both conditions. The cell-filter response is significantly higher for monocytes and dendritic cells in the stimulated sample.
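Panel (e) ranks markers by the Kolmogorov–Smirnov two-sample statistic between the whole-cell population and a selected subset. The following is a minimal sketch of that ranking in plain Python, not the authors' implementation. The function names `ks_statistic` and `rank_markers`, the marker names, and the toy values are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


def rank_markers(all_cells, subset_cells, marker_names):
    """Sort markers by KS statistic (largest first) between all cells and a subset."""
    scores = []
    for j, name in enumerate(marker_names):
        whole = [cell[j] for cell in all_cells]
        subset = [cell[j] for cell in subset_cells]
        scores.append((name, ks_statistic(whole, subset)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): two markers measured on four cells.
# The selected subset differs from the whole population only in marker_A.
all_cells = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
subset_cells = [[3.0, 5.0], [3.0, 5.0]]
ranking = rank_markers(all_cells, subset_cells, ["marker_A", "marker_B"])
print(ranking)
```

In this toy case `marker_A` ranks first because its distribution in the subset is shifted relative to the whole population, while `marker_B` is identical in both.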

CellCnn takes as input a set of observations of cellular populations (multi-cell inputs), each associated with a phenotype, for example, patient blood or tissue samples with associated disease status or survival information. Learning the molecular basis of this association is difficult because it may manifest as differences in cell subsets that are unknown a priori. To address this difficulty, CellCnn associates a multi-cell input with the considered phenotype by means of a convolutional neural network. The network automatically learns a concise representation of the cell population in terms of molecular profiles (filters) of individual cells whose presence or frequency is associated with the phenotype (Fig. 1a and see section Methods).
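The forward pass described here (per-cell weighted sums, pooling across cells, then a phenotype estimate) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the CellCnn implementation: the ReLU nonlinearity, the single sigmoid output node, and all names and numeric values below are assumptions made for the example.

```python
import math


def cell_filter_response(cell, weights, bias=0.0):
    """Weighted sum over one cell's marker profile.

    The ReLU clipping at zero is an assumption of this sketch.
    """
    z = sum(w * x for w, x in zip(weights, cell)) + bias
    return max(z, 0.0)


def pool(responses, mode="max"):
    """Max pooling detects presence of a subset; mean pooling tracks its frequency."""
    if mode == "max":
        return max(responses)
    if mode == "mean":
        return sum(responses) / len(responses)
    raise ValueError("mode must be 'max' or 'mean'")


def forward(multi_cell_input, filters, biases, out_weights, out_bias, mode="max"):
    """Map one multi-cell input to a phenotype probability (sigmoid output assumed)."""
    pooled = []
    for weights, bias in zip(filters, biases):
        responses = [cell_filter_response(c, weights, bias) for c in multi_cell_input]
        pooled.append(pool(responses, mode))
    logit = sum(w * p for w, p in zip(out_weights, pooled)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit)), pooled


# Toy multi-cell input (hypothetical): four cells, two markers.
# Only the first cell matches the profile encoded by the single filter.
sample = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
filters, biases = [[1.0, 0.0]], [0.0]

prob_max, pooled_max = forward(sample, filters, biases, [2.0], -1.0, mode="max")
prob_mean, pooled_mean = forward(sample, filters, biases, [2.0], -1.0, mode="mean")
print(pooled_max, pooled_mean)
```

With max pooling, a single matching cell is enough to drive the pooled value to its maximum, which is why that mode suits very rare subsets. With mean pooling, the pooled value scales with the fraction of matching cells in the sample.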

