ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis.

Pierson E, Yau C - Genome Biol. (2015)

Bottom Line: Single-cell RNA-seq data allows insight into normal cellular function and various disease states through molecular characterization of gene expression on the single cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and show that it improves modeling accuracy on simulated and biological data sets.


Affiliation: Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, Oxford, UK. emma.pierson@st-annes.ox.ac.uk.

Fig. 4: Consistency of cell-to-cell distances. Box plots showing the correlation between distance matrices for PPCA and ZIFA from 100 gene sets selected at random from (a) differentiating T cells [3], (b) 11 populations [15], (c) myoblasts [5] and (d) bone marrow [14]. The distance matrices produced by ZIFA are more correlated with each other than are the distance matrices produced by PPCA.
Mentions: We further assessed whether the low-dimensional projections by ZIFA were more consistent than those of PPCA. For each of the four data sets, we repeated the following procedure 100 times: we sampled 100 genes at random, ran ZIFA or PPCA, and computed the pairwise distances between points in the low-dimensional space. This yielded 100 distance matrices, one for each iterate. We computed the Spearman correlation between each pair of distance matrices (100×99/2 = 4,950 correlations in total) and recorded the average Spearman correlation for both ZIFA and PPCA. Figure 4 shows the distribution of the Spearman correlations for ZIFA and PPCA on the four data sets. Overall, the distance matrices produced by ZIFA were more correlated with each other than those produced by PPCA, indicating that the ZIFA distance matrices are more consistent across random iterates, as ZIFA’s performance is less dependent on the number of dropout events present in the data.
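
The resampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: plain PCA (via SVD) stands in for both PPCA and ZIFA, the function names (pca_project, pairwise_distances, spearman, consistency) are hypothetical, the input matrix is synthetic, and the rank-based Spearman helper assumes the distances contain no ties.

```python
from itertools import combinations

import numpy as np


def pca_project(Y, k=2):
    """Stand-in reducer (NOT ZIFA or PPCA): top-k principal component scores."""
    Yc = Y - Y.mean(axis=0)
    U, S, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k] * S[:k]


def pairwise_distances(Z):
    """Euclidean distances between all pairs of rows, as a flat vector."""
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D[np.triu_indices(len(Z), k=1)]


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


def consistency(Y, reducer=pca_project, n_iter=100, n_genes=100, k=2, seed=0):
    """Correlate the distance vectors from every pair of random gene subsets."""
    rng = np.random.default_rng(seed)
    dists = []
    for _ in range(n_iter):
        # Sample a random gene subset (columns) without replacement.
        genes = rng.choice(Y.shape[1], size=n_genes, replace=False)
        dists.append(pairwise_distances(reducer(Y[:, genes], k)))
    # n_iter * (n_iter - 1) / 2 pairwise correlations.
    return np.array([spearman(a, b) for a, b in combinations(dists, 2)])


# Synthetic cells-by-genes matrix with a 2-D latent structure (illustrative only).
rng = np.random.default_rng(1)
Y = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 500)) + rng.normal(size=(50, 500))
corrs = consistency(Y, n_iter=10)
print(corrs.shape, corrs.mean())
```

With n_iter=100 the same call returns the 100×99/2 = 4,950 correlations mentioned in the text. Any other reducer with the same (matrix, k) signature could be passed in place of pca_project to compare methods.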

