Vertical flow array chips reliably identify cell types from single-cell mRNA sequencing experiments

ABSTRACT

Single-cell mRNA sequencing offers an unbiased approach to dissecting cell types as functional units in multicellular tissues. However, highly reliable cell typing based on single-cell gene expression analysis remains challenging because of the lack of methods for efficient sample preparation for high-throughput sequencing and evaluating the statistical reliability of the acquired cell types. Here, we present a highly efficient nucleic reaction chip (a vertical flow array chip (VFAC)) that uses porous materials to reduce measurement noise and improve throughput without a substantial increase in reagent. We also present a probabilistic evaluation method for cell typing depending on the amount of measurement noise. Applying the VFACs to 2580 monocytes provides 1967 single-cell expressions for 47 genes, including low-expression genes such as transcription factors. The statistical method can distinguish two cell types with probabilistic quality values, with the measurement noise level being considered for the first time. This approach enables the identification of various sub-types of cells in tissues and provides a foundation for subsequent analyses.


Figure 4. Validation of the statistical method for generated data. (a–c) Distribution of the generated data with various distances between the clusters; the distances are 2, 4, and 8 times the standard deviation of the cluster. There are 1000 total data points. CS = 0.9, Σms = 0.5 SD, and dimensions = 2. The definition of the position of the cluster is shown in Supplemental Figure 1. (d–f) Negative BIC, silhouette and pq-values for the three generated data sets against predetermined numbers of clusters. The pq-value results for a different set of parameters are shown in Supplementary Figure 3. (g) The pq-values for various cluster distances. The maximum pq-value occurs for K = 2 when the distance between the clusters is larger than 3 SD. (h) The standard deviation of the pq-values decreases as the number of cells increases. (i) An increase in measurement noise reduces the resolution of the clustering and reduces the optimum number of clusters from K = 2 to K = 1. The red curve represents K = 2, and the blue curve represents K = 1.
© Copyright Policy - open-access
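
The generated data described in the caption can be approximated with a short sketch. The following is a minimal illustration, not the authors' code: it assumes two equally sized clusters of unit SD whose centers are separated along the first axis, with independent Gaussian measurement noise of 0.5 SD added to every coordinate. The function names `make_two_clusters` and `mean_silhouette` are hypothetical, and the actual cluster placement used in the paper is defined in Supplemental Figure 1, which is not reproduced here. The silhouette is computed against the known generating labels rather than against a fitted clustering.

```python
import math
import random


def make_two_clusters(distance_sd, n_points=1000, dims=2, noise_sd=0.5, seed=0):
    """Generate two unit-SD Gaussian clusters `distance_sd` apart on axis 0.

    Assumption: measurement noise is independent Gaussian noise with
    standard deviation `noise_sd`, added to every coordinate.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n_points):
        label = i % 2  # assumption: equal-sized clusters
        center = [label * distance_sd] + [0.0] * (dims - 1)
        point = [c + rng.gauss(0.0, 1.0) + rng.gauss(0.0, noise_sd) for c in center]
        points.append(point)
        labels.append(label)
    return points, labels


def mean_silhouette(points, labels):
    """Mean silhouette coefficient for a given labeling (O(n^2) distances)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if counts.get(own, 0) == 0:
            continue  # a singleton cluster contributes a silhouette of 0
        a = sums[own] / counts[own]  # mean distance within the own cluster
        b = min(sums[k] / counts[k] for k in sums if k != own)  # nearest other cluster
        total += (b - a) / max(a, b)
    return total / n


# Silhouette for the three separations in panels (a-c), on a smaller sample
# (300 points instead of 1000) to keep the quadratic loop fast.
scores = {d: mean_silhouette(*make_two_clusters(d, n_points=300)) for d in (2, 4, 8)}
```

Under these assumptions the mean silhouette rises as the clusters move apart, which is the qualitative trend that panels (d–f) compare against the negative BIC and the pq-value.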

We assessed the performance of our statistical method on data generated from a mixture of Gaussian distributions with various parameters, including the number of genes (n = 2, 10, 20, 50), the number of data points (N = 100–1000), and the distance between clusters (0–8 times the standard deviation (SD) of a cluster). We compared the optimum number of clusters with the results of conventional methods, namely the Gaussian mixture model with the negative signed BIC [34, 37, 40] (that is, (−1) × BIC, referred to here as negative BIC) and the silhouette index [36] (Fig. 4, Supplemental Figures 1 and 2). The parameters for calculating the pq-values were the measurement noise (Σme = 0.5, 1.0, 1.5, 2.0, 2.5 × SD), the cluster size (CS = 90%) and the number of clusters (K = 1–5). As shown in Fig. 4, when the measurement noise was less than 1.0 × SD, the number of clusters giving the maximum pq-value was consistent with the number selected by the negative BIC across the various numbers of genes and data points. For larger measurement noise, the pq-value method selected a single cluster, as expected. When the distance between two clusters is small, the optimum number of clusters differs between the negative BIC and the pq-value. This apparent discrepancy is related to the rate at which cell data are improperly assigned to a cluster, which can be evaluated as the fraction of overlapping cluster distributions (Supplementary Fig. 3). Because the false assignment rate of the pq-value method is controlled by the parameter CS, the discrepancies should come from differences between the BIC and the pq-value in the standard used to discriminate neighboring clusters. The pq-value also differs from the BIC in its robustness against skewing of the data distribution away from the Gaussian model (Supplementary Figure 4), because the pq-value does not rely on fitting the data to a model distribution.
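
The "fraction of overlapping cluster distributions" can be made concrete with a textbook calculation. The sketch below is an illustration under stated assumptions, not the paper's pq-value definition (which is not given in this excerpt). It assumes two equal-weight isotropic Gaussian clusters of unit SD, a decision boundary at the midpoint between their centers, and independent Gaussian measurement noise that widens each cluster to an effective SD of sqrt(1 + noise_sd^2). The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def misassignment_fraction(distance_sd, noise_sd=0.0):
    """Fraction of a cluster's points lying beyond the midpoint boundary.

    `distance_sd` is the center-to-center distance in units of the cluster SD.
    Assumption: measurement noise is independent and Gaussian, so it adds in
    quadrature to the unit cluster SD.
    """
    effective_sd = math.sqrt(1.0 + noise_sd ** 2)
    return normal_cdf(-distance_sd / (2.0 * effective_sd))


if __name__ == "__main__":
    for d in (2, 4, 8):
        for noise in (0.5, 1.0, 2.5):
            frac = misassignment_fraction(d, noise)
            print(f"distance = {d} SD, noise = {noise} SD: overlap fraction = {frac:.4f}")
```

In this simplified picture the overlap shrinks as the clusters separate and grows as noise increases. That matches the qualitative behavior reported in the text, where closely spaced or noisy clusters are merged into a single cluster.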

