CTen: a web-based platform for identifying enriched cell types from heterogeneous microarray data.

Shoemaker JE, Lopes TJ, Ghosh S, Matsuoka Y, Kawaoka Y, Kitano H - BMC Genomics (2012)

Affiliation: JST ERATO KAWAOKA Infection-induced Host Responses Project, Tokyo, Japan. jshoe@ims.u-tokyo.ac.jp

ABSTRACT

Background: Interpreting in vivo sampled microarray data is often complicated by changes in the cell population demographics. To put gene expression into its proper biological context, it is necessary to distinguish differential gene transcription from artificial gene expression induced by changes in the cellular demographics.

Results: CTen (cell type enrichment) is a web-based analytical tool that uses our database of highly expressed, cell-specific (HECS) genes to identify enriched cell types in heterogeneous microarray data. The web interface is designed for differential expression and gene clustering studies, and enrichment results are presented as heatmaps or downloadable text files.

Conclusions: In this work, we use an independent, cell-specific gene expression data set to assess how accurately CTen identifies the appropriate cell type, and we provide guidance on the level of enrichment that should be required to minimize the number of false discoveries. We show that CTen, when applied to microarray data from infected lung tissue, can correctly identify the cell signatures of key lymphocytes in a highly heterogeneous environment, and we compare its performance with that of another popular bioinformatics tool. Furthermore, we discuss the strong implications of cell type enrichment for the design of effective microarray workflow strategies and show that, by combining CTen with gene expression clustering, we may be able to determine the relative changes in the number of key cell types. CTen is available at http://www.influenza-x.org/~jshoemaker/cten/
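
The abstract does not say which statistic CTen uses to score enrichment. As a minimal sketch, assuming a common choice for gene-set over-representation, each cell type can be scored by a one-sided hypergeometric test of the overlap between a query gene list and that cell type's HECS genes. The p-values are then adjusted across cell types with the Benjamini-Hochberg procedure, and the enrichment score is taken as -log10 of the adjusted p-value. The function names, gene universe and toy data are hypothetical, and this is not presented as CTen's implementation.

```python
from math import comb, log10


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cell_type_enrichment(query_genes, hecs_db, universe):
    """Score each cell type by overlap of the query with its HECS genes.

    Assumption: hypergeometric test + BH correction; CTen's exact
    statistic is not stated in this abstract.
    """
    query = set(query_genes) & universe
    names = sorted(hecs_db)
    pvals = []
    for name in names:
        hecs = hecs_db[name] & universe
        overlap = len(query & hecs)
        pvals.append(hypergeom_sf(overlap, len(universe), len(hecs), len(query)))
    adjusted = benjamini_hochberg(pvals)
    # Clamp to avoid log10(0) when the p-value underflows.
    return {name: -log10(max(q, 1e-300)) for name, q in zip(names, adjusted)}


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(100)}  # hypothetical gene universe
    hecs_db = {
        "B cell": {f"g{i}" for i in range(5)},
        "T cell": {f"g{i}" for i in range(50, 55)},
    }
    query = ["g0", "g1", "g2", "g3", "g10", "g11"]
    print(cell_type_enrichment(query, hecs_db, universe))
```

In this toy run, the query shares four genes with the five-gene "B cell" set and none with the "T cell" set. The first therefore receives a high score, and the second a score of zero. Exact summation with `math.comb` keeps the example dependency-free. A real analysis would more likely call a library routine such as SciPy's hypergeometric survival function.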

Figure 2: The effect of threshold selection on the number and uniqueness of HECS genes. Panels A and B show the distributions of the number of HECS genes per cell type as the threshold defining a HECS gene is raised from 2x to 25x the median expression value across all cell types, for the (A) mouse and (B) human gene expression data. To quantify uniqueness, we determined the percentage of HECS genes mapped to n or fewer cell types (i.e., the cumulative %) at each threshold for the (C) mouse and (D) human data. Results for the thresholds used in the current implementation of CTen are colored blue (mouse) and orange (human).
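
The caption describes two per-threshold summaries: how many HECS genes each cell type receives, and the cumulative percentage of HECS genes mapped to n or fewer cell types. The following is a minimal sketch of both. It assumes an expression table keyed first by probe and then by cell type (a hypothetical layout). It also assumes that a probe counts as a HECS gene for a cell type when its value strictly exceeds the fold cutoff times the probe's median across all cell types. The source does not state whether the comparison is strict or whether values are raw or transformed.

```python
from statistics import median


def hecs_genes(expr, fold):
    """Assign probes to the cell types in which they are highly expressed.

    expr maps probe -> {cell_type: expression}. A probe is treated as a HECS
    gene for a cell type when its value exceeds fold x the probe's median
    across all cell types (strict comparison is an assumption).
    """
    assignment = {}
    for probe, by_type in expr.items():
        cutoff = fold * median(by_type.values())
        for cell_type, value in by_type.items():
            if value > cutoff:
                assignment.setdefault(cell_type, set()).add(probe)
    return assignment


def cumulative_uniqueness(assignment, max_n):
    """Percent of HECS probes assigned to n or fewer cell types, n = 1..max_n."""
    n_types = {}  # probe -> number of cell types it was assigned to
    for probes in assignment.values():
        for probe in probes:
            n_types[probe] = n_types.get(probe, 0) + 1
    total = len(n_types)
    if total == 0:
        return [0.0] * max_n
    return [100.0 * sum(1 for c in n_types.values() if c <= n) / total
            for n in range(1, max_n + 1)]


if __name__ == "__main__":
    # Hypothetical toy data: five cell types, three probes.
    expr = {
        "p1": {"A": 100, "B": 1, "C": 1, "D": 1, "E": 1},
        "p2": {"A": 50, "B": 50, "C": 1, "D": 1, "E": 1},
        "p3": {"A": 5, "B": 5, "C": 5, "D": 5, "E": 5},
    }
    for fold in range(2, 26):  # the 2x to 25x sweep described in Figure 2
        hecs = hecs_genes(expr, fold)
        sizes = {t: len(hecs.get(t, set())) for t in "ABCDE"}
        print(fold, sizes, cumulative_uniqueness(hecs, 3))
```

In the toy sweep, probe p2 is shared by two cell types at low folds and drops out entirely once the fold exceeds 50. This mirrors the trade-off in Figure 2: fewer HECS genes, but a larger share of them unique to one cell type.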

Importantly, as stated above, preset cutoffs were used to build the mouse and human HECS databases: 15x the median expression level of a probe across all cell types for mouse, and 10x for human. These cutoffs were chosen to balance the number of genes assigned to each cell type against the uniqueness of those genes. Uniqueness was quantified as the percentage of HECS genes assigned to n or fewer cell types. As Figure 2A-B shows, raising the cutoff sharply reduced the number of HECS genes but substantially improved their uniqueness (Figure 2C-D). For the mouse data, raising the cutoff beyond 15x did not substantially improve uniqueness. It only reduced the number of HECS genes available per cell type to serve as cell signatures. For the human data, a 15x cutoff improved uniqueness slightly but left a prohibitively small number of HECS genes per cell type. The HECS threshold for the human dataset was therefore lowered to 10x the median expression value to ensure that all cell types are represented.
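
The paragraph describes choosing each cutoff by weighing the number of HECS genes against their uniqueness. One way to make that trade-off explicit is the hypothetical rule sketched below. First, discard any cutoff that leaves a cell type with fewer than `min_genes` HECS genes. Then take the lowest remaining cutoff whose uniqueness is within `tol` percentage points of the best remaining one. The rule, its parameters and the example numbers are illustrative assumptions, not the authors' procedure or data.

```python
def choose_cutoff(summary, min_genes, tol):
    """Pick a HECS fold cutoff from a threshold sweep (hypothetical rule).

    summary maps fold -> (fewest HECS genes in any cell type,
                          % of HECS genes unique to a single cell type).
    Cutoffs leaving any cell type with fewer than min_genes genes are
    ineligible; among the rest, return the lowest cutoff whose uniqueness
    is within tol percentage points of the best eligible uniqueness.
    """
    eligible = {f: u for f, (g, u) in summary.items() if g >= min_genes}
    if not eligible:
        raise ValueError("no cutoff leaves every cell type with enough HECS genes")
    best = max(eligible.values())
    return min(f for f, u in eligible.items() if u >= best - tol)


if __name__ == "__main__":
    # Made-up sweep summaries, for illustration only.
    plateauing = {5: (400, 60.0), 10: (150, 78.0), 15: (60, 85.0),
                  20: (30, 86.0), 25: (15, 86.5)}
    sparse = {5: (200, 55.0), 10: (40, 70.0), 15: (3, 73.0)}
    print(choose_cutoff(plateauing, min_genes=10, tol=2.0))
    print(choose_cutoff(sparse, min_genes=10, tol=2.0))
```

The first made-up sweep has uniqueness that plateaus, so the rule stops at the point where further increases add little. This resembles the mouse case. In the second, the highest cutoff is excluded because one cell type is left with too few genes, which resembles the human case.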

