Application of independent component analysis to microarrays.

Lee SI, Batzoglou S - Genome Biol. (2003)

Affiliation: Department of Computer Science, Stanford University, Stanford, CA 94305-9010, USA.

ABSTRACT
We apply linear and nonlinear independent component analysis (ICA) to project microarray data into statistically independent components that correspond to putative biological processes, and to cluster genes according to over- or under-expression in each component. We test the statistical significance of enrichment of gene annotations within clusters. ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.

Figure 3: Three independent components of the human normal tissue data (dataset 5). Each gene is mapped to a point based on the value assigned to the gene in the 14th (x-axis), 15th (y-axis) and 55th (z-axis) independent components, which are enriched with liver-specific (red), muscle-specific (orange), and vulva-specific (green) genes, respectively. Genes not annotated as liver-, muscle- or vulva-specific are colored yellow.

Mentions: Misra et al. [19] applied PCA to dataset 5, which covers 7,070 genes in 19 kinds of normal human tissue (59 microarray experiments), produced by Hsiao et al. [45] and available at [53]. The dataset they used contained 40 experiments; Hsiao et al. [45] subsequently performed 19 additional microarray experiments. After applying PCA and a filtering method, Misra et al. [19] obtained 425 genes, to which they reapplied PCA, and drew a scatter plot of the loadings (expression levels) of these genes in the two most dominant principal components (eigenarrays). By visual inspection they observed three linear clusters on the resulting two-dimensional plot, enriched for liver-specific, brain-specific and muscle-specific genes, respectively (no p values were provided), as annotated by Hsiao et al. [45]. We removed three experiments that made the expression matrix X nearly singular, and applied ICA to the remaining 56 experiments, resulting in 56 independent components. We generated 112 clusters using our default clustering parameter (C = 7.5%), and measured the enrichment of each of the seven tissue-specific categories annotated by Hsiao et al. [45] within each cluster. The three most significant independent components were enriched for liver-specific, muscle-specific and vulva-specific genes, with p values of 10^-133, 10^-127 and 10^-101, respectively. The fourth most significant cluster was brain-specific (p value = 10^-86). In the ICA liver cluster, 214 of 293 genes were liver-specific, compared with the 23 liver-specific genes identified by Misra et al. [19]. The ICA muscle cluster of 258 genes contains 211 muscle-specific genes, compared with 19 muscle-specific genes identified by Misra et al. [19]. The ICA brain cluster of 277 genes contains 258 brain-specific genes, compared with 19 brain-specific genes identified by Misra et al. [19].
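The clustering step described above can be illustrated with a minimal sketch. It assumes that each independent component yields two clusters, one holding the fraction C of genes with the highest loads (over-expressed) and one holding the fraction C with the lowest loads (under-expressed), which would be consistent with 56 components giving 112 clusters. The function names, the rounding rule and the toy loads are hypothetical and are not taken from the article.

```python
import math


def clusters_from_component(loads, c=0.075):
    """Split one component's gene loads into an over- and an under-expressed cluster.

    loads: dict mapping gene name -> load of that gene in the component.
    c: fraction of genes taken from each tail (7.5% in the text).
    """
    # Number of genes per tail; rounding up is an assumption of this sketch.
    k = math.ceil(c * len(loads))
    ranked = sorted(loads, key=loads.get)  # genes in ascending order of load
    under = set(ranked[:k])   # lowest loads
    over = set(ranked[-k:])   # highest loads
    return over, under


def all_clusters(components, c=0.075):
    """Return two clusters (over, under) for every component, in order."""
    clusters = []
    for loads in components:
        over, under = clusters_from_component(loads, c)
        clusters.extend([over, under])
    return clusters


# Toy data: 100 hypothetical genes with made-up loads in 3 hypothetical components.
genes = [f"g{i}" for i in range(100)]
components = [
    {g: (i * (7 + 2 * j)) % 101 - 50 for i, g in enumerate(genes)}
    for j in range(3)
]
clusters = all_clusters(components)
print(len(clusters), "clusters of", len(clusters[0]), "genes each")
```

With 56 components in place of the three toy ones, the same loop would return 112 clusters.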
We generated a three-dimensional scatter plot of the coefficients of all genes annotated by Hsiao et al. [45] on the three most significant ICA components (Figure 3). We observe that the liver-specific, muscle-specific and vulva-specific genes are strongly biased to lie on the x-, y- and z-axes of the plot, respectively.
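The enrichment p values quoted above measure how unlikely it is to find so many genes of one tissue-specific category in a cluster by chance. The passage does not state which test was used; the sketch below assumes an upper-tail hypergeometric test, a common choice for this kind of question. The function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_annotated, cluster_size, n_hits):
    """Upper-tail hypergeometric probability P(X >= n_hits).

    n_total: number of genes in the dataset.
    n_annotated: number of genes carrying the annotation of interest.
    cluster_size: number of genes in the cluster.
    n_hits: number of annotated genes found in the cluster.
    """
    max_hits = min(n_annotated, cluster_size)
    # Count the ways of drawing a cluster with at least n_hits annotated genes.
    favourable = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, cluster_size - k)
        for k in range(n_hits, max_hits + 1)
    )
    # Exact integer arithmetic until the final division.
    return favourable / comb(n_total, cluster_size)


# Toy example: 20 genes, 5 annotated, a cluster of 4 genes containing 3 annotated ones.
p = enrichment_p_value(n_total=20, n_annotated=5, cluster_size=4, n_hits=3)
print(p)  # 155/4845, about 0.032
```

Reproducing a value such as 10^-133 would also require the total number of genes annotated for the tissue in question, which the passage does not give.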

