Limits...
Discovering biclusters in gene expression data based on high-dimensional linear geometries.

Gan X, Liew AW, Yan H - BMC Bioinformatics (2008)

Bottom Line: We have proposed a novel geometric interpretation of the biclustering problem.We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space.An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, King's College London, UK. xiang.gan@kcl.ac.uk

ABSTRACT

Background: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns.

Results: In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes.

Conclusion: We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.

Show MeSH

Related in: MedlinePlus

An illustrative example where conventional clustering fails but biclustering works: (a) A data matrix, which appears random visually even after hierarchical clustering. (b) A hidden pattern embedded in the data would be uncovered if we permute the rows or columns appropriately.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2386490&req=5

Figure 1: An illustrative example where conventional clustering fails but biclustering works: (a) A data matrix, which appears random visually even after hierarchical clustering. (b) A hidden pattern embedded in the data would be uncovered if we permute the rows or columns appropriately.

Mentions: When a subset of genes shares similar transcriptional characteristics only across a subset of measures, the conventional algorithm may fail to uncover useful information between them. In Fig. 1a, we see a data matrix clustered using the hierarchical clustering algorithm, where no coherent pattern can be observed by naked eyes. However, Fig. 1b indicates that an interesting pattern actually exists within the data if we rearrange the data appropriately.


Discovering biclusters in gene expression data based on high-dimensional linear geometries.

Gan X, Liew AW, Yan H - BMC Bioinformatics (2008)

An illustrative example where conventional clustering fails but biclustering works: (a) A data matrix, which appears random visually even after hierarchical clustering. (b) A hidden pattern embedded in the data would be uncovered if we permute the rows or columns appropriately.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2386490&req=5

Figure 1: An illustrative example where conventional clustering fails but biclustering works: (a) A data matrix, which appears random visually even after hierarchical clustering. (b) A hidden pattern embedded in the data would be uncovered if we permute the rows or columns appropriately.
Mentions: When a subset of genes shares similar transcriptional characteristics only across a subset of measures, the conventional algorithm may fail to uncover useful information between them. In Fig. 1a, we see a data matrix clustered using the hierarchical clustering algorithm, where no coherent pattern can be observed by naked eyes. However, Fig. 1b indicates that an interesting pattern actually exists within the data if we rearrange the data appropriately.

Bottom Line: We have proposed a novel geometric interpretation of the biclustering problem.We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space.An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, King's College London, UK. xiang.gan@kcl.ac.uk

ABSTRACT

Background: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns.

Results: In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes.

Conclusion: We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.

Show MeSH
Related in: MedlinePlus