Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

Tchagang AB, Phan S, Famili F, Shearer H, Fobert P, Huang Y, Zou J, Huang D, Cutler A, Liu Z, Pan Y - BMC Bioinformatics (2012)

Bottom Line: To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.

Affiliation: Knowledge Discovery Group, Institute for Information Technology, National Research Council Canada, 1200 Montréal Road, Ottawa, ON K1A 0R6, Canada. alain.tchagang@nrc-cnrc.gc.ca

ABSTRACT

Background: Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space.

Results: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples.

Conclusions: Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
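The order preserving (OP) idea on the time dimension, as described in the Results paragraph above, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy profiles, and the tie threshold `delta` are assumptions made for illustration. Each gene's short time series is reduced to a signature of pairwise up/down/flat comparisons between time points, and genes sharing a signature are grouped together.

```python
from collections import defaultdict
from itertools import combinations


def order_signature(profile, delta=0.5):
    """Encode a time series as pairwise comparisons between time points.

    For every pair of time points (i < j) the entry is +1 if the value rises
    by more than delta, -1 if it falls by more than delta, and 0 otherwise.
    The threshold delta is an illustrative assumption.
    """
    signature = []
    for i, j in combinations(range(len(profile)), 2):
        diff = profile[j] - profile[i]
        if diff > delta:
            signature.append(1)
        elif diff < -delta:
            signature.append(-1)
        else:
            signature.append(0)
    return tuple(signature)


def group_by_signature(profiles, delta=0.5):
    """Group gene names whose profiles share the same order signature."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[order_signature(profile, delta)].append(gene)
    return dict(groups)


# Hypothetical expression profiles over three time points (one sample).
toy_profiles = {
    "geneA": [0.0, 1.0, 2.5],
    "geneB": [1.0, 3.0, 4.0],
    "geneC": [2.0, 2.1, 1.9],
    "geneD": [3.0, 1.0, 0.2],
}

groups = group_by_signature(toy_profiles)
for signature, genes in groups.items():
    print(signature, genes)
```

In this toy input, geneA and geneB differ in magnitude but rise between every pair of time points, so they fall into the same group; that is the sense in which the grouping preserves order rather than absolute expression level.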


Figure 7: Example of Divergent Patterns. Examples of genes that may be specific to W, P, Z1, and Z2, respectively. The x-axis corresponds to the experimental time points and the y-axis to the expression level of the genes across the time series in the four samples. Each column of charts corresponds to one sample: W, P, Z1, and Z2, respectively.

Mentions: Given the N × M × L gene expression matrix, our goal is to identify the set of genes that are controlled by the TFs at a given time point, to study similarities and differences between them, and to infer a temporal transcriptional regulatory network controlling SAR in A. thaliana. OPTricluster generated 2^4 - 1 = 15 combinations of samples. Below we present some of the results obtained by OPTricluster. Figure 7, for example, shows divergent patterns. The expression levels of these genes are relatively unchanged (constant) in three samples and behave differently in one. For example, in the first row, significant changes are visible within WT, whereas the other three genotypes stay relatively constant (within a threshold of ± 0.5) across the three time points. Since the expression levels of these genes stay constant in three of the four experimental conditions and change considerably in only one of them, these genes may represent potential targets of the TFs tested in this study.
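The count of 15 sample combinations and the notion of a divergent pattern can be checked with a short sketch. The sample labels are taken from the Figure 7 caption; the helper names, the toy expression values, and the reading of "constant" as a spread of at most 0.5 across the time points are assumptions for illustration, not the authors' code.

```python
from itertools import combinations

# Sample labels as they appear in the Figure 7 caption.
SAMPLES = ["W", "P", "Z1", "Z2"]


def nonempty_subsets(samples):
    """Enumerate every non-empty combination of samples (2^n - 1 of them)."""
    return [
        subset
        for size in range(1, len(samples) + 1)
        for subset in combinations(samples, size)
    ]


def is_constant(profile, threshold=0.5):
    """Assumed definition: the profile varies by at most `threshold` overall."""
    return max(profile) - min(profile) <= threshold


def divergent_sample(gene_profiles, threshold=0.5):
    """Return the single sample in which the gene changes, or None.

    A gene is treated as divergent when it is constant in all samples
    except exactly one.
    """
    changing = [
        sample
        for sample, profile in gene_profiles.items()
        if not is_constant(profile, threshold)
    ]
    return changing[0] if len(changing) == 1 else None


subsets = nonempty_subsets(SAMPLES)
print(len(subsets))  # 15

# Hypothetical gene measured at three time points in each of the four samples.
toy_gene = {
    "W": [0.0, 1.2, 2.0],
    "P": [0.1, 0.0, 0.2],
    "Z1": [0.0, 0.3, 0.1],
    "Z2": [0.2, 0.2, 0.0],
}
print(divergent_sample(toy_gene))  # W
```

A gene that changes in two or more samples, or in none, returns `None` here; such genes would belong to one of the other sample combinations rather than to a single-sample divergent pattern.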

