Short time-series microarray analysis: methods and challenges.

Wang X, Wu M, Li Z, Chan C - BMC Syst Biol (2008)

Bottom Line: Current efforts have shown promise in improving the analysis of short time-series microarray data, although challenges remain. This commentary addresses recent advances in methods for short time-series analysis including simplification-based approaches and the integration of multi-source information. Nevertheless, further studies and development of computational methods are needed to provide practical solutions to fully exploit the potential of this data.

Affiliation: Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, MI 48824, USA. xwang@egr.msu.edu

ABSTRACT
The detection and analysis of steady-state gene expression have become routine. Time-series microarrays are of growing interest to systems biologists for deciphering the dynamic nature and complex regulation of biosystems. Most temporal microarray datasets contain only a limited number of time points, giving rise to short time-series data, which pose challenges for traditional methods of extracting meaningful information. Obtaining useful information from the wealth of short time-series data requires addressing the problems that arise from limited sampling. Current efforts have shown promise in improving the analysis of short time-series microarray data, although challenges remain. This commentary addresses recent advances in methods for short time-series analysis, including simplification-based approaches and the integration of multi-source information. Nevertheless, further studies and development of computational methods are needed to provide practical solutions that fully exploit the potential of these data.

© Copyright Policy - open-access

Figure 1: The general process of time-series expression analysis starts with data collection from microarray experiments. The data then undergo pre-processing procedures, such as normalization and quality evaluation. Next, data-mining techniques are used to discover patterns or characteristics, identify related pathways, or reconstruct system networks for biological processes from short time-series data. To address the limited sampling in short time-series data, two strategies are introduced into the general process of microarray analysis. Simplification strategies reduce the data to discrete representations based on trends or states with respect to time, to achieve more interpretable and biologically meaningful clusters. Such conceptual discretization is part of the pre-processing step, prior to data mining. Incorporating multi-source information takes a different approach: multi-source data, including various omics databases and prior biological information, are collected and integrated to obtain a comprehensive dataset and enhance the information content. To minimize the heterogeneity of omics data from different experiments, standards can be, and have been, imposed on omics databases. Current standards for high-throughput databases include MIAME, MIAPE, MSI, and MIMIx. MIAME has been implemented in the GEO and ArrayExpress microarray databases. Integrating various omics databases or prior biological information can make the mining and interpretation of short time-series data more effective and efficient, leading to biological discoveries. For example, prior biological information such as a prior noise distribution has been proposed to enhance the performance of data mining and network inference [43,44]. In addition, pathway and functional knowledge and metabolic data from different databases have enhanced clustering results and pathway identification [39-42]. These studies are discussed and referenced in the text.
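The simplification strategy described in the caption can be illustrated with a minimal Python sketch. It is not the method of any cited study: the function names, the threshold value, and the expression values are all hypothetical, chosen only to show how a short profile might be reduced to a discrete trend pattern before clustering.

```python
from collections import defaultdict


def trend_pattern(profile, threshold=0.5):
    """Reduce an expression profile to a tuple of trend symbols.

    Each step between consecutive time points becomes +1 (up), -1 (down)
    or 0 (flat). The threshold is an assumed cut-off, not a published value.
    """
    pattern = []
    for previous, current in zip(profile, profile[1:]):
        change = current - previous
        if change > threshold:
            pattern.append(1)
        elif change < -threshold:
            pattern.append(-1)
        else:
            pattern.append(0)
    return tuple(pattern)


def group_by_trend(profiles, threshold=0.5):
    """Group genes that share the same discrete trend pattern."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[trend_pattern(profile, threshold)].append(gene)
    return dict(groups)


# Hypothetical log-ratio profiles at four time points (illustrative values only).
profiles = {
    "geneA": [0.0, 1.2, 2.5, 2.6],
    "geneB": [0.1, 1.0, 2.0, 2.2],
    "geneC": [0.0, -1.1, -1.0, 0.4],
}
groups = group_by_trend(profiles)
```

With four time points and three symbols there are at most 27 possible patterns, so genes fall into a small number of interpretable groups; the trade-off is that the result depends on the chosen threshold.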

Mentions: Improving short time-series analysis requires addressing the problems that arise from limited sampling. Recent efforts to overcome these difficulties either decrease the complexity of continuous time-series data through simplification strategies [29,30] or enrich the information content of the data by incorporating multi-source information [31,32]; see Figure 1 for a summary of the options.
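One way to picture the second option, enriching expression data with multi-source information, is a gene-to-gene distance that blends expression dissimilarity with prior pathway knowledge. The sketch below is an assumption made for illustration, not the approach of the cited studies: the weighted sum, the `weight` parameter, the rescaling of the Euclidean distance, and the pathway labels are all hypothetical.

```python
import math


def expression_distance(profile_a, profile_b):
    """Euclidean distance between two profiles, rescaled into [0, 1)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(profile_a, profile_b)))
    return d / (1.0 + d)


def annotation_distance(pathways_a, pathways_b):
    """Jaccard distance between two sets of pathway annotations."""
    union = pathways_a | pathways_b
    if not union:
        return 1.0  # neither gene is annotated: no prior evidence of similarity
    return 1.0 - len(pathways_a & pathways_b) / len(union)


def combined_distance(gene_a, gene_b, profiles, annotations, weight=0.5):
    """Blend expression and prior-knowledge distances.

    weight = 0 uses expression only; weight = 1 uses annotations only.
    """
    d_expr = expression_distance(profiles[gene_a], profiles[gene_b])
    d_prior = annotation_distance(
        annotations.get(gene_a, set()), annotations.get(gene_b, set())
    )
    return (1.0 - weight) * d_expr + weight * d_prior


# Hypothetical profiles and pathway labels (illustrative values only).
profiles = {"geneA": [0.0, 1.0, 2.0], "geneB": [0.0, 1.0, 2.0]}
annotations = {"geneA": {"pathway1"}, "geneB": {"pathway1", "pathway2"}}
distance = combined_distance("geneA", "geneB", profiles, annotations)
```

The resulting distance could be passed to any standard clustering routine. Because a short series provides few measurements per gene, the prior term helps separate genes whose profiles are indistinguishable from expression alone.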

