Integrative missing value estimation for microarray data.

Hu J, Li H, Waterman MS, Zhou XJ - BMC Bioinformatics (2006)

Bottom Line: In our benchmark tests, iMISS significantly and consistently improved the accuracy of the state-of-the-art Local Least Square (LLS) imputation algorithm, by up to 15%. We demonstrated that the order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS and are especially effective for microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.

Affiliation: Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA. jianjunh@usc.edu

ABSTRACT

Background: Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples.

Results: We present the integrative Missing Value Estimation method (iMISS), which incorporates information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. In our benchmark tests, iMISS significantly and consistently improved the accuracy of the state-of-the-art Local Least Square (LLS) imputation algorithm, by up to 15%.

Conclusion: We demonstrated that the order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS and are especially effective for microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.

© Copyright Policy - open-access

Figure 1: Framework of iMISS. iMISS (Integrative MISSing value estimation using multiple datasets) is composed of four steps: reference dataset selection, neighbor gene selection, local imputation, and accuracy estimation.

Mentions: In Figure 1 we show the framework of the integrative missing value estimation method (iMISS). The estimation process has four steps. The first step is to select a set of microarray datasets as reference datasets based on their expression similarity to the target dataset. The second step is to select the top k neighbor genes based on the target dataset and the reference datasets. Two methods have been tested for this step: one based on order statistics and the other on average distance. In the third step, any local missing value estimation algorithm, such as LLS or KNN, can be used to impute the missing values in the dataset. Since it is difficult to know in advance whether the reference datasets are sufficient to produce high-quality estimations, a fourth step is introduced to assess the estimation quality of integrative imputation algorithms using a submatrix imputation approach.
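
The second and third steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the average-distance variant is an unweighted mean of per-dataset distances, uses a root-mean-square distance over jointly observed samples, and replaces LLS with a plain unweighted KNN average. All function names (`distance`, `select_neighbors`, `knn_impute`) and the toy data are hypothetical, and every dataset is assumed to list the same genes in the same row order.

```python
import math


def distance(a, b):
    """Root-mean-square difference over samples observed in both profiles."""
    pairs = [(x, y) for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    if not pairs:
        return math.inf  # no jointly observed sample, so no evidence
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def select_neighbors(gene, datasets, k):
    """Return the k genes closest to `gene`, averaging distance over datasets.

    Assumption: an unweighted mean over the target and reference datasets.
    """
    scores = []
    for other in range(len(datasets[0])):
        if other == gene:
            continue
        dists = [distance(d[gene], d[other]) for d in datasets]
        finite = [x for x in dists if math.isfinite(x)]
        avg = sum(finite) / len(finite) if finite else math.inf
        scores.append((avg, other))
    scores.sort()
    return [g for _, g in scores[:k]]


def knn_impute(target, references, k):
    """Fill each NaN in `target` with the mean of the neighbors' values
    in the same sample (unweighted KNN, a stand-in for LLS)."""
    datasets = [target] + references
    imputed = [row[:] for row in target]
    for g, row in enumerate(target):
        if not any(math.isnan(v) for v in row):
            continue
        neighbors = select_neighbors(g, datasets, k)
        for j, v in enumerate(row):
            if math.isnan(v):
                vals = [target[n][j] for n in neighbors
                        if not math.isnan(target[n][j])]
                if vals:  # leave the gap if no neighbor has a value here
                    imputed[g][j] = sum(vals) / len(vals)
    return imputed


# Toy data: 4 genes; the target has 3 samples, the reference has 4.
nan = math.nan
target = [[1.0, nan, 3.0],
          [1.1, 2.1, 3.1],
          [5.0, 1.0, 0.0],
          [0.9, 1.9, 2.9]]
reference = [[0.0, 1.0, 0.0, 1.0],
             [0.1, 1.1, 0.1, 1.1],
             [3.0, 3.0, 3.0, 3.0],
             [0.0, 0.9, 0.0, 0.9]]

print(select_neighbors(0, [target, reference], k=2))
print(knn_impute(target, [reference], k=2)[0])
```

In this toy example, genes 3 and 1 are selected as neighbors of gene 0 because they track it in both the target and the reference dataset, and the missing entry is filled with the mean of their values in that sample. The fourth step could reuse the same functions by masking known entries of a complete submatrix and comparing the imputed values against the held-out originals.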

