Integrative missing value estimation for microarray data.

Hu J, Li H, Waterman MS, Zhou XJ - BMC Bioinformatics (2006)

Affiliation: Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA. jianjunh@usc.edu

ABSTRACT

Background: Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples.

Results: We present the integrative Missing Value Estimation method (iMISS), which incorporates information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art Local Least Squares (LLS) imputation algorithm, by up to 15% in our benchmark tests.
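Because iMISS is evaluated as an improvement on LLS, a minimal sketch of single-gene LLS imputation may help. It follows the LLS idea as we understand it: select the k nearest genes on the target gene's observed columns, fit the target as a least-squares combination of those neighbors, then apply the same coefficients to the neighbors' values in the missing columns. Simplifying assumptions: only fully observed genes serve as neighbors, Euclidean distance is used, and a tiny `ridge` term stands in for a pseudoinverse. The names `lls_impute_gene` and `solve` are hypothetical, and this is the base LLS step, not the integrative iMISS method.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lls_impute_gene(data, g, k=3, ridge=1e-8):
    """Fill the missing (None) entries of gene g with an LLS-style estimate (hypothetical helper)."""
    target = data[g]
    obs = [j for j, v in enumerate(target) if v is not None]
    mis = [j for j, v in enumerate(target) if v is None]
    # Simplifying assumption: only fully observed genes are candidate neighbors.
    candidates = [i for i, row in enumerate(data)
                  if i != g and all(v is not None for v in row)]

    def dist(i):
        # Euclidean distance restricted to the target's observed columns.
        return math.sqrt(sum((data[i][j] - target[j]) ** 2 for j in obs))

    neighbors = sorted(candidates, key=dist)[:k]
    A = [[data[i][j] for j in obs] for i in neighbors]  # k x |obs|
    w = [target[j] for j in obs]
    # Least squares min ||A^T x - w|| via (A A^T + ridge * I) x = A w;
    # the tiny ridge term stands in for a pseudoinverse (an assumption).
    gram = [[sum(a * b for a, b in zip(A[p], A[q])) + (ridge if p == q else 0.0)
             for q in range(len(A))] for p in range(len(A))]
    rhs = [sum(a * b for a, b in zip(A[p], w)) for p in range(len(A))]
    x = solve(gram, rhs)
    # Apply the same coefficients to the neighbors' values in the missing columns.
    filled = list(target)
    for j in mis:
        filled[j] = sum(x[p] * data[i][j] for p, i in enumerate(neighbors))
    return filled
```

For example, if the target gene's observed values equal the sum of its two nearest neighbors, the fitted coefficients come out close to 1 and each missing entry is estimated as the sum of those neighbors' values in that column.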

Conclusion: We demonstrated that the order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS, and are especially well suited to microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.
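The abstract does not spell out how the order statistics are computed, so the following is only an illustrative sketch of building a consistent neighbor-gene list across datasets: each dataset (the target plus its references) ranks genes by distance to the target gene, and genes are then ordered by their median rank across datasets. The median-rank rule, the assumption that all datasets list the same genes in the same row order, and the names `rank_by_distance` and `consistent_neighbors` are hypothetical choices, not the authors' procedure.

```python
import math
from statistics import median


def rank_by_distance(dataset, g):
    """Rank all other genes by Euclidean distance to gene g (rank 1 = nearest).

    Only columns observed (not None) in both genes are compared; a gene with no
    shared observed columns gets distance 0 here, which a real implementation
    would need to handle.
    """
    target = dataset[g]

    def dist(i):
        pairs = [(a, b) for a, b in zip(dataset[i], target)
                 if a is not None and b is not None]
        return math.sqrt(sum((a - b) ** 2 for a, b in pairs))

    order = sorted((i for i in range(len(dataset)) if i != g), key=dist)
    return {gene: r for r, gene in enumerate(order, start=1)}


def consistent_neighbors(datasets, g, k):
    """Pick k neighbors of gene g by median rank across datasets (hypothetical rule).

    Assumes every dataset lists the same genes in the same row order.
    """
    rankings = [rank_by_distance(d, g) for d in datasets]
    genes = list(rankings[0])
    score = {gene: median(r[gene] for r in rankings) for gene in genes}
    # Ties are broken by gene index so the result is deterministic.
    return sorted(genes, key=lambda gene: (score[gene], gene))[:k]
```

A median rank rewards genes that are close to the target in most datasets, so a neighbor that looks close only in one noisy dataset is down-weighted. The resulting list could then feed a base imputer such as LLS in place of neighbors chosen from the target dataset alone.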

Figure 6: Performance with respect to the number of reference datasets. In general, iMISS algorithms achieve the best performance when the number of reference datasets is between 3 and 7.

Mentions: Figure 6 shows two distinct trends in performance as the number of reference datasets R varies. For integrative algorithms that outperform their base algorithms (iLLS-O on all three datasets, iLLS-D on DER7 and OGA8, and iKNN-D on FER4), performance is generally not very sensitive to R. For example, the gains of iLLS-D and iLLS-O over LLS on DER7 and OGA8 are significant for R from 4 to 7, although the optimal value of R depends on the target dataset. In contrast, for integrative algorithms whose neighbor-selection method does not match the imputation procedure well, increasing the number of reference datasets usually makes the results worse. This is the case for iKNN-O on all three datasets, iKNN-D on OGA8, and iLLS-D on FER4. In both situations, including too many reference datasets (eight in this study) degrades performance.
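Since the best R depends on the target dataset, one hypothetical way to choose it empirically is to hide a random sample of known entries, impute them using the first R reference datasets for each candidate R, and keep the R with the lowest normalized RMSE (NRMSE: the root mean squared error divided by the standard deviation of the true values). This sketch assumes the references are already ordered by relevance and that `impute(masked, refs)` is a user-supplied, hypothetical imputer returning a completed matrix. It is loosely in the spirit of the submatrix imputation check mentioned in the abstract, but it is not the authors' method.

```python
import math
import random


def nrmse(true_vals, est_vals):
    """Normalized RMSE: sqrt(mean squared error / variance of the true values)."""
    n = len(true_vals)
    mse = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / n
    mean = sum(true_vals) / n
    var = sum((t - mean) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var)


def select_num_references(data, references, impute, r_values, mask_fraction=0.05, seed=0):
    """Score each candidate R by NRMSE on hidden known entries (hypothetical helper).

    Assumes `references` is ordered by relevance, so R means "the first R references",
    and that `impute(masked, refs)` returns a completed copy of `masked`.
    """
    rng = random.Random(seed)
    known = [(i, j) for i, row in enumerate(data)
             for j, v in enumerate(row) if v is not None]
    # Hide at least two entries so the variance in NRMSE can be nonzero.
    hidden = rng.sample(known, max(2, int(len(known) * mask_fraction)))
    masked = [list(row) for row in data]
    for i, j in hidden:
        masked[i][j] = None
    truth = [data[i][j] for i, j in hidden]
    scores = {}
    for r in r_values:
        completed = impute(masked, references[:r])
        scores[r] = nrmse(truth, [completed[i][j] for i, j in hidden])
    best = min(scores, key=scores.get)
    return best, scores
```

Returning the full `scores` dictionary, not just the best R, makes it easy to check whether performance is flat across a range of R (the insensitive case described above) or degrades as references are added.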