Multiple imputations applied to the DREAM3 phosphoproteomics challenge: a winning strategy.

Guex N, Migliavacca E, Xenarios I - PLoS ONE (2010)

Bottom Line: Here we present the strategy we used to make a winning prediction for the DREAM3 phosphoproteomics challenge. We discuss the accuracy of our findings and show that multiple imputation applied to this dataset is a powerful method for accurately estimating the missing data. We postulate that multiple imputation methods might become an integral part of experimental design, as a means either to reduce costs or to increase the number of samples that can be handled for a given cost.


Affiliation: Vital-IT Group, Swiss Institute of Bioinformatics, Lausanne, Switzerland. nicolas.guex@isb-sib.ch

ABSTRACT
DREAM is an initiative that allows researchers to assess how well their methods or approaches can describe and predict networks of interacting molecules [1]. Each year, recently acquired datasets are released to predictors ahead of publication. Researchers typically have about three months to predict the masked data or network of interactions, using any predictive method. Predictions are assessed prior to an annual conference where the best predictions are unveiled and discussed. Here we present the strategy we used to make a winning prediction for the DREAM3 phosphoproteomics challenge. We used Amelia II, multiple imputation software developed by Gary King, James Honaker and Matthew Blackwell [2] in the context of the social sciences, to predict the 476 of 4624 measurements that had been masked for the challenge. To choose the best possible multiple imputation parameters for the challenge, we evaluated how transforming the data and varying the imputation parameters affected the ability to predict additionally masked data. We discuss the accuracy of our findings and show that multiple imputation applied to this dataset is a powerful method for accurately estimating the missing data. We postulate that multiple imputation methods might become an integral part of experimental design, as a means either to reduce costs or to increase the number of samples that can be handled for a given cost.
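
The parameter selection described in the abstract (hide some additional known measurements, impute them, and compare the imputed values with the hidden truth) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, Python with NumPy stands in for R, and a simple column-mean fill stands in for Amelia II, which is not reproduced here.

```python
import numpy as np

def mask_extra(data, fraction, rng):
    """Hide a random fraction of the observed entries.

    Returns the masked copy and the (row, col) positions that were hidden.
    """
    observed = np.argwhere(~np.isnan(data))
    n_hide = max(1, int(round(fraction * len(observed))))
    chosen = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    masked = data.copy()
    masked[chosen[:, 0], chosen[:, 1]] = np.nan
    return masked, chosen

def impute_column_mean(data):
    """Placeholder imputer (NOT Amelia II): fill each gap with its column mean."""
    filled = data.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def impute_log_column_mean(data):
    """Same placeholder, applied after a log(1 + x) transform and mapped back."""
    return np.expm1(impute_column_mean(np.log1p(data)))

def heldout_rmse(data, imputer, fraction=0.1, seed=0):
    """Root-mean-square error of an imputer on additionally masked entries."""
    rng = np.random.default_rng(seed)
    masked, hidden = mask_extra(data, fraction, rng)
    filled = imputer(masked)
    truth = data[hidden[:, 0], hidden[:, 1]]
    guess = filled[hidden[:, 0], hidden[:, 1]]
    return float(np.sqrt(np.mean((truth - guess) ** 2)))

# Synthetic stand-in for a measurement table (rows: conditions, cols: analytes).
demo = np.random.default_rng(1).lognormal(mean=2.0, sigma=0.5, size=(40, 5))
scores = {
    "raw": heldout_rmse(demo, impute_column_mean),
    "log": heldout_rmse(demo, impute_log_column_mean),
}
```

Because both candidates are scored on the same hidden entries (same seed), the one with the lower held-out error would be preferred; repeating the comparison over several seeds would make the choice less dependent on one particular mask.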


pone-0008012-g003: Inspection of the challenge data through Principal Component Analysis (PCA). All measurement classes were pooled together, irrespective of CellType, Time, Stimulus and Inhibitor. Scatter plots representing the various classes were produced with the s.class command of the ade4 R package. The classes are: Top left: CellType (Normal, Cancer). Top right: Time (0, 30, 180 min). Bottom left: The seven stimuli. Bottom right: The seven inhibitors.

To get a "feel" for the data, we performed a principal component analysis (PCA) using the dudi.pca function of the ade4 package [12] for R (Figure 3). A large difference between Cancer and Normal cells is readily apparent. Some grouping is also apparent for the various time points. Measurements at time zero and at 180 min cluster in relatively tight neighboring regions of the PCA space. In contrast, measurements at 30 min are widely dispersed, and they tend to lie farther from the time-zero measurements than the 180 min measurements do. These observations led us to try various parameters that would account for the time effect and the cell type effect (cross-section Normal vs Cancer) during the multiple imputation process.
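
The kind of inspection described above can be illustrated with a short sketch. It is not the authors' ade4 code: it uses Python with NumPy as a stand-in, computes the PCA through a singular value decomposition of the centred and scaled columns, and summarizes each class by its centroid in the space of the first two components. The function names and the synthetic two-group data are assumptions made for illustration only.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Project rows onto the leading principal components.

    Columns are centred and scaled to unit variance before the SVD.
    Returns the row scores and the fraction of variance per component.
    """
    centred = matrix - matrix.mean(axis=0)
    scaled = centred / matrix.std(axis=0)
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def class_centroids(scores, labels):
    """Mean position of each class in the component space."""
    labels = np.asarray(labels)
    return {str(name): scores[labels == name].mean(axis=0)
            for name in np.unique(labels)}

# Synthetic example: two groups of samples with shifted means (hypothetical data).
rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0.0, 1.0, size=(30, 4)),
                     rng.normal(5.0, 1.0, size=(30, 4))])
cell_type = ["Normal"] * 30 + ["Cancer"] * 30

demo_scores, demo_explained = pca_scores(samples)
demo_centroids = class_centroids(demo_scores, cell_type)
```

Well-separated centroids for a factor such as cell type, or a much larger spread of points for one time point than for the others, are the sort of pattern that would suggest giving that factor explicit treatment during imputation.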

