Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences.

Fox NS, Starmans MH, Haider S, Lambin P, Boutros PC - BMC Bioinformatics (2014)

Bottom Line: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble technique improved classification for a majority of signatures. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers.


Affiliation: Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, Canada. Paul.Boutros@oicr.on.ca.

ABSTRACT

Background: The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is in part caused by differential error structure between datasets, and its incomplete removal by pre-processing algorithms.

Methods: To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients).

Results: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble technique improved classification for a majority of signatures.

Conclusions: Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.


Figure 6: Methods comparison. Comparison of the contributions of annotation, dataset handling and algorithm choice as a function of the number of pre-processing methods included in the ensemble classification, for the Hu signature and the Winter metagene. Each point represents the log2 of the average hazard ratio obtained with the ensemble approach over all combinations of x pipelines for the specified factor.
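The quantity plotted in Figure 6 can be illustrated with a minimal Python sketch. It assumes that a hazard ratio is available for any chosen combination of pipelines; the callable `ensemble_hr`, the function `log2_mean_hr` and the placeholder `toy_ensemble_hr` are hypothetical names introduced here, and the values they return are invented so that the sketch runs. It is not the authors' implementation.

```python
from itertools import combinations
from math import log2
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def log2_mean_hr(
    pipelines: Sequence[str],
    x: int,
    ensemble_hr: Callable[[Tuple[str, ...]], float],
) -> float:
    """Return log2 of the average hazard ratio over all combinations of x pipelines."""
    hazard_ratios = [ensemble_hr(combo) for combo in combinations(pipelines, x)]
    return log2(mean(hazard_ratios))


def toy_ensemble_hr(combo: Tuple[str, ...]) -> float:
    """Placeholder (assumption): a made-up hazard ratio, not a fitted survival model."""
    return 1.0 + 0.5 * len(combo)


# The four algorithm names come from the text; the hazard ratios do not.
pipelines = ["RMA", "MBEI", "GCRMA", "MAS5"]
curve: Dict[int, float] = {
    x: log2_mean_hr(pipelines, x, toy_ensemble_hr)
    for x in range(1, len(pipelines) + 1)
}
```

In a real analysis, `ensemble_hr` would be replaced by a function that runs the ensemble classification for the given pipelines and fits a survival model; `curve` then holds one point per ensemble size x.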

Mentions: On both platforms there was a significant difference between annotations. On HG-U133A, alternative annotation had higher hazard ratios (p = 2.61 × 10^−2, paired t-test). In direct contrast, HG-U133 Plus 2.0 performed better with default annotation (p = 1.31 × 10^−3, paired t-test). The optimal pre-processing algorithm, however, was similar on both platforms, with RMA and MBEI performing better than GCRMA and MAS5 (p = 3.23 × 10^−3 to 3.53 × 10^−7, paired t-test). RMA and MBEI showed similar results (p = 0.241, paired t-test), as did GCRMA and MAS5 (p = 0.074, paired t-test). Furthermore, we analyzed the effect of changing the number of variants in the ensemble when creating ensembles only from common pipeline variants (Figure 6). Once again, variant success is not necessarily consistent across signatures.
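The comparisons above rely on paired t-tests. As a rough illustration, the test statistic can be computed from matched hazard ratios as sketched below. The function name `paired_t_statistic` and the example numbers are assumptions made for illustration only; they are not values from the study, and the sketch stops at the statistic and degrees of freedom rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a paired t-test on matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Illustrative numbers only (assumption): hazard ratios for the same signatures
# under two annotation choices.
hr_alternative = [1.8, 2.1, 1.5, 2.4]
hr_default = [1.6, 1.7, 1.4, 2.0]
t_stat, dof = paired_t_statistic(hr_alternative, hr_default)
```

The pairing matters because each signature is evaluated under both conditions, so the test operates on per-signature differences rather than on two independent groups.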

