Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences.

Fox NS, Starmans MH, Haider S, Lambin P, Boutros PC - BMC Bioinformatics (2014)

Bottom Line: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, combining different pre-processing techniques in an ensemble improved classification for a majority of signatures. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers.

Affiliation: Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, Canada. Paul.Boutros@oicr.on.ca.

ABSTRACT

Background: The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is caused in part by differences in error structure between datasets and by the incomplete removal of these errors by pre-processing algorithms.

Methods: To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients).

Results: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, combining different pre-processing techniques in an ensemble improved classification for a majority of signatures.

Conclusions: Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.

Figure 1: Experimental design. Outline of the experimental design for ensemble classification and evaluation of a biomarker. Microarray data is pre-processed in 24 different ways to calculate mRNA abundance levels (Stage 1). Risk groups are subsequently assigned for the evaluated biomarker (Stage 2). Each of the resulting classifications represents a vote for whether the patient is in the low or the high risk group. The ensemble score is a summation over these individual classifications and ranges from 0 to 24 (Stage 3). Only unanimously classified patients (ensemble scores 0 and 24) are considered robust and are evaluated with Cox proportional hazard ratio modeling and Kaplan-Meier survival curves (Stage 4).
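
The voting logic of Stages 2 to 4 can be sketched as follows. This is a minimal illustration, not the authors' code (their analyses were run in R); it assumes each of the 24 pipelines has already produced a binary risk call per patient, encoded here as 0 for low risk and 1 for high risk. The patient identifiers and function names are hypothetical.

```python
"""Sketch of the ensemble vote: sum 24 binary risk calls per patient and keep
only unanimously classified patients (score 0 or 24)."""

from typing import Dict, List

# Assumption: one vote per pre-processing pipeline, 0 = low risk, 1 = high risk.
N_PIPELINES = 24


def ensemble_scores(calls: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the per-pipeline votes for each patient; scores range 0..N_PIPELINES."""
    scores = {}
    for patient, votes in calls.items():
        if len(votes) != N_PIPELINES:
            raise ValueError(f"{patient}: expected {N_PIPELINES} votes, got {len(votes)}")
        if any(v not in (0, 1) for v in votes):
            raise ValueError(f"{patient}: votes must be 0 (low) or 1 (high)")
        scores[patient] = sum(votes)
    return scores


def robust_groups(scores: Dict[str, int]) -> Dict[str, str]:
    """Keep only unanimously classified patients (score 0 or N_PIPELINES)."""
    groups = {}
    for patient, score in scores.items():
        if score == 0:
            groups[patient] = "low"
        elif score == N_PIPELINES:
            groups[patient] = "high"
        # Any intermediate score means the pipelines disagreed: patient excluded.
    return groups


if __name__ == "__main__":
    calls = {
        "P1": [1] * 24,            # unanimous high risk
        "P2": [0] * 24,            # unanimous low risk
        "P3": [1] * 20 + [0] * 4,  # split vote, excluded
    }
    scores = ensemble_scores(calls)
    print(scores)                 # {'P1': 24, 'P2': 0, 'P3': 20}
    print(robust_groups(scores))  # {'P1': 'high', 'P2': 'low'}
```

Only the patients returned by `robust_groups` would then go on to the Cox proportional hazards and Kaplan-Meier evaluation of Stage 4.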

Mentions: All analyses were performed in the R statistical environment (v2.15.2). The first step was to pre-process each dataset in 24 different ways: all combinations of 6 pre-processing algorithms, 2 types of gene annotations and 2 approaches for dataset handling. Thus, each pipeline was defined by three factors (Figure 1). Each of these is outlined in detail in the following paragraphs.
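
The count of 24 pipelines follows from taking every combination of the three factors (6 × 2 × 2 = 24). A minimal Python sketch of that enumeration is shown below; the factor level names are hypothetical placeholders, since the specific algorithms, annotations and dataset-handling approaches are not named in this excerpt.

```python
"""Sketch: enumerate pre-processing pipelines as the Cartesian product of
three factors. All level names are hypothetical placeholders."""

from itertools import product
from typing import List, NamedTuple


class Pipeline(NamedTuple):
    algorithm: str
    annotation: str
    dataset_handling: str


# Placeholder labels standing in for the real factor levels.
ALGORITHMS = [f"algorithm_{i}" for i in range(1, 7)]  # 6 pre-processing algorithms
ANNOTATIONS = ["annotation_A", "annotation_B"]         # 2 gene annotation types
DATASET_HANDLING = ["handling_A", "handling_B"]        # 2 dataset-handling approaches


def build_pipelines() -> List[Pipeline]:
    """Return one Pipeline per combination of the three factors."""
    return [Pipeline(*combo) for combo in product(ALGORITHMS, ANNOTATIONS, DATASET_HANDLING)]


if __name__ == "__main__":
    pipelines = build_pipelines()
    print(len(pipelines))  # 24
    print(pipelines[0])    # Pipeline(algorithm='algorithm_1', annotation='annotation_A', dataset_handling='handling_A')
```

Representing each pipeline as a named tuple keeps the three defining factors explicit, which makes it straightforward to label each of the 24 risk calls with the pipeline that produced it.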

