A comparative analysis of biomarker selection techniques.

Dessì N, Pascariello E, Pes B - Biomed Res Int (2013)

Bottom Line: It is recognized that different feature selection techniques may result in different sets of biomarkers, that is, different groups of genes highly correlated to a given pathological condition, but few direct comparisons exist that quantify these differences in a systematic way. As a case study, we considered three benchmarks deriving from DNA microarray experiments and conducted a comparative analysis among eight selection methods, representative of different classes of feature selection techniques. Our results show that the proposed approach can provide useful insight into the pattern of agreement among biomarker discovery techniques.


Affiliation: Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Via Ospedale 72, 09124 Cagliari, Italy.

ABSTRACT
Feature selection has become an essential step in biomarker discovery from high-dimensional genomics data. It is recognized that different feature selection techniques may result in different sets of biomarkers, that is, different groups of genes highly correlated to a given pathological condition, but few direct comparisons exist that quantify these differences in a systematic way. In this paper, we propose a general methodology for comparing the outcomes of different selection techniques in the context of biomarker discovery. The comparison is carried out along two dimensions: (i) measuring the similarity/dissimilarity of selected gene sets; (ii) evaluating the implications of these differences in terms of both predictive performance and stability of selected gene sets. As a case study, we considered three benchmarks deriving from DNA microarray experiments and conducted a comparative analysis among eight selection methods, representative of different classes of feature selection techniques. Our results show that the proposed approach can provide useful insight into the pattern of agreement among biomarker discovery techniques.
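As an illustration of dimension (i), the sketch below compares the gene sets returned by several selection methods pairwise. The abstract does not name the similarity measure used in the paper, so the Jaccard index here is an assumption chosen for simplicity; the method names and gene identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical top-ranked genes returned by three selection methods.
selected = {
    "method_A": ["g1", "g2", "g3", "g4"],
    "method_B": ["g3", "g4", "g5", "g6"],
    "method_C": ["g1", "g2", "g3", "g4"],
}

# Pairwise similarity between every pair of methods.
similarity = {
    (m1, m2): jaccard(selected[m1], selected[m2])
    for m1, m2 in combinations(sorted(selected), 2)
}

for (m1, m2), value in similarity.items():
    print(f"{m1} vs {m2}: {value:.2f}")
```

A value near 1 means two methods largely agree on which genes to select, while a value near 0 means they pick almost disjoint subsets.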


Figure 8: Prostate dataset: AUC versus number of genes.

Mentions: To evaluate predictive performance, we trained a linear SVM classifier on each of the P = 20 gene subsets (of a given size) that a given ranking method selected from the reduced datasets; these reduced datasets, randomly drawn from the original dataset, serve at this stage as training sets. The average AUC, measured on the independent test sets (see Section 2.2), is shown in Figure 6 (Colon), Figure 7 (Leukemia), and Figure 8 (Prostate) for both univariate (χ², IG, SU, GR, and OR) and multivariate (RF, SVM_RFE, and SVM_ONE) methods; each figure reports the AUC trend for gene subsets of increasing size.
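The evaluation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the ANOVA F-score stands in for the paper's ranking methods, and the 90% resampling fraction and the 70/30 train/test split are assumptions (the excerpt does not give these values). Only P = 20, the linear SVM, and the averaged test-set AUC come from the text.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def mean_auc(X_train, y_train, X_test, y_test, k, P=20, fraction=0.9, seed=0):
    """Average test AUC of a linear SVM over P resampled gene rankings."""
    rng = np.random.RandomState(seed)
    aucs = []
    for _ in range(P):
        # Draw a reduced dataset (assumed: stratified 90% subsample).
        idx, _ = train_test_split(
            np.arange(len(y_train)), train_size=fraction,
            stratify=y_train, random_state=rng,
        )
        # Rank genes on the reduced dataset; the F-score is a stand-in ranker.
        scores, _ = f_classif(X_train[idx], y_train[idx])
        top = np.argsort(scores)[::-1][:k]
        # Train a linear SVM on the k selected genes only.
        clf = SVC(kernel="linear").fit(X_train[idx][:, top], y_train[idx])
        # Measure AUC on the independent test set.
        decision = clf.decision_function(X_test[:, top])
        aucs.append(roc_auc_score(y_test, decision))
    return float(np.mean(aucs))


# Synthetic stand-in for a microarray dataset: 120 samples, 200 genes,
# of which the first 10 carry a class-dependent shift.
data_rng = np.random.RandomState(42)
y = np.repeat([0, 1], 60)
X = data_rng.normal(size=(120, 200))
X[y == 1, :10] += 1.5

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# AUC trend for gene subsets of increasing size.
curve = {k: mean_auc(X_tr, y_tr, X_te, y_te, k) for k in (5, 10, 20)}
for k, auc in curve.items():
    print(f"{k} genes: mean AUC = {auc:.3f}")
```

Plotting `curve` for each ranking method would give one line per method, analogous to the AUC-versus-number-of-genes figures referenced above.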

