A jackknife and voting classifier approach to feature selection and classification.

Taylor SL, Kim K - Cancer Inform (2011)

Bottom Line: While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically-applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.


Affiliation: Division of Biostatistics, Department of Public Health Sciences, University of California School of Medicine, Davis, CA, USA.

ABSTRACT
With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule that predicts group membership of unknown samples based on a small set of features and that could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically-applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers, which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer 'omics datasets with different characteristics. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.


Figure 5: Mean accuracy of weighted voting classifier versus mean BSS/WSS. Mean accuracy of the weighted voting classifier using three features versus the mean BSS/WSS of these features for two gene expression data sets (leukemia, lung cancer) and a proteomics data set (prostate cancer). Mean values were calculated across 1,000 random training:test set partitions. Features to include in the classifiers were identified through a jackknife procedure in which features were ranked according to their frequency of occurrence in the top 1% most significant features, based on t-statistics, across all jackknife samples. Mean BSS/WSS was calculated separately using the training and test set portions of each random partition.
© Copyright Policy - open-access
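
As a rough illustration of the ranking step described in the caption, the following Python sketch counts how often each feature falls in the top 1% by absolute t-statistic across jackknife samples. Leave-one-out resampling, the Welch form of the t-statistic, and the names `welch_t` and `jackknife_rank` are assumptions made for this example; the caption does not specify them.

```python
import math
from collections import Counter
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def jackknife_rank(X, y, top_frac=0.01):
    """Rank features by how often they land in the top fraction of |t|.

    X: list of samples, each a list of feature values; y: 0/1 labels.
    Leave-one-out resampling is an assumption made for this sketch.
    Each group needs at least three samples so that two remain after
    one sample is removed.
    """
    n, p = len(X), len(X[0])
    k = max(1, round(top_frac * p))  # size of the "top 1%" set
    counts = Counter()
    for left_out in range(n):
        keep = [i for i in range(n) if i != left_out]
        scores = []
        for j in range(p):
            g0 = [X[i][j] for i in keep if y[i] == 0]
            g1 = [X[i][j] for i in keep if y[i] == 1]
            scores.append(abs(welch_t(g0, g1)))
        top = sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]
        counts.update(top)
    # Most frequently selected features first; ties broken by index.
    return sorted(range(p), key=lambda j: (-counts[j], j))
```

Calling `jackknife_rank(X, y)[:3]` would give the three most frequently selected features, the kind of small feature set the caption refers to.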

Mentions: We further evaluated the relationship between classifier performance and the BSS/WSS of features in the classifier using the weighted voting classifier with just the first three features in order of frequency of occurrence in MRV repetitions. Three features accounted for much of the classifier's performance, particularly for the lung cancer and leukemia data sets. Accuracy generally increased as the mean BSS/WSS of the three features included in the classifier increased in the training and test sets (Fig. 5). The lung cancer and leukemia data sets had the highest mean BSS/WSS values and also the highest accuracies, while the lowest BSS/WSS values and accuracies occurred for the prostate cancer data set. Considering the test sets, when the mean BSS/WSS of these three features was greater than 1, the mean accuracy of the weighted voting classifier was greater than 80% for all data sets (Leukemia: 89%, Lung Cancer: 95%, Prostate Cancer: 81%); it was considerably lower when the mean BSS/WSS was less than 1 (Leukemia: 82%, Lung Cancer: 80%, Prostate Cancer: 66%).
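
To make the two quantities in this comparison concrete, here is a minimal Python sketch. It assumes BSS/WSS is the usual ratio of between-group to within-group sums of squares for a single feature, and it assumes a simple weighting scheme for the votes (distance from the midpoint of the two class means, scaled by the gap between those means). The passage above does not give the authors' exact weights, so the weighting and the function names `bss_wss`, `fit_weighted_vote`, and `predict_weighted_vote` are hypothetical.

```python
from statistics import mean


def bss_wss(values, labels):
    """Between-group to within-group sum-of-squares ratio for one feature."""
    overall = mean(values)
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    bss = sum(len(vs) * (mean(vs) - overall) ** 2 for vs in groups.values())
    wss = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    return bss / wss if wss > 0 else float("inf")


def fit_weighted_vote(X, y):
    """Store per-feature class means; y holds 0/1 labels."""
    p = len(X[0])
    m0 = [mean(row[j] for row, g in zip(X, y) if g == 0) for j in range(p)]
    m1 = [mean(row[j] for row, g in zip(X, y) if g == 1) for j in range(p)]
    return m0, m1


def predict_weighted_vote(model, x):
    """Each feature votes for the class whose mean lies on its side.

    Assumed weight (not taken from the source): distance of x[j] from the
    midpoint of the two class means, times the gap between the means.
    """
    m0, m1 = model
    total = 0.0
    for j, v in enumerate(x):
        mid = (m0[j] + m1[j]) / 2
        # Positive term = vote for class 1, negative = vote for class 0.
        total += (v - mid) * (m1[j] - m0[j])
    return 1 if total > 0 else 0
```

Under these assumptions, a feature with BSS/WSS above 1 has group means that are far apart relative to the spread within groups, so its vote is rarely on the wrong side of the midpoint. That is consistent with the higher test-set accuracies reported above when the mean BSS/WSS exceeded 1.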

