Genetic programming based ensemble system for microarray data classification.

Liu KH, Tong M, Xie ST, Yee Ng VT - Comput Math Methods Med (2015)

Bottom Line: Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.


Affiliation: Software School of Xiamen University, Xiamen, Fujian 361005, China; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon 999077, Hong Kong.

ABSTRACT
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
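
The way an individual combines its base classifiers with the Min, Max, and Average operators can be illustrated with a minimal sketch. It assumes that each base classifier outputs one score per class and that an individual is an expression tree whose leaves are classifier indices; the tuple encoding and the names `OPERATORS`, `evaluate`, and `predict` are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical operator table: each operator combines the children's
# scores for one class into a single score.
OPERATORS = {
    "Min": min,
    "Max": max,
    "Average": mean,
}


def evaluate(node, leaf_outputs):
    """Recursively evaluate an ensemble tree and return per-class scores.

    A node is either an int (index of a base classifier in leaf_outputs)
    or a tuple of the form (operator_name, child, child, ...).
    """
    if isinstance(node, int):
        return list(leaf_outputs[node])
    op_name, *children = node
    combine = OPERATORS[op_name]
    child_scores = [evaluate(child, leaf_outputs) for child in children]
    # zip(*...) groups the children's scores class by class.
    return [combine(per_class) for per_class in zip(*child_scores)]


def predict(node, leaf_outputs):
    """Return the index of the class with the highest combined score."""
    scores = evaluate(node, leaf_outputs)
    return max(range(len(scores)), key=scores.__getitem__)


# Scores of three assumed base classifiers for one sample (two classes).
leaf_outputs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
tree = ("Average", ("Min", 0, 1), 2)
print(evaluate(tree, leaf_outputs), predict(tree, leaf_outputs))
```

In this encoding, GP crossover and mutation would act on the nested tuples, while the base classifiers at the leaves stay fixed.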


Figure 3: Decompose GPES into different phases.

Mentions: We decompose the training process of GPES into five phases, as shown in Figure 3. Phase 1 generates 300 candidate trees from random feature and sample subsets; Phase 2 selects the accurate trees from these candidates. Together, Phases 1 and 2 make up the decision tree building process: Phase 1 uses the first training set and Phase 2 uses the validation set. Phase 3 deploys GP to evolve a group of candidates; Phase 4 selects accurate individuals from the population of the last generation; Phase 5 uses a forward search algorithm to select the final ensemble committee. Phases 3–5 make up the GP evolutionary process. Phase 3 uses the first training set to train the base classifiers and the validation set to calculate fitness values, while Phases 4 and 5 share the second training set and the validation set to retrain the individuals of the last generation, so that the selection of above-average individuals and the forward search step can be carried out efficiently.
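
The forward search of Phase 5 can be sketched as a greedy selection loop. The text does not give the exact procedure, so this is only one plausible reading under stated assumptions: members are added one at a time, the committee combines its members by averaging their per-class scores, and the search stops as soon as validation accuracy no longer improves. The names `committee_predict`, `accuracy`, and `forward_search`, and the toy scores, are hypothetical.

```python
def committee_predict(score_sets):
    """Average the members' per-class scores; return one label per sample."""
    labels = []
    for per_sample in zip(*score_sets):  # score vectors of all members for one sample
        avg = [sum(col) / len(col) for col in zip(*per_sample)]
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return labels


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def forward_search(member_scores, truth):
    """Greedy forward selection of a committee (assumed stopping rule).

    member_scores maps a member name to its per-sample score vectors on the
    validation set. Returns the selected names and the committee accuracy.
    """
    selected, best_acc = [], -1.0
    remaining = sorted(member_scores)  # sorted so that ties break deterministically
    while remaining:
        scored = []
        for cand in remaining:
            trial = [member_scores[m] for m in selected + [cand]]
            scored.append((accuracy(committee_predict(trial), truth), cand))
        acc, cand = max(scored, key=lambda item: item[0])
        if acc <= best_acc:  # assumed rule: stop when nothing improves
            break
        selected.append(cand)
        remaining.remove(cand)
        best_acc = acc
    return selected, best_acc


# Toy validation data: four samples, two classes, three candidate members.
truth = [0, 1, 1, 0]
member_scores = {
    "a": [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.6, 0.4]],
    "b": [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.7, 0.3]],
    "c": [[0.3, 0.7], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]],
}
print(forward_search(member_scores, truth))
```

In the toy data, members "a" and "b" each misclassify a different sample, so their averaged scores are correct on all four, and the search stops before adding the weak member "c".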

