A comparison of machine learning techniques for survival prediction in breast cancer.

Vanneschi L, Farinaccio A, Mauri G, Antoniotti M, Provero P, Giacobini M - BioData Min (2011)

Bottom Line: We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and comparably to the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform an automatic feature selection. Since the performance of Genetic Programming is likely to be improvable compared to the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data.


Affiliation: Computational Biology Unit, Molecular Biotechnology Center, University of Torino, Italy. paolo.provero@unito.it.

ABSTRACT

Background: The ability to accurately classify cancer patients into risk classes, i.e. to predict the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years, gene expression data have been successfully used to complement the clinical and histological criteria traditionally used in such prediction. Many "gene expression signatures" have been developed, i.e. sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology. Here we investigate the use of several machine learning techniques to classify breast cancer patients using one such signature, the well-established 70-gene signature.

Results: We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and comparably to the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform an automatic feature selection.

Conclusions: Since the performance of Genetic Programming is likely to be improvable compared to the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data.



Figure 1: The best-fitness model. Tree representation and the traditional Lisp representation of the model with the best fitness found by GP over the studied 50 independent runs.

Mentions: Perhaps more importantly, GP can potentially offer biological insight and generate hypotheses for experimental work (see also [8]). Indeed, an important result of our analysis is that the trees produced by GP tend to contain a limited number of features and are therefore easily interpretable in biological terms. For example, the best-performing tree, shown in Figure 1, includes 7 genes (features).
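To make the link between a GP tree, its Lisp representation, and automatic feature selection concrete, here is a minimal Python sketch. It is not the model from Figure 1: the tree, the gene names (g3, g12, g45, g7), the operator set, the protected-division convention, and the 0.5 decision threshold are all hypothetical choices made only for illustration. The idea is that a GP individual is an expression tree over expression values, and the genes that appear as leaves are the ones the model has effectively selected.

```python
import operator


def protected_div(a, b):
    # Assumed convention (common in GP, not taken from the paper):
    # return 1.0 when the divisor is zero instead of raising an error.
    return a / b if b != 0 else 1.0


# Hypothetical operator set; the operators used in the paper may differ.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}

# Hypothetical tree as a nested tuple: (operator, left subtree, right subtree).
# Strings are gene (feature) names; numbers would be constants.
EXAMPLE_TREE = ("-", ("*", "g3", "g12"), ("/", "g45", "g7"))


def evaluate(tree, sample):
    """Evaluate the tree on one patient, given as {gene name: expression value}."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, sample), evaluate(right, sample))
    if isinstance(tree, str):
        return sample[tree]
    return float(tree)


def features_used(tree):
    """Collect the gene names that appear as leaves of the tree."""
    if isinstance(tree, tuple):
        _, left, right = tree
        return features_used(left) | features_used(right)
    return {tree} if isinstance(tree, str) else set()


def to_lisp(tree):
    """Render the tree in the traditional Lisp (prefix) notation."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return f"({op} {to_lisp(left)} {to_lisp(right)})"
    return str(tree)


def classify(tree, sample, threshold=0.5):
    """Map the tree output to a binary class; the threshold is an assumption."""
    return 1 if evaluate(tree, sample) > threshold else 0


if __name__ == "__main__":
    patient = {"g3": 2.0, "g12": 1.5, "g45": 1.0, "g7": 4.0}
    print(to_lisp(EXAMPLE_TREE))
    print(sorted(features_used(EXAMPLE_TREE)))
    print(classify(EXAMPLE_TREE, patient))
```

In this sketch only four of the available genes influence the prediction, which is the sense in which a small evolved tree performs feature selection and remains readable.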

