Network-based Prediction of Cancer under Genetic Storm.

Ay A, Gong D, Kahveci T - Cancer Inform (2014)

Bottom Line: Here we present a new network-based supervised classification technique, namely the NBC method. We compare NBC to five traditional classification techniques (support vector machines (SVM), k-nearest neighbor (kNN), naïve Bayes (NB), C4.5, and random forest (RF)) using 50-300 genes selected by five feature selection methods. Our analysis suggests that using symmetrical uncertainty (SU) feature selection method with NBC method provides the most accurate classification strategy.

Affiliation: Department of Mathematics, Colgate University, Hamilton, NY, USA; Department of Biology, Colgate University, Hamilton, NY, USA.

ABSTRACT
Classification of cancer patients using traditional methods is a challenging task in medical practice. Owing to rapid advances in microarray technologies, the expression levels of thousands of genes can now be measured in individual cancer patients. Classifying cancer patients with supervised statistical learning algorithms applied to gene expression datasets provides an alternative to the traditional methods. Here we present a new network-based supervised classification technique, the NBC method. We compare NBC to five traditional classification techniques (support vector machines (SVM), k-nearest neighbor (kNN), naïve Bayes (NB), C4.5, and random forest (RF)) using 50-300 genes selected by five feature selection methods. Our results on five large cancer datasets demonstrate that the NBC method outperforms traditional classification techniques. Our analysis suggests that combining the symmetrical uncertainty (SU) feature selection method with the NBC method provides the most accurate classification strategy. Finally, in-depth analysis of the correlation-based co-expression networks chosen by our network-based classifier in different cancer classes shows that there are drastic changes in the network models of different cancer types.
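
For readers unfamiliar with symmetrical uncertainty, it is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information, so that 0 means independence and 1 means one variable fully determines the other. The sketch below is a minimal illustration of that definition, not the authors' implementation. It assumes the expression values of a gene have already been discretized (the discretization scheme is not described in this excerpt), and the function names are hypothetical.

```python
import numpy as np


def _entropy_from_counts(counts):
    """Shannon entropy in bits from an array of category counts."""
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def symmetrical_uncertainty(x, y):
    """SU between two discrete sequences of equal length.

    x: discretized expression levels of one gene (assumed input format)
    y: class labels encoded as integers
    """
    x = np.asarray(x)
    y = np.asarray(y)
    _, cx = np.unique(x, return_counts=True)
    _, cy = np.unique(y, return_counts=True)
    # Joint counts: unique (x, y) pairs, found row-wise.
    _, cxy = np.unique(np.column_stack([x, y]), axis=0, return_counts=True)
    hx = _entropy_from_counts(cx)
    hy = _entropy_from_counts(cy)
    hxy = _entropy_from_counts(cxy)
    if hx + hy == 0:
        # Both variables are constant; SU is undefined, report 0 by convention.
        return 0.0
    mutual_information = hx + hy - hxy
    return 2.0 * mutual_information / (hx + hy)
```

Under these assumptions, genes would be ranked by their SU score against the class label and the top 50-300 retained, which matches the gene counts stated in the abstract.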

Figure 1: Dependence of the accuracy of the NBC method on the number of genes and gene-to-gene associations in the network. The heat maps show accuracy for varying numbers of genes and gene-to-gene interaction densities. Columns correspond to the cancer datasets: leukemia, breast, lung, NCI60, and colon. Rows correspond to the feature selection methods: SVM-FS, symmetrical uncertainty, χ2, information gain, and PAM. The x-axis in each heat map is the Pearson correlation cutoff used to determine the gene-to-gene associations. The y-axis is the number of genes used in the NBC method.

So far, our experiments have demonstrated that NBC yields accuracy better than or similar to that of state-of-the-art methods. Next, we focus further on the NBC method to understand its characteristics, strengths, and limitations. Briefly, two parameters characterize the predictive models generated by NBC: (i) the number of genes selected and (ii) the Pearson correlation threshold. These two parameters control the number of nodes and the number of edges, respectively, in the network models generated by NBC. We vary the values of these two parameters and report the accuracy of NBC for each parameter setting. More specifically, we vary the number of genes in the [50:300] interval and the Pearson correlation threshold in the [0.6:0.95] interval. Figure 1 presents the results.
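
The way the two parameters shape the network can be sketched in a few lines. The example below is an illustration under stated assumptions, not the NBC implementation: it assumes the columns of the expression matrix are already ordered by feature-selection rank, it thresholds the absolute Pearson correlation (the excerpt does not say whether the sign is kept), the grid step sizes are hypothetical, and random data stands in for a real dataset. It only counts edges per setting; the NBC classification step itself is not described in this excerpt and is omitted.

```python
import numpy as np


def coexpression_adjacency(expr, n_genes, cutoff):
    """Boolean adjacency matrix of a correlation-based co-expression network.

    expr:    samples x genes array, columns assumed sorted by feature rank
    n_genes: number of top-ranked genes to keep (nodes)
    cutoff:  Pearson correlation threshold (controls edges)
    """
    sub = expr[:, :n_genes]
    corr = np.corrcoef(sub, rowvar=False)  # genes x genes correlation matrix
    adj = np.abs(corr) >= cutoff           # assumption: use |r|
    np.fill_diagonal(adj, False)           # no self-loops
    return adj


def sweep_edge_counts(expr, gene_counts, cutoffs):
    """Number of undirected edges for every (n_genes, cutoff) setting."""
    counts = {}
    for n in gene_counts:
        for c in cutoffs:
            adj = coexpression_adjacency(expr, n, c)
            counts[(n, c)] = int(adj.sum()) // 2  # each edge appears twice
    return counts


# Placeholder data: 40 samples, 300 genes (not a real cancer dataset).
rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 300))

gene_counts = list(range(50, 301, 50))   # hypothetical grid within [50:300]
cutoffs = [0.6, 0.7, 0.8, 0.9, 0.95]     # hypothetical grid within [0.6:0.95]
edge_counts = sweep_edge_counts(expr, gene_counts, cutoffs)
```

For a fixed gene count, raising the cutoff can only remove edges, and for a fixed cutoff, adding genes can only add them, so the grid spans sparse to dense networks in the same way as the axes of the heat maps in Figure 1.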

