Network-based Prediction of Cancer under Genetic Storm.

Ay A, Gong D, Kahveci T - Cancer Inform (2014)

Bottom Line: Here we present a new network-based supervised classification technique, namely the NBC method. We compare NBC to five traditional classification techniques (support vector machines (SVM), k-nearest neighbor (kNN), naïve Bayes (NB), C4.5, and random forest (RF)) using 50-300 genes selected by five feature selection methods. Our analysis suggests that using the symmetrical uncertainty (SU) feature selection method with the NBC method provides the most accurate classification strategy.


Affiliation: Department of Mathematics, Colgate University, Hamilton, NY, USA. ; Department of Biology, Colgate University, Hamilton, NY, USA.

ABSTRACT
Classification of cancer patients using traditional methods is a challenging task in medical practice. Owing to rapid advances in microarray technologies, expression levels of thousands of genes from individual cancer patients can now be measured. The classification of cancer patients by supervised statistical learning algorithms using gene expression datasets provides an alternative to the traditional methods. Here we present a new network-based supervised classification technique, namely the NBC method. We compare NBC to five traditional classification techniques (support vector machines (SVM), k-nearest neighbor (kNN), naïve Bayes (NB), C4.5, and random forest (RF)) using 50-300 genes selected by five feature selection methods. Our results on five large cancer datasets demonstrate that the NBC method outperforms traditional classification techniques. Our analysis suggests that using the symmetrical uncertainty (SU) feature selection method with the NBC method provides the most accurate classification strategy. Finally, in-depth analysis of the correlation-based co-expression networks chosen by our network-based classifier in different cancer classes shows that there are drastic changes in the network models of different cancer types.


Figure 3: Dependency of the network density on cancer datasets and feature selection methods. Heat maps depicting the network density levels for varying numbers of genes and Pearson correlation cutoffs are shown. Columns refer to the cancer datasets: leukemia, breast, lung, NCI60, and colon. Rows correspond to the feature selection methods: SVM-FS, symmetrical uncertainty, χ2, information gain, and PAM. The x-axis in each heat map refers to the Pearson correlation cutoff used to determine the gene-to-gene associations. The y-axis denotes the number of genes used in the NBC method.
© Copyright Policy - open-access

Next, we focus on one of the most fundamental characteristics of the network models constructed by the NBC method: the density of the resulting networks (i.e., the average number of gene-to-gene associations) for different cancer datasets. Figure 3 plots the results for varying numbers of genes and Pearson correlation threshold values. We observe that network density depends on the number of genes and the correlation threshold. In general, as the number of genes increases and the correlation threshold decreases, the number of associations in the network increases. While this qualitative behavior is dataset-independent, it shows quantitative differences between datasets. For example, some of the networks formed by the NBC method for the leukemia, NCI60, and lung cancer datasets are very dense (up to ~99, ~52, and ~102 average gene-to-gene associations, respectively). However, the breast and colon cancer datasets show significantly lower density levels.
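
The density measure described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that two genes are associated when the absolute value of their Pearson correlation meets the cutoff (the text does not say whether the sign is ignored), and the function names `network_density` and `density_grid`, as well as the synthetic expression matrix, are hypothetical.

```python
import numpy as np


def network_density(expr, cutoff):
    """Average number of gene-to-gene associations per gene.

    expr   : array of shape (n_genes, n_samples), one row per gene.
    cutoff : Pearson correlation cutoff. Assumption: an association
             exists when |r| >= cutoff.
    """
    corr = np.corrcoef(expr)          # gene-by-gene Pearson correlation matrix
    adj = np.abs(corr) >= cutoff      # boolean adjacency matrix
    np.fill_diagonal(adj, False)      # a gene is not associated with itself
    return adj.sum(axis=1).mean()     # mean degree over all genes


def density_grid(expr, gene_counts, cutoffs):
    """Density for each (number of genes, cutoff) pair, as in a heat map.

    Assumption: rows of expr are already ordered by a feature selection
    method, so the first n rows are the n selected genes.
    """
    return np.array([[network_density(expr[:n], c) for c in cutoffs]
                     for n in gene_counts])


# Synthetic data only; these values do not come from any cancer dataset.
rng = np.random.default_rng(0)
expr = rng.normal(size=(300, 40))
grid = density_grid(expr, gene_counts=[50, 100, 200, 300],
                    cutoffs=[0.3, 0.5, 0.7, 0.9])
print(grid.shape)  # one row per gene count, one column per cutoff
```

Under this definition, raising the cutoff for a fixed set of genes can only remove associations, so each row of `grid` is non-increasing from left to right.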

