Network-based Prediction of Cancer under Genetic Storm.

Ay A, Gong D, Kahveci T - Cancer Inform (2014)

Bottom Line: Here we present a new network-based supervised classification technique, namely the NBC method. We compare NBC to five traditional classification techniques (support vector machines (SVM), k-nearest neighbor (kNN), naïve Bayes (NB), C4.5, and random forest (RF)) using 50-300 genes selected by five feature selection methods. Our analysis suggests that using the symmetrical uncertainty (SU) feature selection method with the NBC method provides the most accurate classification strategy.


Affiliation: Department of Mathematics, Colgate University, Hamilton, NY, USA. ; Department of Biology, Colgate University, Hamilton, NY, USA.

ABSTRACT
Classification of cancer patients using traditional methods is a challenging task in medical practice. Owing to rapid advances in microarray technologies, expression levels of thousands of genes from individual cancer patients can now be measured. The classification of cancer patients by supervised statistical learning algorithms using gene expression datasets provides an alternative to the traditional methods. Here we present a new network-based supervised classification technique, namely the NBC method. We compare NBC to five traditional classification techniques (support vector machines (SVM), k-nearest neighbor (kNN), naïve Bayes (NB), C4.5, and random forest (RF)) using 50-300 genes selected by five feature selection methods. Our results on five large cancer datasets demonstrate that the NBC method outperforms traditional classification techniques. Our analysis suggests that using the symmetrical uncertainty (SU) feature selection method with the NBC method provides the most accurate classification strategy. Finally, in-depth analysis of the correlation-based co-expression networks chosen by our network-based classifier in different cancer classes shows that there are drastic changes in the network models of different cancer types.


Figure 2: Intra- and inter-class prediction errors for different cancer datasets. In each graph, the x-axis represents the class on which the model is built, and the y-axis represents the class on which the prediction is made.

Mentions: The NBC method constructs a distinct network for each cancer class and uses these networks, together with predictor functions constructed by linear regression, to predict expression levels for the selected genes in each sample. In the next step, for each sample, it compares these class-specific predictions to the actual gene expression levels in that sample. The method assigns the sample to the class that gives the minimum distance, in the L2 norm, between the predicted and actual gene expression levels. To see how well our method separates different classes, we computed the intra- and inter-class prediction errors using each class-specific predictor function of the NBC method. More specifically, we computed the error using the relative L2 norm, defined as $\|Y - f(X)\|_2 / \|Y\|_2$, where Y represents the actual gene expression levels for the test sample and f(X) represents the predicted gene expression levels for the same test sample. Figure 2 presents the results for all five cancer datasets we used. We make two important observations from these results. First, the prediction errors explain the classification accuracies of our method. As an example, for the NCI60 dataset the NBC classifier achieves perfect accuracy, and in Figure 2 we observe that the models created by the NBC classifier yield the lowest prediction error for samples in the same class (ie, the diagonal entries have the lowest values). However, the same cannot be observed for the breast cancer dataset, which has the lowest classification accuracy of the five cancer datasets (see Table 1). Second, our results suggest that models for different classes have different prediction errors. For example, for the breast cancer dataset, the model for class 1 produces significantly lower prediction error for the test samples in class 1 than for the samples in other classes. However, the model for class 2 fails to predict the samples from its own class, since it gives lower prediction errors for other classes. A similar pattern is seen for the model for class 3. These results suggest that the low classification accuracy for breast cancer (see Table 1, and Supplementary Tables 4 and 5) is caused by inaccurate predictions for cancer patients in classes 2 and 3.
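The relative L2 norm and the minimum-distance assignment rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `relative_l2_error` and `assign_class`, the class labels, and the toy expression values are all hypothetical, and the class-specific predictions f(X) are assumed to have been computed already (the regression step is not shown).

```python
import numpy as np


def relative_l2_error(y_actual, y_predicted):
    """Relative L2 norm ||Y - f(X)||_2 / ||Y||_2 for one test sample.

    Assumes Y is not the all-zero vector; otherwise the ratio is undefined.
    """
    y = np.asarray(y_actual, dtype=float)
    f_x = np.asarray(y_predicted, dtype=float)
    return float(np.linalg.norm(y - f_x) / np.linalg.norm(y))


def assign_class(y_actual, predictions_by_class):
    """Pick the class whose prediction is closest to Y in the L2 norm.

    predictions_by_class maps a (hypothetical) class label to the expression
    levels predicted for this sample by that class's model.
    """
    y = np.asarray(y_actual, dtype=float)
    distances = {
        label: float(np.linalg.norm(y - np.asarray(pred, dtype=float)))
        for label, pred in predictions_by_class.items()
    }
    best_label = min(distances, key=distances.get)
    return best_label, distances


# Toy values, invented purely for illustration (not data from the paper).
sample = [3.0, 4.0]
toy_predictions = {"class_1": [3.0, 3.0], "class_2": [0.0, 0.0]}

label, distances = assign_class(sample, toy_predictions)
errors = {c: relative_l2_error(sample, p) for c, p in toy_predictions.items()}
print(label, distances, errors)
```

Averaging `relative_l2_error` over all test samples of one class, for each class-specific model in turn, would give one cell per (model class, sample class) pair, which is the kind of grid Figure 2 reports. Because the denominator depends only on the sample, ranking classes by relative error or by plain L2 distance selects the same class for any given sample.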

