Lung cancer prediction using neural network ensemble with histogram of oriented gradient genomic features.

Adetiba E, Olugbara OO - ScientificWorldJournal (2015)

Bottom Line: The Voss DNA encoding was used to map the nucleotide sequences of mutated and normal genomes to obtain the equivalent numerical genomic sequences for training the selected classifiers. The histogram of oriented gradient (HOG) and local binary pattern (LBP) state-of-the-art feature extraction schemes were applied to extract representative genomic features from the encoded sequences of nucleotides. The ANN ensemble and HOG best fit the training dataset of this study with an accuracy of 95.90% and mean square error of 0.0159.


Affiliation: ICT and Society Research Group, Durban University of Technology, P.O. Box 1334, Durban 4000, South Africa.

ABSTRACT
This paper reports an experimental comparison of artificial neural network (ANN) and support vector machine (SVM) ensembles and their "nonensemble" variants for lung cancer prediction. These machine learning classifiers were trained to predict lung cancer using samples of patient nucleotides with mutations in the epidermal growth factor receptor, Kirsten rat sarcoma viral oncogene, and tumor suppressor p53 genomes collected as biomarkers from the IGDB.NSCLC corpus. The Voss DNA encoding was used to map the nucleotide sequences of mutated and normal genomes to obtain the equivalent numerical genomic sequences for training the selected classifiers. The histogram of oriented gradient (HOG) and local binary pattern (LBP) state-of-the-art feature extraction schemes were applied to extract representative genomic features from the encoded sequences of nucleotides. The ANN ensemble and HOG best fit the training dataset of this study with an accuracy of 95.90% and mean square error of 0.0159. The result of the ANN ensemble and HOG genomic features is promising for automated screening and early detection of lung cancer. This will hopefully assist pathologists in administering targeted molecular therapy and offering counsel to early-stage lung cancer patients and persons in at-risk populations.
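For readers unfamiliar with the Voss representation referenced above, the minimal sketch below (in Python, using a made-up nucleotide string rather than data from the IGDB.NSCLC corpus) shows how a DNA sequence maps to four binary indicator sequences, one per base; numerical signals of this kind are what feature extractors such as HOG or LBP then operate on.

import numpy as np

def voss_encode(sequence: str) -> np.ndarray:
    """Map a nucleotide string to a 4 x N binary matrix (row order: A, C, G, T)."""
    seq = sequence.upper()
    return np.array([[1 if nucleotide == base else 0 for nucleotide in seq]
                     for base in "ACGT"])

print(voss_encode("ATGGTGAA"))
# [[1 0 0 0 0 0 1 1]    A indicator
#  [0 0 0 0 0 0 0 0]    C indicator
#  [0 0 1 1 0 1 0 0]    G indicator
#  [0 1 0 0 1 0 0 0]]   T indicator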


fig7: Performance plot for the best ANN in Table 7.

Mentions: In the first experimental setup, the number of training iterations (called epochs in ANN parlance) was set to 500. To avoid the overfitting that can occur when the number of epochs is either too small or too large, we configured the network to stop training once the best generalization was reached. This was achieved by partitioning the HOG data into 70% training, 15% validation, and 15% testing subdatasets. The HOG training set was used to train the network, while the validation set was used to measure the error; training stops when the error on the validation set starts to increase. Furthermore, we varied the number of neurons in the hidden layer from 10 to 100 in steps of 10 and recorded the mean square error (MSE) and accuracy (from the confusion matrix plot) for each trial. Table 7 shows the MSE and accuracies obtained for the different networks with varying numbers of neurons in the hidden layer. Of the ten ANN configurations shown in Table 7, the 8th MLP-ANN gave the best accuracy of 87.6%, an MSE of 0.0355, and the best validation performance of 0.0584 at 490 epochs. The confusion matrix and performance plot of the 8th MLP-ANN are shown in Figures 6 and 7, respectively. A similar accuracy of 87.2% was reported by the authors in [30] for a study on the use of SCG-BP for facial expression recognition.
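The protocol described above (a 70/15/15 split, a cap of 500 epochs, early stopping when validation error rises, and a sweep of the hidden layer from 10 to 100 neurons) can be approximated with a short script. The sketch below uses scikit-learn rather than the authors' toolchain and random placeholder arrays in place of the HOG genomic features; scikit-learn's early stopping carves its own validation slice out of the training data, so the split only approximates the paper's explicit partition.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 81))        # placeholder for the HOG feature vectors
y = rng.integers(0, 2, size=500)      # placeholder labels (0 = normal, 1 = mutated)

# Hold out 15% for testing; early stopping below reserves roughly another
# 15% of the full data (0.176 of the remaining 85%) for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)

results = []
for n_hidden in range(10, 101, 10):   # 10 to 100 hidden neurons in steps of 10
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                        max_iter=500,              # cap of 500 epochs
                        early_stopping=True,       # stop when validation error rises
                        validation_fraction=0.176,
                        random_state=0)
    net.fit(X_train, y_train)
    y_pred = net.predict(X_test)
    results.append((n_hidden,
                    mean_squared_error(y_test, y_pred),
                    accuracy_score(y_test, y_pred)))

for n_hidden, mse, acc in results:
    print(f"hidden neurons = {n_hidden:3d}   MSE = {mse:.4f}   accuracy = {acc:.2%}")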

