Computer-aided lung nodule recognition by SVM classifier based on combination of random undersampling and SMOTE.

Sui Y, Wei Y, Zhao D - Comput Math Methods Med (2015)

Bottom Line: Unbalanced datasets often degrade classification performance. Eight features, including 2D and 3D features, are extracted for training and classification. Experimental results show that, for training datasets of different sizes, our RU-SMOTE-SVM classifier achieves the highest classification accuracy of the four classifiers compared, with an average classification accuracy of more than 92.94%.

Affiliation: Software College, Northeastern University, Shenyang 110004, China.

ABSTRACT
In lung cancer computer-aided detection/diagnosis (CAD) systems, classification of regions of interest (ROIs) is often used to detect and diagnose lung nodules accurately. However, unbalanced datasets often degrade classification performance. In this paper, both the minority and majority classes are resampled to improve generalization ability. We propose a novel SVM classifier combined with random undersampling (RU) and SMOTE for lung nodule recognition. Combining the two resampling methods not only balances the training samples but also removes noise and duplicate information from them while retaining useful information. This improves effective data utilization and hence the performance of the SVM algorithm for pulmonary nodule classification on unbalanced data. Eight features, including 2D and 3D features, are extracted for training and classification. Experimental results show that, for training datasets of different sizes, our RU-SMOTE-SVM classifier achieves the highest classification accuracy of the four classifiers compared, with an average classification accuracy of more than 92.94%.
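
The combined resampling idea (shrink the majority class with RU, grow the minority class with SMOTE until both are the same size) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the source does not say how the size gap is split between removals and synthetic additions, so `undersample_fraction` is a hypothetical parameter, and the names `plan_counts`, `random_undersample`, `smote`, `ru_smote_balance`, `k`, and `seed` are likewise illustrative. The SMOTE step uses the usual interpolation x_new = x + λ(x_nn − x), with λ drawn uniformly from [0, 1) and x_nn one of the k nearest minority neighbours of a randomly chosen minority sample.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def plan_counts(n_major, n_minor, undersample_fraction=0.5):
    """Split the majority/minority gap into RU removals and SMOTE additions.

    Hypothetical rule (not from the source): a fraction of the gap is removed
    from the majority class and the rest is added to the minority class, so
    both classes end up the same size. Assumes n_major >= n_minor.
    """
    gap = n_major - n_minor
    n_remove = round(gap * undersample_fraction)
    return n_remove, gap - n_remove


def random_undersample(majority, n_remove, rng):
    """Random undersampling: keep a random subset of the majority class."""
    return rng.sample(majority, len(majority) - n_remove)


def smote(minority, n_add, k, rng):
    """Create n_add synthetic minority samples by SMOTE-style interpolation."""
    synthetic = []
    for _ in range(n_add):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base (base itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: euclidean(base, p))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (v - b) for b, v in zip(base, neighbour)])
    return synthetic


def ru_smote_balance(majority, minority, undersample_fraction=0.5, k=5, seed=0):
    """Return balanced (majority', minority'); needs at least 2 minority samples."""
    rng = random.Random(seed)
    n_remove, n_add = plan_counts(len(majority), len(minority), undersample_fraction)
    new_majority = random_undersample(majority, n_remove, rng)
    new_minority = list(minority) + smote(minority, n_add, k, rng)
    return new_majority, new_minority


if __name__ == "__main__":
    rng = random.Random(1)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
    maj, mino = ru_smote_balance(majority, minority)
    print(len(maj), len(mino))  # 60 60
```

With 100 majority and 20 minority samples and an even split, 40 majority samples are dropped and 40 synthetic minority samples are added. Setting `undersample_fraction` to 0 or 1 reduces the sketch to pure SMOTE or pure RU, respectively.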

Figure 4: Flow chart of the RU-SMOTE-SVM classification algorithm.

The main steps of our algorithm are as follows. First, compute the difference between the numbers of majority-class and minority-class samples in the training data, and from it determine how many samples to remove and how many to add. Then reduce the majority class with RU and enlarge the minority class with SMOTE, according to these predetermined numbers. Next, set an initial value of α, train the SVM on the resampled training set, and compute the classification parameters. Finally, adjust α to obtain the best classification performance, so that the classifier generalizes better on the unbalanced data. Training solves the objective function iteratively to obtain the optimal separating hyperplane, and the final α determines the discriminant function and the classification rule. The flow chart of the algorithm is shown in Figure 4.
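
One way to read the training loop above is to take α as the SVM dual (Lagrange) multipliers: they start from an initial value, are updated iteratively while the dual objective is solved, and the final α fixes the discriminant function f(x) = sign(Σᵢ αᵢ yᵢ xᵢ·x + b). That reading is an assumption, and the source names neither its solver nor its kernel, so the sketch below swaps in plain dual coordinate descent for a linear SVM, with the bias absorbed as a constant feature. The names `train_linear_svm_dual`, `decide`, `C`, `epochs`, and `seed` are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm_dual(X, y, C=1.0, epochs=100, seed=0):
    """Train a linear SVM by dual coordinate descent on the multipliers alpha.

    Labels must be +1/-1. A constant 1.0 feature is appended so the bias is
    learned as part of w (and is regularized along with it).
    Returns (alpha, w) with w = sum_i alpha_i * y_i * x_i (augmented x_i).
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]        # augment with a bias feature
    n, d = len(Xa), len(Xa[0])
    alpha = [0.0] * n                        # initial value of alpha
    w = [0.0] * d                            # kept equal to sum_i alpha_i y_i x_i
    q_diag = [dot(x, x) for x in Xa]         # Q_ii = x_i . x_i, since y_i^2 = 1
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            grad = y[i] * dot(w, Xa[i]) - 1.0                    # dual gradient w.r.t. alpha_i
            new = min(max(alpha[i] - grad / q_diag[i], 0.0), C)  # exact 1-D step, clipped to [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                for j in range(d):
                    w[j] += delta * y[i] * Xa[i][j]
    return alpha, w


def decide(w, x):
    """Discriminant function sign(w . [x, 1]); w is determined by the final alpha."""
    return 1 if dot(w, list(x) + [1.0]) >= 0.0 else -1


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [3.0, 3.0], [3.5, 2.8], [2.8, 3.6]]
    y = [-1, -1, -1, 1, 1, 1]
    alpha, w = train_linear_svm_dual(X, y, C=10.0, epochs=200)
    print([decide(w, x) for x in X])
    print("support vectors:", [i for i, a in enumerate(alpha) if a > 1e-8])
```

Only samples with nonzero αᵢ (the support vectors) contribute to w, which is why the final α alone determines the decision rule. Choosing `C` by validation accuracy on held-out data is one plausible way to realize the "adjust for the best classification performance" step, but that mapping is also an assumption.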


Computer-aided lung nodule recognition by SVM classifier based on combination of random undersampling and SMOTE.

Sui Y, Wei Y, Zhao D - Comput Math Methods Med (2015)

Flow chart of algorithm of RU-SMOTE-SVM classification.
© Copyright Policy - open-access
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC4419492&req=5

fig4: Flow chart of algorithm of RU-SMOTE-SVM classification.
Mentions: The main process of our algorithm is as follows. Firstly, calculate the difference between the number of majority class and minority class samples in the training data and determine the number of removing and increasing samples, respectively. Then, reduce the majority class samples and increase the minority class samples by RU and SMOTE algorithms according to the predetermined values, respectively. Set an original value of α, train SVM with the new training samples, and calculate the classification parameters. Finally, adjust α value to get the optimum classification performance to make the classifier have better generalization ability on the unbalanced data. The training process is to solve the objective function iteratively to obtain the optimal classification hyperplane, and the ultima α determines the discriminate function and the rule of classification. The flow chart of our algorithm is illustrated in Figure 4.

Bottom Line: However, problems of unbalanced datasets often have detrimental effects on the performance of classification.Eight features including 2D and 3D features are extracted for training and classification.Experimental results show that for different sizes of training datasets our RU-SMOTE-SVM classifier gets the highest classification accuracy among the four kinds of classifiers, and the average classification accuracy is more than 92.94%.

View Article: PubMed Central - PubMed

Affiliation: Software College, Northeastern University, Shenyang 110004, China.

ABSTRACT
In lung cancer computer-aided detection/diagnosis (CAD) systems, classification of regions of interest (ROI) is often used to detect/diagnose lung nodule accurately. However, problems of unbalanced datasets often have detrimental effects on the performance of classification. In this paper, both minority and majority classes are resampled to increase the generalization ability. We propose a novel SVM classifier combined with random undersampling (RU) and SMOTE for lung nodule recognition. The combinations of the two resampling methods not only achieve a balanced training samples but also remove noise and duplicate information in the training sample and retain useful information to improve the effective data utilization, hence improving performance of SVM algorithm for pulmonary nodules classification under the unbalanced data. Eight features including 2D and 3D features are extracted for training and classification. Experimental results show that for different sizes of training datasets our RU-SMOTE-SVM classifier gets the highest classification accuracy among the four kinds of classifiers, and the average classification accuracy is more than 92.94%.

Show MeSH
Related in: MedlinePlus