A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM

View Article: PubMed Central - PubMed

ABSTRACT

Class imbalance is ubiquitous in real-world data and has attracted interest from many domains. Learning directly from an imbalanced dataset can yield unsatisfying results: the model over-focuses on overall classification accuracy and ends up suboptimal for the minority class. Various methodologies have been developed to tackle this problem, including sampling-based, cost-sensitive, and hybrid approaches. However, the samples near the decision boundary carry the most discriminative information and should be emphasized, and the skew of the boundary can be corrected by constructing synthetic samples there. Motivated by this geometric intuition, we designed a new synthetic minority oversampling technique that incorporates borderline information. Moreover, ensemble models tend to capture more complex and robust decision boundaries in practice. Taking these factors into consideration, we propose a novel ensemble method, Bagging of Extrapolation Borderline-SMOTE SVM (BEBS), for imbalanced data learning (IDL) problems. Experiments on open-access datasets show that our model performs significantly better, and we give a persuasive, intuitive explanation of why the method works. To the best of our knowledge, this is the first model that combines an ensemble of SVMs with borderline information for this setting.
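The full BEBS pipeline is not detailed in this excerpt. As a rough, hypothetical sketch of the "bagging of SVMs" part on imbalanced data (the function name and the balanced-bootstrap resampling scheme below are our own simplification, not necessarily the paper's exact procedure):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def bagged_svm_predict(X, y, X_test, n_estimators=5):
    """Train several SVMs, each on a bootstrap resample of the majority
    class (label 0) paired with the full minority class (label 1), then
    combine the base classifiers by majority vote."""
    minority = X[y == 1]
    majority = X[y == 0]
    votes = np.zeros((n_estimators, len(X_test)), dtype=int)
    for i in range(n_estimators):
        # Bootstrap the majority class down to the minority size,
        # so each bag is class-balanced.
        idx = rng.integers(0, len(majority), size=len(minority))
        Xb = np.vstack([majority[idx], minority])
        yb = np.r_[np.zeros(len(minority), int), np.ones(len(minority), int)]
        clf = SVC(kernel="rbf", gamma="scale").fit(Xb, yb)
        votes[i] = clf.predict(X_test)
    # A test point is labeled minority if at least half the SVMs agree.
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

In the paper's actual method, each bag would additionally be enriched with Extrapolation Borderline-SMOTE synthetic samples before the SVM is trained.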

No MeSH data available.


Figure 3: The effect of Extrapolation Borderline-SMOTE. The unframed X marks are minority-class samples generated by Extrapolation Borderline-SMOTE. Each synthetic minority sample is pushed towards the actual boundary, and with the help of δ it seldom crosses the original decision boundary.

Mentions: However, the interpolation between existing samples used in SMOTE and Borderline-SMOTE limits the ability to explore towards the actual boundary. Since we use an ensemble of SVMs, the samples near the decision boundary can be roughly characterized by the supporting hyperplanes learned by the first SVM. Taking this into consideration, we propose a novel synthetic minority oversampling method, given in Algorithm 3; Figure 3 illustrates the idea.
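Algorithm 3 itself is not reproduced in this excerpt. The following is a minimal sketch of the extrapolation idea only, under the assumption that δ simply widens SMOTE's interpolation range from [0, 1] to [0, 1 + δ] so that synthetic points can land slightly beyond a minority neighbour; the paper's exact construction, which is guided by the first SVM's support hyperplane, may differ:

```python
import numpy as np

def extrapolation_smote(X_min, n_synthetic, delta=0.3, k=5, rng=None):
    """Generate synthetic minority samples by extrapolating past a minority
    neighbour: the gap is drawn from [0, 1 + delta] instead of SMOTE's [0, 1],
    so values above 1 push the synthetic point beyond the neighbour, towards
    the (presumed) actual class boundary."""
    if rng is None:
        rng = np.random.default_rng(0)
    synth = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))
        # k nearest minority neighbours of X_min[i] (brute-force distances)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip index 0 (the point itself)
        j = rng.choice(nn)
        gap = rng.uniform(0.0, 1.0 + delta)  # gap > 1 extrapolates
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)
```

Keeping δ small plays the role described in the Figure 3 caption: synthetic samples can explore past their parents towards the true boundary while rarely crossing the original decision boundary.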

