# [ Back to Basics - 02 ] Borderline-SMOTE01

**Published:**

**Relevant Paper:** Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning

In this post I’m going to write about one variant of the SMOTE algorithm: borderline-SMOTE01. You can find my previous post on SMOTE here.

Basically, borderline-SMOTE01 was developed to improve the performance of SMOTE, so its core algorithm is derived from SMOTE’s. The only difference lies in which samples are used to create the synthetic samples. In SMOTE, synthetic samples are created from every original minority sample. In borderline-SMOTE01, synthetic samples are created only from minority samples that lie on or near the borderline between the classes (we’ll call them borderline samples).

The primary rationale for generating new samples from borderline samples is that they matter more for classification than samples far from the borderline. In other words, borderline samples are more likely to be misclassified. Therefore, it’s important to strengthen them.

Alright, let’s delve into the algorithm!

**Step 1.** For each minority sample (let’s call it **min_s**), find its **m** nearest neighbors in the whole training set (both minority and majority samples). The number of majority samples among these m nearest neighbors is denoted by **maj_num**, where **0** <= **maj_num** <= **m**.
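To make Step 1 concrete, here is a minimal pure-Python sketch. It assumes Euclidean distance and a brute-force neighbor search, and the function name `count_majority_neighbors` and its parameters are my own for illustration (they are not from the paper).

```python
import math

def count_majority_neighbors(samples, labels, minority_label, m):
    """Map each minority sample's index to its maj_num.

    Assumption: Euclidean distance and a brute-force search over all samples.
    """
    maj_num = {}
    for i, (point, label) in enumerate(zip(samples, labels)):
        if label != minority_label:
            continue  # only minority samples play the role of min_s
        # Distance from min_s to every other sample, whatever its class.
        others = [(math.dist(point, other), labels[j])
                  for j, other in enumerate(samples) if j != i]
        others.sort(key=lambda pair: pair[0])
        # Count the majority samples among the m closest neighbors.
        maj_num[i] = sum(lab != minority_label for _, lab in others[:m])
    return maj_num

# Toy data: label 1 is the minority class, label 0 the majority class.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (0.9, 1.0)]
labels = [1, 1, 0, 0, 0, 1]
print(count_majority_neighbors(samples, labels, minority_label=1, m=3))
```

On real data you would probably swap the brute-force loop for a proper nearest-neighbor index, but the counting logic stays the same.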

**Step 2.** In this step we’ll specify whether **min_s** is a borderline sample using the following conditional rules:

**I.** If **maj_num = m**, then all the nearest neighbors of **min_s** are majority samples. In this case, **min_s** is considered noise and will not be processed in the following steps.

**II.** If **m** / 2 <= **maj_num** < **m**, then majority samples make up at least half (but not all) of the nearest neighbors of **min_s**. In this case, **min_s** is prone to being misclassified and is therefore added to the list of borderline samples (**borderline_samples**).

**III.** If 0 <= **maj_num** < **m** / 2, then most neighbors of **min_s** are minority samples, so **min_s** is not considered a borderline sample. In this case, **min_s** will not be processed further.
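The three rules of Step 2 boil down to a small decision function. This is just a sketch: the function name and the returned labels are assumptions I chose for illustration (in particular, `"safe"` is simply my label for rule III). The noise case is checked first, so a sample with **maj_num = m** never ends up in the borderline list.

```python
def classify_minority_sample(maj_num, m):
    """Classify a minority sample from its count of majority neighbors."""
    if not 0 <= maj_num <= m:
        raise ValueError("maj_num must satisfy 0 <= maj_num <= m")
    if maj_num == m:
        return "noise"       # rule I: every neighbor is a majority sample
    if maj_num >= m / 2:
        return "borderline"  # rule II: at least half, but not all, are majority
    return "safe"            # rule III: mostly minority neighbors

# Example with m = 5 neighbors.
for count in range(6):
    print(count, classify_minority_sample(count, 5))
```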

**Step 3.** For each minority sample (**b_sample**) in **borderline_samples**, do the following:

a) Find its **k** nearest neighbors, considering only the minority samples

b) Randomly select **x** of these **k** nearest neighbors (**1** <= **x** <= **k**)

c) For each sample in the **x** nearest neighbors (**x_nearest**):

c.1) Calculate the difference between **x_nearest** and **b_sample**, that is **x_nearest** − **b_sample** (**diff_b_sample_x**). The output is a feature vector

c.2) Multiply **diff_b_sample_x** by a random number between 0 and 1. Suppose the output of this multiplication step is **multi_diff_randnum**

c.3) Add **multi_diff_randnum** to **b_sample**. The result is a new synthetic minority sample
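Step 3 can be sketched in pure Python as follows. Again, this is only an illustration under a few assumptions: Euclidean distance, samples stored as tuples of floats, and a hypothetical function name `synthesize_from_borderline`. It also excludes neighbors by value, so exact duplicates of **b_sample** are skipped.

```python
import math
import random

def synthesize_from_borderline(borderline_samples, minority_samples, k, x, rng=None):
    """Create synthetic samples from each borderline sample (steps a to c.3)."""
    rng = rng or random.Random()
    synthetic = []
    for b_sample in borderline_samples:
        # a) k nearest minority neighbors (b_sample itself is excluded by value)
        candidates = [p for p in minority_samples if p != b_sample]
        candidates.sort(key=lambda p: math.dist(b_sample, p))
        k_nearest = candidates[:k]
        # b) randomly pick x of those neighbors (fewer if not enough exist)
        chosen = rng.sample(k_nearest, min(x, len(k_nearest)))
        for x_nearest in chosen:
            # c.1) difference vector pointing from b_sample to the neighbor
            diff_b_sample_x = [n - b for n, b in zip(x_nearest, b_sample)]
            # c.2) scale it by a random number in [0, 1)
            gap = rng.random()
            multi_diff_randnum = [gap * d for d in diff_b_sample_x]
            # c.3) add it to b_sample to get the synthetic sample
            synthetic.append(tuple(b + d for b, d in zip(b_sample, multi_diff_randnum)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
new_points = synthesize_from_borderline([(0.0, 0.0)], minority, k=2, x=2,
                                        rng=random.Random(0))
print(new_points)
```

Each synthetic point lies on the line segment between **b_sample** and one of its chosen minority neighbors, which is why the new samples end up concentrated around the borderline.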

I hope this post gives you a basic understanding of the borderline-SMOTE01 algorithm. Feel free to comment if you find any incorrect or missing information.