Support Vector Machine - Part 3 (Final) - Finding the Optimal Hyperplane

5 minute read



From the previous tutorial we computed the distance between the hyperplane and a data point, then doubled the value to get the margin. However, we can not say if that is the optimal separating hyperplane as we need to do more research on another possible hyperplanes.

In this tutorial we’ll see how to find the optimal hyperplane based on all the possible hyperplane positions between two different classes.

Finding the Optimal Hyperplane

Let’s take a look at the previous margin of the data train.

A hyperplane with its equation and the vector w that is perpendicular to the hyperplane

As we can see, it is not the optimal hyperplane as intuitively, we can get the bigger margin if we move the hyperplane to the right. We can move it to the right till it reaches a certain position as if the position exceeds the limit, it will have a new point of reference and surely the created margin will be reduced.

Therefore, we will use another approach in which we create two new hyperplanes separating the data and there is no any data point between them. Afterwards, we create a new hyperplane crossing the line representing the new margin in the middle. Here is the illustration.

Two hyperplanes separating the data train (X and Y) without any data point between them and the optimal hyperplane

From the above illustration, we can see that the data point A and B become the part of the hyperplane X and Y respectively. We also see that the hyperplane Z crosses the margin P in the middle. By applying this approach, there are no data points between the limiting hyperplanes (X and Y) which means it makes the margin of the data train is created from the distance between the hyperplane and any of two data points reside on the limiting hyperplanes. Based on this condition, this hyperplane is considered as the optimal separating hyperplane.

Two and Three Dimensional Vector in the Equation of a Hyperplane

We’ve known that the equation of a hyperplane can be represented in w.x = 0, where w = (-b, -a, 1) and x = (1, x, y). This representation is for three dimensional vector, yet there is another way to represent the equation of a hyperplane, namely w.x + b = 0. What is the difference between both equations?

We can see that we need to add a b value to the latter equation which means it is a hyperplane’s equation which is represented in a two dimensions vector space. We can prove it by the following procedure:

Proof of the latter equation

In this tutorial, we’ll use the hyperplane’s equation having only two vector’s elements.

The Constraints

Suppose we have a hyperplane with this equation: w.x + b = 0. We also have the limiting hyperplanes which are represented in these equations respectively: w.x + b = d and w.x + b = -d. These equations state that the distance between the limiting hyperplanes and the optimal hyperplane is equal. However, we can reduce the complexity of the equation by replacing the value d with one (it can be any value and I use one just for the simplicity).

The next step is we will assure that there is no any data point between the limiting hyperplanes and we can utilize their equations to create the following constraints:

The Constraints

From the constraints, we can check whether a data point satisfies the rule. Let’s take an example for the data point A. We can see that this data point resides exactly in one of the limiting hyperplane which means it satisfies the equation of w.x + b = 1 or in other words it’s just the equation of a line, namely y = ax - b + 1 where -b + 1 is a constant. The procedures to determine whether a data point follows the rule is still applied to another data point residing outside the limiting hyperplane. If the equation of w.x + b returns a value which is less that 1 and more than -1, than the data point does not satisfy the constraints and for this case we will not choose this kind of limiting hyperplanes to create the optimal hyperplane.

Furthermore, we can get a single constraint for the limiting hyperplane just by combining both constraints specified before. This single constraint will be used as the equation for the matter of optimization later.

The Single Constraint

The Margin

Let’s take a look at this illustration.

The limiting hyperplane and we wanna compute the distance between both of them

As a reminder, our goal is to find the optimal hyperplane in which it is the same as finding the biggest margin of the data train. If you recall again, we got the optimal hyperplane by creating the limiting hyperplanes where there are two data points becoming a part of them.

The limiting hyperplane, margin M, and the vector w

One of the approach to find the value of the margin is by converting the margin M to the vector representation and then we can compute the norm of that vector. To do the conversion, we utilize the vector w as the base vector and the idea is we get the vector M as the result of the multiplication of the vector w by a scalar. Here is the details of the process.

The limiting hyperplane, margin M, unit vector of w, and a new vector (margin's norm times w)

We’ve got the vector representation for the margin and now we’ll see how to compute the norm of the margin by applying the vector in the equation of a hyperplane.

Computing the Margin

The Optimization Problem

Finally, we’ve got the way to compute the margin and according to the formula, we can only change the norm of w to get the maximum margin.

As we can see, when we maximize the norm of w, the margin will become smaller. So, our task is to find the limiting hyperplanes that satisfies the constraint and gives us the minimum value for the norm of w.

To get the smallest norm, we can use the single constraint which then gives us this optimization problem:

The Optimization Problem


  • Images - Personal Documents