Support Vector Machines

Introduction

Mastering Machine Learning is not that difficult; it just takes a lot of time and effort. Many young learners start with classification or regression because these are the easiest, and then hesitate to move on to the more complex forms of ML. Let us take a simple analogy: imagine an arsenal filled with weapons such as swords, daggers, axes and knives. Regression acts like a sword, which can efficiently slice and dice data, but it struggles with data that is complex and small in size. A basic classifier acts like an axe on wood: it chops the data into categories but cannot do much more than that. A Support Vector Machine is like the dagger, letting you make quick, precise moves on close-range targets, in this case complex, smaller-sized datasets, and it can do so far more efficiently than regression would.

What is a Support Vector Machine

A “Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges, although it is mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyper-plane that separates the two classes best. In that sense it behaves like an ordinary classifier, but it can also handle more complex forms of data. As the name suggests, an SVM works by manipulating vectors: everything that happens inside an SVM happens in vector space. We must also note that a plain SVM is what is called a binary classifier, i.e. it separates the data into only two groups.
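To make this concrete, here is a minimal toy sketch using scikit-learn on a handful of made-up two-dimensional points; the data and labels are invented purely for illustration.

    # A toy sketch (invented 2-D points) of an SVM finding a separating hyper-plane.
    import numpy as np
    from sklearn import svm

    # Each row is a data item plotted as a point in 2-dimensional feature space.
    X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
    y = np.array([0, 0, 0, 1, 1, 1])      # two classes only: an SVM is a binary classifier

    clf = svm.SVC(kernel='linear')        # a linear kernel gives a flat separating hyper-plane
    clf.fit(X, y)

    # The hyper-plane is w . x + b = 0; coef_ holds w and intercept_ holds b.
    print('w =', clf.coef_[0], 'b =', clf.intercept_[0])
    print('support vectors:\n', clf.support_vectors_)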

Decision Making and Learning process of a SVM

Decision making in an SVM is fairly simple. Once the best separating hyper-plane (a line, in two dimensions) has been found, any unknown data point is classified based on which side of the hyper-plane it falls on. The training process consists of different methods for finding that best separating hyper-plane for your data. To keep the intuition simple, I will not discuss how the SVM finds the best separating hyper-plane here; there are plenty of sources online describing how the SVM algorithm learns, and you can find a more detailed treatment there.
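Here is a small sketch of that decision step, again on invented points: the sign of the decision function tells us which side of the hyper-plane an unknown point falls on.

    # Sketch: classifying an unknown point by which side of the hyper-plane it falls on.
    import numpy as np
    from sklearn import svm

    X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])   # invented training points
    y = np.array([0, 0, 0, 1, 1, 1])
    clf = svm.SVC(kernel='linear').fit(X, y)

    unknown = np.array([[4, 4]])              # an unseen, made-up point
    print(clf.decision_function(unknown))     # the sign says which side of the hyper-plane
    print(clf.predict(unknown))               # the corresponding class label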

We have learnt the basic intuition behind an SVM; now we will try to implement it in Python 3, working on the Wisconsin Breast Cancer prediction problem. We need a few things before we start, the first being the dataset, which you can download from here. After visiting the link, right-click on the file named ‘breast-cancer-wisconsin.data’, choose “Save link as”, save it to your working directory, and open up a text editor of your choice. Now we are ready to roll!
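If you want to sanity-check the download, a quick peek with pandas (assuming the file name above) looks like this; the file ships without a header row, so the columns are unnamed at this stage.

    import pandas as pd

    # The file has no header row, so tell pandas not to treat the first row as one.
    df = pd.read_csv('breast-cancer-wisconsin.data', header=None)
    print(df.head())    # a quick look at the first few rows
    print(df.shape)     # should be (699, 11): 699 samples, 11 columns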

Code

Python Code to Implement an SVM with the Breast Cancer Classification Problem
This Python 3 code creates an SVM and tries to decide whether the recorded features correspond to a benign or malignant form of breast cancer.
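Here is a minimal sketch of such a script, assuming pandas and scikit-learn; the column names are shortened versions of those in the dataset's documentation, and the example measurements at the end are made-up values used only to exercise the model.

    # A sketch of the breast-cancer SVM described above.
    import numpy as np
    import pandas as pd
    from sklearn import svm
    from sklearn.model_selection import train_test_split

    # Load the file saved earlier; it has no header row, so supply column names
    # (shortened from the dataset's documentation).
    df = pd.read_csv('breast-cancer-wisconsin.data',
                     names=['ID', 'clump_thickness', 'uniform_cell_size',
                            'uniform_cell_shape', 'marginal_adhesion',
                            'single_epithelial_size', 'bare_nuclei',
                            'bland_chromatin', 'normal_nucleoli', 'mitoses', 'class'])

    # Missing attributes are recorded as '?'; replace them with an extreme value
    # that the classifier will treat as an outlier.
    df.replace('?', -999999, inplace=True)

    # The patient ID says nothing about the tumour class, so drop it.
    df.drop(['ID'], axis=1, inplace=True)

    X = np.array(df.drop(['class'], axis=1)).astype(float)   # the 9 measurements
    y = np.array(df['class'])                                 # 2 = benign, 4 = malignant

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    clf = svm.SVC()
    clf.fit(X_train, y_train)
    print('Test accuracy:', clf.score(X_test, y_test))

    # A made-up set of 9 measurements to test the model with.
    example_measures = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1]], dtype=float)
    print('Prediction:', clf.predict(example_measures))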

You may have noticed the line df.replace('?', -999999, inplace=True). What this piece of code does is take every missing attribute value (recorded as '?') and fill it with a value that will be treated as an outlier. You may also have noticed df.drop(['ID'], axis=1, inplace=True): this line removes a column in the dataset called ID, which holds the patients' IDs. We have to remove it because it is useless. Think about it for a moment: does a patient's ID have any connection with that person having a specific type of breast cancer? Obviously not! So we remove it; keeping it can reduce the accuracy of our SVM by as much as 40 percent.

This piece of code is very simple to understand and shows a working model of an SVM. We have also specified an array containing a few measurements that we will use to test the model. I would also recommend that you look at this page, which describes all the features used in the model and can help you understand which columns are treated as features and which as the output. Now let us look at the prediction that our model made for the test measurements we supplied.

Predictions made by the SVM

This is the prediction that the model made for us based on the sample measurements we provided. Here the model predicts that a patient with these specific measurements has a roughly 98% chance of having a malignant form of breast cancer (bear in mind, as noted in the cons below, that an SVM only produces such probability estimates when explicitly asked to).

However, we must note that the model is trained only on patients who already have a breast tumour. It is therefore useful for classifying a breast tumour as benign or malignant, not for deciding whether a patient actually has cancer in the first place. This limitation is imposed by our dataset, but of course you can get creative and make something awesome out of this!

Conclusion and more information

To sum it all up, a Support Vector Machine is a machine learning algorithm that can separate highly complex data relatively easily! An SVM takes very little time to build and gives good accuracy when used correctly. Now the question arises: when should we use an SVM over a simpler classification algorithm? The answer is that when our data is very high-dimensional and very complex, we are better off using an SVM, as it is much better suited to these kinds of tasks and handles them very efficiently. Here are a few pros and cons of a Support Vector Machine:

  • Pros:
    • It works really well when there is a clear margin of separation between classes.
    • It is effective in high dimensional spaces.
    • It is effective in cases where number of dimensions is greater than the number of samples.
    • It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
  • Cons:
    • It doesn’t perform well on large datasets, because the required training time is higher.
    • It also doesn’t perform very well when the dataset is noisy, i.e. when the target classes overlap.
    • An SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation, as done by the related SVC method of the Python scikit-learn library (see the sketch just after this list).
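As the last point mentions, scikit-learn's SVC can still produce probability estimates if you explicitly ask for them; here is a rough sketch on invented toy data, where probability=True switches on the internal cross-validated calibration.

    # Sketch: probability estimates from SVC (enables internal cross-validated calibration).
    import numpy as np
    from sklearn import svm

    X = np.array([[1, 2], [1.5, 1.8], [2, 3], [3, 3], [3, 1],    # invented points
                  [6, 5], [7, 8], [8, 6], [8, 8], [9, 7]])
    y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

    clf = svm.SVC(probability=True)     # slower to train: runs cross-validation internally
    clf.fit(X, y)
    print(clf.predict_proba([[4, 4]]))  # estimated probability of each class for a new point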

There are a few parameters that can also be passed to the svm.SVC() call in Python. First, there is a parameter called C, the penalty parameter of the SVM: it controls the trade-off between a smooth decision boundary and classifying every training point correctly. Another parameter we can specify is gamma, also known as the kernel coefficient, which controls how tightly the model fits the training data. A high gamma can cause a problem known as overfitting, where the model fits the training data so closely that it generalizes poorly to new points.
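As a rough illustration of those two knobs (the specific values below are arbitrary and chosen only to show the syntax):

    from sklearn import svm

    # C: small -> a smoother, more tolerant decision boundary;
    #    large -> tries harder to classify every training point correctly.
    # gamma: small -> each training point influences a wide region;
    #        large -> very local influence, which can lead to overfitting.
    smooth_clf = svm.SVC(kernel='rbf', C=0.1, gamma=0.01)   # arbitrary illustrative values
    strict_clf = svm.SVC(kernel='rbf', C=100, gamma=10)     # prone to overfitting on small data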

So that was a small brief on SVMs. I hope you guys liked this intuitive explanation of the fundamentals of a Support Vector Machine along with a practical tutorial. Thank you so much for spending your valuable time here. In the next blog post I will be discussing another commonly used algorithm called K-Nearest Neighbors.
Until then, have a nice day and enjoy Deep Learning. 🙂

-MANAS HEJMADI
