Introduction to Deep Learning

We have completed all the main tasks associated with Machine Learning. Now we can thoroughly focus on a popular and widely used subset of ML: Deep Learning. Understanding Deep Learning thoroughly is very important for getting a job in this field, and Deep Learning models have racked up far more successes than any other kind of model. In short, Deep Learning is the most powerful tool accessible in the ML space. First we need a rough introduction and the right mindset to advance into this field.

Basic Intuition

Most DL models are inspired by the human brain. They have the concept of neurons which can transmit information, learn, make new connections and so on. DL is based on the theory that after repeated training the neurons strengthen the connections between them that fit the data the best; this is where weights and biases come in. We will discuss this in the future. A single neuron is called a Perceptron. Deep Learning models take more time to train but often have lower error rates, are flexible and frequently outperform other models. There are many kinds of DL models, such as:

  • Artificial Neural Network [ANN]
  • Convolutional Neural Network [CNN]
  • Recurrent Neural Network [RNN] using LSTM Cells
  • Self Organizing Maps [SOM]
  • Deep Restricted Boltzmann Machines [RBM]
  • AutoEncoders

There are many more models, but these are the most popular ones. We will be learning about them in the future. I would like to say that if you want to enter the ML field, Deep Learning is a must. At the end of the day, DL is just another type of model. Once you learn Deep Learning, I assure you that you will feel confident in your skill level. Learning it is very accessible, as there are a lot of videos online (on YouTube, for example) and courses on Udemy and elsewhere. Deep Learning is a bliss. I hope you will enjoy it. But there are certain things that you need to know to develop a proper mindset for learning the concepts of DL.

Deep Learning mindset

Deep Learning is not as intuitive and easy as the core concepts of ML. I will be explaining them, but if you want a full in-depth explanation along with the mathematics and other details you will have to refer to resources online. Another thing is that the code used to write a DL model is not so straightforward or easy; it requires some intermediate skill in Python. You have to plan out each decision that you take. At the end of the day, Deep Learning will help you a lot in the future. You should also be ready to learn new and difficult concepts. At first it seems difficult, but after seeing the Python code and understanding the theory a couple of times it becomes easy!

So, all the best on your new journey! We will be learning the concepts at a very intuitive and down-to-earth level. In the next blog we will discuss the first form of supervised DL, the Artificial Neural Network [ANN]. So until then have a nice day and enjoy Deep Learning. 🙂

-MANAS HEJMADI


Other forms of Classic ML

We have discussed the main types of ML algorithms until now, but there is a lot more to learn in this vast field. We have covered only the most commonly used and most popular techniques. Now we will shed some light on the other forms of ML. I do not have as much knowledge regarding these types of algorithms, but I will try to make them as intuitive as possible, and I will not be walking through full practical demos in Python for them. I feel that it is good to have some extra knowledge on these topics so that you can grasp them more easily in the future. We are mainly going to shed some light on:

  • Ensemble learners
  • Naive Bayes
  • Random Forests

These explanations will not go into great detail; they just serve as a building block for the future.

Ensemble Learners

Ensemble Learners have a very simple concept: we have a certain problem and a dataset, and we split the dataset into training and testing samples. Once that is done, we proceed to fit our data to several different types of algorithms, such as a kNN (KNearestNeighbors), Linear Regression, a Decision Tree and so on. The outputs of these individual models are then combined, for example by averaging them (for regression) or by taking a majority vote (for classification), to form one final output.
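
To make this a little more concrete, here is a rough sketch (not a full demo; the toy dataset and the choice of base models are just assumptions for illustration) of what a simple voting ensemble could look like in scikit-learn:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for a real problem (made up for this sketch)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Three different base models whose votes are combined into one final answer
ensemble = VotingClassifier(estimators=[
    ("knn", KNeighborsClassifier(5)),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier()),
], voting="hard")

ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))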

The advantages of Ensemble learners are:

  • Lower error rates
  • Higher accuracy
  • Less Overfitting
  • More robust results than any single model on its own

So this was basically a short description on Ensemble learners.

Naive Bayes

The Naive Bayes algorithm is an algorithm for classification problems. It is primarily used for text classification, which involves high-dimensional training data. A few examples of the applications of this algorithm are spam filtering, sentiment analysis and classifying news articles. The algorithm is called ‘Naive’ because it makes the assumption that the occurrence of a certain feature is independent of the other features. This model is built on mathematics and statistics, mainly probability (Bayes’ theorem).
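
As a rough sketch of the idea (the tiny made-up “spam” texts are purely for illustration), a Naive Bayes text classifier in scikit-learn could look something like this:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A tiny made-up text-classification example: 1 = spam, 0 = not spam
texts = ["win a free prize now", "claim your free reward",
         "meeting at noon tomorrow", "please review the attached report"]
labels = [1, 1, 0, 0]

# Turn the texts into word-count features (high-dimensional, which suits Naive Bayes)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

new_mail = vectorizer.transform(["free prize waiting for you"])
print(clf.predict(new_mail))   # expected: [1] -> spam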

This was another short description on the Naive Bayes algorithm.

Random Forests

In its simplest form, a Random Forest is just what it sounds like: a group of trees, here Decision Trees. Each tree is trained on a random subset of the data and features, and their votes are combined into the final prediction; as the number of trees increases, the accuracy generally improves up to a point. This was the simplest explanation of a Random Forest that I could present to you.
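
As a rough sketch (reusing the Iris dataset from the Classification post as a stand-in), using a Random Forest in scikit-learn could look like this:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# The Iris dataset we used earlier in the classification post
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A "forest" of 100 Decision Trees, each trained on a random subset of the data,
# whose votes are combined into the final prediction
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X_train, y_train)
print("Accuracy:", forest.score(X_test, y_test))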

So now we have put in a little bit of effort to get the absolute basics of some other forms of ML. Now we can be proud to say that we have completed most of the classical Machine Learning tasks. We can advance into Deep Learning relatively soon! Until then, have a nice day and enjoy Deep Learning! 🙂

-MANAS HEJMADI

Unsupervised Machine Learning – Clustering using K-Means

Familiarization with Unsupervised Learning

Until now we have dealt with supervised learning only! This is the first time we will be trying out a form of Unsupervised Learning. Just as a refresher, Unsupervised Learning is a type of learning where we do not help the system during training by supplying it with labels; instead, the machine itself figures out the structure of the data. Unsupervised Learning may seem complex at first, but I assure you that under the hood it is easy. You just have to prepare your mindset while dealing with Unsupervised Learning. It is similar to reading a textbook without your teacher’s help: you can still kind of understand what is written, but if a teacher were present it would make a big difference. Whenever you are learning complex topics, just close your eyes and create a simple analogy. This will help you a lot when we come to difficult topics such as Restricted Boltzmann Machines & AutoEncoders.

Introduction to Clustering

Clustering is a form of Unsupervised Machine Learning where, after repeated training, the model figures out how to group the data into individual clusters. Just as a kNN has a K parameter, the K in K-Means represents the number of clusters we want: if we specify the value of K as 2, then the model will split the data into 2 individual clusters. To simplify again, the goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. K-Means Clustering produces 2 outputs: the Centroids, which are the centers of the clusters, and the Labels; once the K clusters are formed, each data point is assigned the label of the cluster it falls into. This type of model saves us time because we do not need to sort, arrange and label the data before feeding it into the model; instead we simply input the data and get back useful groups decided by the value of K. This is a relatively simple algorithm, and we will discuss its learning strategy below; in fact, the topics of classic Machine Learning are relatively easier than those of Deep Learning. Over many iterations the machine understands the groups in the data very well, and there comes a point where the algorithm ends, a.k.a. convergence of the algorithm. That is when we have the output of our clustering, i.e. the Centroids and the Labels, along with the actual clusters.

Learning Strategy of K-Means Clustering

Every ML algorithm needs some sort of training to work properly, to understand everything that has been fed to it, to create relationships within the data and so on. Let us see how the K-Means algorithm learns. There are 4 main steps to train a K-Means algorithm (a tiny numerical sketch in code follows the list):

  • Initialization: K centroids are generated at random locations, where K is the value we specify.
  • Assignment: clusters are created by associating each data observation with its closest centroid. This creates K clusters.
  • Update: the center of mass of each cluster (i.e. the center of the entire cluster) becomes the new centroid location.
  • Iterate: this process continues until the model reaches convergence, i.e. when the clusters stop changing.
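
Before the detailed walk-through, here is a tiny numerical sketch of these four steps in plain NumPy; the sample points and the choice of K = 2 are made up purely for illustration:

import numpy as np

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
K = 2
rng = np.random.default_rng(0)
centroids = X[rng.choice(len(X), size=K, replace=False)]   # Initialization

for _ in range(10):                                        # Iterate
    # Assignment: give each point the label of its closest centroid
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update: the center of mass of each cluster becomes the new centroid
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):              # Convergence check
        break
    centroids = new_centroids

print(labels, centroids)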

So let us briefly walk through its learning strategy. First, the centroids are created at random locations. Then the distance (typically the Euclidean distance) is measured from each centroid to the data points. Once that step is done, the model creates a cluster consisting of the data points that are closest to each centroid. Then the Update phase starts: the new center of mass is calculated for each cluster and that center of mass becomes the new centroid. Now we reach the Iterate phase, where the model checks if it has reached convergence; if it has not, the model starts again with the Assignment stage and continues to move each centroid from its previous location to a more stable one. This happens to all K clusters, where K is the value decided by the developer. Here is the learning process visualized:

  • RAW data: this is the raw data that is fed into the model.
  • Initialization: the centroids are first generated at random locations.
  • The first clusters: each centroid measures the distance to the data points and groups the closest ones into a cluster.
  • Updating the centroids: once a cluster is formed, its center of mass is calculated, and the center of the cluster becomes the new position of the centroid.
  • Recreating the clusters: the previous clusters are discarded and the distances to the points are measured once again.
  • Convergence: new clusters are formed from the new centroids, and this entire process continues until convergence; here the model has reached convergence and gives us the final clusters A, B & C along with their respective centroids.

Now that you have visually understood how K-Means Clustering works, we can perform a simple demo in Python as a proof of concept. We will not be using an actual dataset; instead we will make our own simple data that just shows how the K-Means Clustering algorithm performs.

Code

The Python code required to execute a simple demo using K-Means
This is the code to execute a simple clustering demo using the K-Means Clustering Algorithm
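
The code itself is shown only as a screenshot, so here is a rough sketch of what such a demo could look like; the exact sample points in X are made up for illustration:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hand-made sample training data (made-up points, roughly forming 3 groups)
X = np.array([[1, 2], [1.5, 1.8], [1, 0.6],
              [8, 8], [9, 11], [8.5, 9],
              [4, 15], [5, 14], [4.5, 16]])

kmeans = KMeans(n_clusters=3)     # K value of 3, so we expect 3 clusters
kmeans.fit(X)

# Visualize the clusters and mark the centroids with an 'X'
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            marker="x", s=150, c="red")
plt.show()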

Here we are creating the sample training data in the variable X. Now let us see the output, with a visualization created using the Matplotlib library.

Output

This picture shows the visualization of the RAW Data
These are all the RAW Datapoints that are fed into the Algorithm.

We specified a K value of 3, so we are expecting 3 clusters. Let us see the final results.

Shows the output with 3 clusters
We can clearly see that 3 clusters have been formed; the ‘X’ marks are the Centroids, sitting right at the centers of their clusters.

Whoa! That is really great. We have 3 correct clusters, just as we expected!

So here we can see that our model has made 3 clusters after just a few seconds of training. Note that this example does not use the same data as the diagrams from the learning-strategy section; it is a completely separate example. This was the demo that I wanted to show you.

Conclusion and more information

Clustering is a truly unique form of Machine Learning, and we have now learnt everything required to understand the basics of Clustering using the K-Means algorithm. But how do we find the correct value of K? We could use trial and error, but that is not worth your time. Instead there is a rule of thumb called the Elbow method. Let me show it.

picture of the elbow method
This is a picture representing the elbow method

In the elbow method we try different values of K and observe the degree of variance explained. We can see that for the first few K values there are big jumps in variance, while after a K value of 4 the increase is limited; thus 4 is the best K value to use in order to get good results while minimizing the load on the computer.
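
As a rough sketch of how such an elbow curve can be produced (using scikit-learn’s inertia, the within-cluster sum of squared distances, and reusing a small made-up dataset):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# A small made-up dataset, just to produce a curve
X = np.array([[1, 2], [1.5, 1.8], [1, 0.6],
              [8, 8], [9, 11], [8.5, 9],
              [4, 15], [5, 14], [4.5, 16]])

inertias = []
k_values = range(1, 8)
for k in k_values:
    model = KMeans(n_clusters=k)
    model.fit(X)
    inertias.append(model.inertia_)   # within-cluster sum of squared distances

plt.plot(k_values, inertias, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("Inertia")
plt.show()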

So guys, in my opinion this is all that you need to know in order to understand the concept of Clustering using the K-Means algorithm. There is good news for all of you following this blog series: we have officially ended the classical Machine Learning series. In the next blog we will have a short discussion on other forms of classic ML such as Naive Bayes and Ensemble learning. After that we will dive into the beautiful bliss that is Deep Learning. Once you learn the basics of Deep Learning you will be confident in doing such tasks.

Thank you for following me and reading this blog, I am pleased to share my knowledge. Have a nice day and enjoy Deep Learning. 🙂

-MANAS HEJMADI

 

Decision Tree Classifier

Introduction

This type of learning typically involves the use of Decision Trees, which work much like a flowchart. Models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Certain conditions need to be satisfied in order to advance towards the leaf nodes. The first node of the tree is known as its root node (or parent node), and all the other nodes branch out from it. In short, Decision Trees are a type of supervised Machine Learning algorithm where the data is continuously split according to certain parameters. The tree can be explained by two entities, namely decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split. As mentioned before, this model works rather like a flowchart. From now on I will refer to a Decision Tree Classifier as a DTC. According to the input a decision is made (this corresponds to a decision node of the DTC), and based on the outcome of that decision the information travels to the next decision node, where another condition is evaluated.


This process continues until we reach the last node of the DTC, called the leaf node, which we will refer to as the output node. Based on the features of the input observation, the model has now successfully classified it into a category. A DTC is a relatively simple algorithm in the ML pipeline.

To understand this better, we can look at a picture of a DTC trained on the Iris dataset, which we discussed in the Classification part of this blog series.

A visualization of a Decision Tree Classifier trained on the Iris dataset

Over a period of time with repeated training the DTC now knows all the features that it has to use along with the decision boundaries that it needs. Now any new observation goes through the entire tree answering the questions and conditions imposed by the decision nodes. Once all the questions and conditions are satisfied the output lands on a certain output node. At this stage, the new observation has been classified into its respective category. This model is relatively easy to learn and is considered one of the easiest in the entire ML Pipeline.

Further information on Decision Trees

Advantages:

  1. Easy to understand: Decision tree output is very easy to understand, even for people from a non-analytical or non-ML background. It does not require any statistical knowledge to read and interpret, and its graphical representation is very intuitive, so users can easily relate it to their hypothesis.
  2. Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relations between two or more variables. With the help of decision trees, we can create new variables / features that have better power to predict the target variable. It can also be used in the data exploration stage: for example, if we are working on a problem where we have information available in hundreds of variables, a decision tree will help to identify the most significant ones.
  3. Less data cleaning required: It requires less data cleaning compared to some other modeling techniques. To a fair degree, it is not influenced by outliers and missing values.
  4. Data type is not a constraint: It can handle both numerical and categorical variables.
  5. Non-parametric method: A decision tree is considered to be a non-parametric method. This means that decision trees make no assumptions about the space distribution or the classifier structure.

Disadvantages:

  • Overfitting: Overfitting is one of the most practical difficulties for decision tree models. This problem can be addressed by setting constraints on the model parameters and by pruning.
  • Not fit for continuous variables: While working with continuous numerical variables, a decision tree loses information when it categorizes the variables into different bins.

 

Terminology:

  • Root Node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
  • Splitting: The process of dividing a node into two or more sub-nodes.
  • Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
  • Leaf / Terminal / Output Node: Nodes that do not split any further are called leaf or terminal nodes.
  • Pruning: When we remove sub-nodes of a decision node, the process is called pruning. You can think of it as the opposite of splitting.
  • Branch / Sub-Tree: A sub-section of the entire tree is called a branch or sub-tree.
  • Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.

Are tree-based models better than linear models?

Well, the answer to that question is that it depends on the type of problem we want to solve. Here are a few key factors that will help you choose the right model:

  1. If the relationship between the dependent & independent variables is well approximated by a linear model, linear regression will outperform a tree-based model.
  2. If there is high non-linearity & a complex relationship between the dependent & independent variables, a tree model will outperform a classical regression method.
  3. If you need to build a model which is easy to explain to people, a decision tree model will often do better than a linear model, as decision trees are even simpler to interpret than linear regression!

Now that we have understood the rough intuition of a Decision Tree, we will try to solve a problem in Python. For this we will make up our very own problem. Imagine that we have an unknown fruit X in a container and we are told the weight of the fruit as well as its texture (smooth or bumpy). We are also given one more clue: the fruit in the container can only be either an apple or an orange. How would you model this? Let’s find out. We will label our outputs in the form Fruit{ 0: ”Apple”, 1: ”Orange” } and the inputs in the form Weight, Texture{ 0: Bumpy, 1: Smooth }. We will use a DTC and get our results. Here we are simulating what an ML researcher does: creating our own data through observations. We have noted the weights & textures of 6 fruits (3 apples & 3 oranges) in the variable X and labelled the fruits accordingly { 0: ”Apple”, 1: ”Orange” } in the variable y.

Code

Code for our DTC
This is the Python code that we need to write in order to create a DTC and train it on our custom dataset
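
The code is shown only as a screenshot, so here is a rough sketch of what it could look like; the exact weights (and which texture goes with which fruit) are made up for illustration:

from sklearn.tree import DecisionTreeClassifier

# Features: [weight in grams, texture] where texture is 0 = bumpy, 1 = smooth
# The actual values from the original screenshot are not known; these are made up
X = [[140, 1], [130, 1], [150, 1],   # apples (smooth)
     [170, 0], [160, 0], [180, 0]]   # oranges (bumpy)
y = [0, 0, 0, 1, 1, 1]               # 0 = Apple, 1 = Orange

clf = DecisionTreeClassifier()
clf.fit(X, y)

# Classify the unknown fruit X: say it weighs 165 g and is bumpy
print(clf.predict([[165, 0]]))       # expected output: [1] -> Orange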

Output

The output given by our DTC that we previously coded

I hope you enjoyed this simple intuition, explanation and tutorial on the DecisionTreeClassifier. In the next blog we will discuss an unsupervised form of Machine Learning: Clustering using the K-Means algorithm. Until then have a nice day and enjoy Machine Learning! 🙂

-MANAS HEJMADI

 

 

The KNearestNeighbors Algorithm

Introduction

After learning Support Vector Machines we are going to advance into the territory of kNN a.k.a KNearestNeighbors. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). kNN has been used in statistical estimation and pattern recognition. kNN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The kNN algorithm is among the simplest of all machine learning algorithms.

Classification & Learning process

kNN is a classification algorithm. It works by finding the Euclidean distance between the new point and the existing points, and then looking at the closest ones. The number of closest points it considers is denoted by K: if K = 3, the model finds the 3 closest points and the class that holds the majority among them is chosen. For example, say we have a K value of 3, two classes ‘A’ & ‘B’, and a new point named ‘C’. How do we predict which class ‘C’ belongs to? The answer is: using a kNN Classifier. Let us see a diagrammatic representation.

Diagram of a kNN classifying a new point on the basis of Euclidean distance.
This diagram represents the way a kNN classifies a new point on the basis of the Euclidean distance to other points.

Here, because our K value is 3, we only look at the 3 nearest points. We can clearly make out that the new yellow point ‘C’ is closer to class ‘A’ than to class ‘B’, simply because the Euclidean distances to the points in class ‘A’ are smaller than those to the points in class ‘B’. Thus the kNN classifies this new point as type ‘A’. Up to a point, a larger value of K improves the quality of the classification; however, there comes a point of diminishing returns where a high value of K no longer gives better results and instead just taxes the computer, so K should be chosen sensibly rather than simply made as large as possible. The confidence of this prediction can be counted by dividing the number of neighbours that voted for the winning class by the total K value. In this case that is 2 (closest points from class ‘A’) / 3 (value of K), so 2/3 ≈ 0.66, i.e. about 66%. Generally we would not want an even K value, to avoid ties. Although a confidence of 66% is not that great, we have a good enough classification here, so for the purpose of this example it is fine; to increase it, we could increase the value of K. This is a simple and intuitive understanding of how a kNN classifies a new point.

Things to keep in mind

  • The dataset that we are going to use needs to have as little noise as possible, here ‘noise’ refers to the overlapping of the target classes.
  • We need to have labelled features in our dataset as a kNN is a supervised type of ML.
  • Our dataset needs to contain only relevant features. If there are features that do not contribute anything to the final answer, like “patient name”, we have to drop them, as they increase the complexity and decrease the overall accuracy.
  • Avoid using very large K values on a large dataset, as predictions take a long time: for every new point the algorithm has to compute the distance to the other points and then pick out the K closest ones.
  • Some popular use cases of a kNN include:
    • Recommendation Systems
    • Credit Risk Analysis
    • Predictive Trip Planning
    • Stock price recommendation
  • a kNN is a supervised Classifier

Now that you have a rough idea of what the kNN algorithm is, let us solve a simple problem in Python. We are going to work on a dataset which contains records of Heart Disease patients. Our task is to classify which category a new patient belongs to! First we need to download our dataset; to do that go to this link. It is a link to my Dropbox where I have added a file named ‘Heart.data’; click on download and save it to your working directory.

Feature Description

This picture consists of definitions of all the attributes that we specify to the kNN
This is the meaning of all the features that we are going to present to the kNN.

Code

Picture used to represent the code used to create a kNN which classifies Heart Disease Patients
This piece of Python code creates a kNN model to classify heart disease patients based on the specific features that have been inputted in the test array

You may ask: where have you specified the value of K? The answer is on line 17 of the program, clf = neighbors.KNeighborsClassifier(7). This line creates a kNN classifier with a K value of 7 and stores it in a reference variable called ‘clf’. The KNeighborsClassifier is part of the neighbors module of scikit-learn in Python 3. You may also have noticed that in lines 9, 10 and 11 we are removing certain columns from our dataset. This is because they contain data that is useless for our predictions and only increases the complexity of the dataset. We then predict on a sample array of features that we have created. Let us see the output of the model based on the sample features that we provided. For further information on the dataset you can visit here.
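
Since the code is shown only as a screenshot, here is a rough sketch of how such a script could be structured; the column names and the sample test values are assumptions based on the standard heart-disease attributes, and the exact columns dropped in the original are not reproduced here:

import numpy as np
import pandas as pd
from sklearn import neighbors
from sklearn.model_selection import train_test_split

# The file probably has no header row; these column names follow the standard
# heart-disease attributes and are an assumption for this sketch
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach",
           "exang", "oldpeak", "slope", "ca", "thal", "target"]
df = pd.read_csv("Heart.data", names=columns)

# Fill missing attributes with an outlier value so they don't break the model
df.replace("?", -99999, inplace=True)

# (The original code also drops a few unhelpful columns here; which ones depends
# on the exact file, so that step is left out of this sketch.)

X = np.array(df.drop(["target"], axis=1), dtype=float)
y = np.array(df["target"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = neighbors.KNeighborsClassifier(7)   # K value of 7
clf.fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))

# One made-up sample with the same 13 features as the training data
sample = np.array([[63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]])
print("Prediction:", clf.predict(sample))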

Predictions made by the kNN

Predictions made by our kNN
Our model predicts that a person with the given features has a less than 50% chance of having Heart Disease. It has an accuracy of almost 63%.

It is sad to see that we have an accuracy of only 63 percent! However, this is not completely our fault. The model only predicts whether the patient has more than 50% artery diameter narrowing, and the dataset contains a lot of missing attributes and bad data. We have replaced all the missing attributes with an outlier value (-99999), so many of our data points are effectively treated as outliers, which affects our overall accuracy. But nonetheless, we have successfully created a kNN classifier that can classify Heart Disease patients!

Conclusion

To sum it all up, the KNearestNeighbors algorithm is useful for classification on smaller chunks of data. It classifies a new point by measuring the Euclidean distance to the existing points and letting the K nearest ones vote, where K is the value that the user specifies. This approach is useful in many cases, and a kNN is a relatively easy type of algorithm in the ML pipeline. We also created and used a kNN to classify Heart Disease patients.

Hope you learnt something new today and enjoyed today’s session. In the next blog post we will discuss a very simple topic in the ML pipeline, Decision Trees. Until then have a nice day and enjoy Machine Learning! 🙂

-MANAS HEJMADI

 

 

 

Support Vector Machines

Introduction

Mastering Machine Learning is not that difficult; you just need a lot of time and effort. Many young learners start with classification or regression as these are the easiest, and then hesitate to move forward into the more complex forms of ML. Let us take a simple analogy: assume that we have an arsenal filled with weapons like swords, daggers, axes and knives. Here, regression acts as a sword which can efficiently slice and dice data, but it cannot deal with data that is complex and small in size. Classification acts as an axe on wood: it chops the data up into categories but can’t do much more than that. Support Vector Machines are like the dagger: they allow the user to perform quick, precise moves on targets at close range, in this case complex and smaller-sized datasets, and they can do so with much more efficiency than regression would.

What is a Support Vector Machine

A “Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used for classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes well. It acts much like a classifier but can work with more complex forms of data too. As the name suggests, an SVM works by manipulating and using values from vectors; we have to remember that everything that happens inside an SVM happens in vector space. We must also know that an SVM is what is called a binary classifier, i.e. it separates the data into only two groups.

Decision Making and Learning process of a SVM

Decision making is fairly simple in an SVM. Once the best separating hyper-plane (a line, in two dimensions) is found, any unknown data point is classified on the basis of which side of the hyper-plane it falls on. The training process consists of different methods of finding the best separating hyper-plane for your data. In order not to complicate the simple intuition, I will not discuss here how the SVM finds the best separating hyper-plane; there are a lot of sources online covering how the SVM algorithm learns, and you can find a more detailed description there.

We have learnt the basic intuition of an SVM; now we will try to implement it in Python 3. We will be working on the Wisconsin Breast Cancer prediction problem. We need a few things before we start: firstly the dataset, which you can download from here. After visiting the link, right-click on the file named ‘breast-cancer-Wisconsin.data’ and click “save link as”. Then save it to your working directory and open up a text editor of your choice. Now we are ready to roll!

Code

Python Code to Implement an SVM with the Breast Cancer Classification Problem
This Python 3 Code creates an SVM and tries to decide if the recorded features corresponds to a Benign or Malignant form of Breast Cancer

You may have noticed this line: df.replace(‘?’, -999999, inplace=True). What this piece of code does is take all the missing attributes and fill them with a value that will be treated as an outlier. You may have also noticed df.drop([“ID”], 1, inplace=True); this line removes a column in the dataset known as ID, which corresponds to the ID of the patients, and we have to remove it because it is useless. Let us think for a moment: does the patient ID have any connection with the person having a specific type of Breast Cancer? Well, obviously not! So we have removed it; keeping it can reduce the efficiency of our SVM by as much as 40 percent! This piece of code is very simple to understand and shows a working model of an SVM. We have also specified an array containing a few measurements that we will use to test the model. I would also recommend you to see this page, which describes all the features used in the model and can help you understand which columns are considered features and which is the output. Now let us see the prediction that our model made for the test measurements that we supplied to it.
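
Since the code is shown only as a screenshot, here is a rough sketch of how such a script could be put together; the column names and the sample measurements are assumptions based on the standard Wisconsin dataset description:

import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split

# The .data file has no header row; these column names follow the standard
# Wisconsin Breast Cancer dataset description (an assumption for this sketch)
columns = ["ID", "clump_thickness", "cell_size", "cell_shape", "adhesion",
           "epithelial_size", "bare_nuclei", "bland_chromatin",
           "normal_nucleoli", "mitoses", "class"]
df = pd.read_csv("breast-cancer-Wisconsin.data", names=columns)

df.replace("?", -99999, inplace=True)   # fill missing attributes with an outlier value
df.drop(["ID"], axis=1, inplace=True)   # the patient ID tells us nothing about the cancer

X = np.array(df.drop(["class"], axis=1), dtype=float)
y = np.array(df["class"])               # in this dataset: 2 = benign, 4 = malignant

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = svm.SVC()
clf.fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))

# One made-up set of measurements to classify
sample = np.array([[8, 10, 10, 8, 7, 10, 9, 7, 1]], dtype=float)
print("Prediction:", clf.predict(sample))   # 2 = benign, 4 = malignant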

Predictions made by the SVM

The predictions made by the model
This is the prediction that the model has made for us based on the sample measurements we provided. Here the model predicts that the patient with these specific measurements has a 98% chance of having a Malignant form of Breast Cancer

However, we must note that the dataset we used contains only records of breast tumour examinations. The model is therefore useful for determining the type of Breast Cancer, not whether the patient actually has cancer. This limitation is imposed by our dataset, but of course, you can get creative and make something awesome out of this!

Conclusion and more information

To sum it all up, a Support Vector Machine is a machine learning algorithm that can separate data of high complexity relatively easily! The time required to build an SVM is quite short, and it also works with good accuracy if used correctly! Now the question arises: when should we use an SVM over the classification algorithms we saw earlier? The answer is that when our data has very high dimensionality and is very complex, we are better off using an SVM, as it is much more suited to these kinds of tasks and performs them very efficiently. Here are a few pros and cons of a Support Vector Machine:

  • Pros:
    • It works really well when there is a clear margin of separation
    • It is effective in high dimensional spaces.
    • It is effective in cases where number of dimensions is greater than the number of samples.
    • It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
  • Cons:
    • It doesn’t perform well when we have a large dataset, because the required training time is higher
    • It also doesn’t perform very well when the dataset has more noise, i.e. the target classes are overlapping
    • An SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation in the related SVC method of the Python scikit-learn library

There are a few parameters that can also be provided to the svm.SVC() command in Python. Firstly, we have a parameter called C, the penalty parameter of the SVM; it helps decide the trade-off between a smooth decision boundary and classifying the training points correctly. Another parameter that we can specify is called gamma, also known as the kernel coefficient; gamma controls how tightly the model fits the data. A high gamma can cause a problem known as Overfitting, where a model fits the training data so closely that new points are often wrongly classified.
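
As a tiny sketch, these parameters are simply passed to the constructor; the values below are arbitrary examples, not recommendations:

from sklearn import svm

# C: penalty parameter -- a small C favours a smoother boundary,
#    a large C favours classifying every training point correctly
# gamma: kernel coefficient -- a large gamma fits the training data very tightly
#    and can lead to overfitting
clf = svm.SVC(C=1.0, kernel="rbf", gamma=0.1)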

So, in my opinion, this was a small brief about the SVM. Hope you guys liked this intuitive explanation of the fundamentals of a Support Vector Machine along with a practical tutorial. Thank you so much for spending your valuable time here. In the next blog post I will be discussing another commonly used algorithm called KNearestNeighbors.
Until then, have a nice day and enjoy Deep Learning. 🙂

-MANAS HEJMADI

Linear Regression – The next step

In the ML pipeline, after Classification the next step is Regression. We are going to specifically discuss a subset of Regression called Linear Regression. Linear regression is an approach to creating a model that captures the relationship between input and output numerical variables. It helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied. It is used when we have independent variables (features that are inputted to the model, for example: number of rooms, location of the house, year the house was built, etc.) and another variable that depends on them (for example: the price of the house). As the values of the independent variables (features) change, the value of the dependent variable changes too. Using regression we can find the value of the dependent variable according to the change in the independent variables. The way regression works and how it measures its error will be discussed in the conclusion.

Let us take the example of predicting the cost of a new house. We cannot use classification here, as it simply does not make sense: we are predicting the cost, not the category the house belongs to. Firstly we need to input the features, i.e. all the factors which affect the final price of the house.

First, we need a dataset of house prices; an easily available one can be loaded from the Python module sklearn. We use this Python code to load the dataset:

from sklearn.datasets import load_boston

After this step we can proceed! But first we must know all the features that we are inputting into the regression algorithm.

Feature Description

Feature Description
All the features that will be used along with the description of each

These are all the features that we will be inputting into the model. The code for this is very simple:

Python Code

The Python implementation code for the House Pricing problem
The Python Implementation Code for the House Pricing problem

Notice that we have a variable called ‘test_array’. This array contains the sample values that I have provided to serve as a testing medium. This piece of code is relatively small and easy to understand. You may also have noticed that in the last line I am multiplying the prediction by 1000; this is because the dataset’s prices are given in units of 1000 dollars and I want the output in proper cash denominations. I convert the prediction to a float and apply a ‘math.ceil’ operation to round it up to a whole number and make it more readable. Now let us see the output, or prediction, that the model makes based on the features that we give it.
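
Since the code is shown only as an image, here is a rough sketch of what it could look like; the values in test_array are made up, and note that load_boston has been removed from recent versions of scikit-learn, so an older version (or a different dataset) is needed to actually run it:

import math
from sklearn.datasets import load_boston   # removed in newer scikit-learn versions
from sklearn.linear_model import LinearRegression

boston = load_boston()
X, y = boston.data, boston.target           # target prices are in units of $1000

model = LinearRegression()
model.fit(X, y)

# One made-up row with the same 13 features as the dataset
test_array = [[0.1, 12.5, 7.0, 0.0, 0.5, 6.2, 65.0, 4.1, 4.0, 300.0, 16.0, 390.0, 10.0]]
prediction = model.predict(test_array)[0]

# Convert to dollars and round up so the output is easier to read
print("Predicted price: about", math.ceil(float(prediction) * 1000), "dollars")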

Prediction or Output

Prediction or output of the model
This is the prediction that the model makes. The model predicts that a house in a city with the listed features has an average price of about 21,000 dollars.

Conclusion & More information

So, to sum it all up, Regression is a technique borrowed from Statistics. It involves finding the relationship between the independent variables and the dependent variable, and it does this using some simple high-school geometry. It creates a line using the formula y = mx + b, where y is the prediction, m is the slope or gradient of the line, x is the input (the independent variable) and b is the y-intercept of the line. This line is known as the line of best fit. The distance between the points and the line is known as the error, and it is measured using the Mean Squared Error: for each point we take (predicted value – actual value)², i.e. the difference between the predicted value and the actual value, squared, and then average these squared differences over all the points. Regression has many types, such as polynomial regression, but we have gone ahead with Linear Regression as it is an easier and widely used concept. Using regression we can find the values of the dependent variable by manipulating the independent variables. Regression is also used in cases like weather prediction, stock price prediction, etc. This algorithm comes under Supervised Learning (it is regression, not classification) and is very easy to learn. The line tries to fit the data in such a way that any further predictions will be made according to the line. This algorithm is useful for data which has a linear relationship between its variables.
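
As a tiny sketch of that error calculation, with made-up numbers:

predicted = [21.0, 24.5, 18.0]
actual    = [20.0, 25.0, 19.5]

# Mean Squared Error: the average of the squared differences
mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
print(mse)   # (1.0 + 0.25 + 2.25) / 3 = 1.1666...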

Hope you guys enjoyed this session. In the next blog we will be talking about the Support Vector Machine. Until then have a nice day and enjoy Deep Learning! 🙂

-MANAS HEJMADI