Classification – The ‘Hello World’ of ML

This is the first topic we are going to cover in this practical machine learning series. Classification is one of the simpler kinds of machine learning problems. Intuitively, a classifier learns a line of separation (a decision boundary) between two or more distinct groups of data. Once the algorithm has found that line, any new data point is assigned to one of the classes (the groups of data) depending on which side of the line it falls. Classification works best in scenarios where you want to categorize data based on distinct features.
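To make the "line of separation" idea concrete, here is a tiny sketch in plain Python. The line x + y = 10, the weights, the bias, and the group names are all made up for illustration; a real classification algorithm would learn the line from training data rather than have it written in by hand.

```python
# Hypothetical 2-D example: the line x + y = 10 separates two groups.
# The weights, bias, and group names are illustrative assumptions.

def classify(point, weights=(1.0, 1.0), bias=-10.0):
    """Return the group of a 2-D point based on which side of the line it lies."""
    x, y = point
    # The sign of the score tells us which side of the line the point is on.
    score = weights[0] * x + weights[1] * y + bias
    return "group A" if score >= 0 else "group B"


# Two new points, one on each side of the line.
new_points = [(8.0, 7.0), (2.0, 3.0)]
labels = [classify(p) for p in new_points]
print(labels)
```

The first point lies above the line and the second below it, so they end up in different groups.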

Here is the definition of classification from Wikipedia:

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

In short, classification is the process of predicting which category a given piece of data belongs to. This is very useful in machine learning, statistics, and data science. A few common use cases of classification are:

  • Disease prediction, such as breast cancer or heart disease prediction.
  • Video grouping on YouTube: once a creator uploads a video, YouTube uses its tags and specific search engine optimization techniques to categorize the video into a genre.
  • Image Classification
  • Voice Classification
  • E-mail spam filtering

The most common programming introduction to classification is the Iris classification problem. It is like the ‘Hello World’ of classification. The problem consists of working out which Iris species a given flower belongs to, based on its measurements. I will provide a small snippet of code showing an implementation of the Iris classification problem in Python 3.

Download the dataset from here

Right-click the ‘iris.data’ file, choose ‘Save link as’, and save it to your working directory. Then make sure you have all the required modules installed in your Python environment. Now you are ready to go!

Code

Code for the Iris classification problem
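For readers who want something to copy and run, here is a minimal sketch of what such a program can look like. It is not necessarily identical to the listing in the picture. As an assumption, it loads the copy of the Iris data that ships with scikit-learn through load_iris() instead of reading the downloaded ‘iris.data’ file, so it runs without any download. The train/test split, n_neighbors=5, and the sample measurements are also illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: use scikit-learn's bundled Iris data rather than 'iris.data'.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the flowers to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# k-nearest neighbours: a new flower gets the majority species among
# its 5 closest training flowers.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Hypothetical new flower:
# sepal length, sepal width, petal length, petal width (all in cm).
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_species = iris.target_names[model.predict(sample)[0]]

print(f"Test accuracy: {accuracy:.2f}")
print(f"Predicted species: {predicted_species}")
```

If you prefer to work from the downloaded file, you could read it with pandas (for example with pd.read_csv) and pass the four measurement columns and the species column to fit() in the same way.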

Results

Output of the sample program
This picture shows the model's prediction: a flower with these measurements is most likely an Iris-setosa.

We are using the modules NumPy, pandas, and scikit-learn (sklearn). This is a very simple problem, which is why I referred to it as the ‘Hello World’ of classification. Can you believe it? We have solved our first classification problem in just 16 lines of Python. That is insane!

You can also check out popular YouTube channels such as Sentdex and Siraj Raval. They are really good and experienced trainers in this field, although they focus more on deep learning. You are going to study deep learning anyway, so watching their videos will give you a head start!

Conclusion

Classification is just the grouping of data according to the features they possess. We coded a simple classification problem in just 16 lines of Python! We used scikit-learn's neighbors module, which includes the classifier KNeighborsClassifier(). This is not the only classifier that exists. There are others such as svm.SVC() and DecisionTreeClassifier(), but for the sake of convenience we chose KNeighborsClassifier(). In short, classification is easy to get started with and is a stepping stone towards bigger and more complex models such as Restricted Boltzmann Machines and autoencoders.
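If you are curious how the other classifiers mentioned above compare, the sketch below trains all three on the same split and prints their accuracy. As before, loading the data with load_iris(), the 70/30 split, and the random_state values are my own illustrative assumptions, and every classifier is left at its default settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Iris data and a fixed 70/30 split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# All scikit-learn classifiers share the same fit/score interface,
# so swapping one for another is a one-line change.
classifiers = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "SVC": SVC(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```

On a dataset this small and clean, all three tend to do well, so the choice mostly comes down to convenience.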

Hope you enjoyed this small introduction and tutorial for Classification. Thank you so much for reading this and have a nice day 🙂

-MANAS HEJMADI
