In the ML Pipeline, the step after Classification is Regression. We are going to discuss a specific form of regression called linear regression. Linear regression is an approach for modelling the relationship between numerical input and output variables: it tells us how the typical value of the dependent variable changes when any one of the independent variables is varied. It is used when we have independent variables (the features that are input to the model, for example the number of rooms, the location of a house, or the year the house was built) and another variable that depends on them (for example, the price of the house). As the values of the independent variables (features) change, the value of the dependent variable changes with them, and regression lets us predict the dependent variable from the independent ones. How regression works and how its error is measured will be discussed in the conclusion.
Let us take the example of predicting the cost of a new house. We cannot use classification here, as it simply does not make sense: we are predicting a cost, not the category the house belongs to. First, we need to input the features, i.e. all the factors that affect the final price of the house.
First, we need a house-pricing dataset. An easily available one can be loaded from the Python library scikit-learn. We can use this line of code to load the dataset:

from sklearn.datasets import load_boston

(Note that load_boston was deprecated and later removed from scikit-learn, so this import only works on older versions of the library.)
After this step we can proceed, but first we must know all the features that we are inputting into the regression algorithm.
The Boston dataset has 13 input features: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B and LSTAT. These are all the features that we will be inputting into the model, and the code for this is very simple.
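Since load_boston is no longer available in recent scikit-learn releases, here is a minimal sketch of the same idea using synthetic housing-style data instead of the real Boston dataset (the feature values and true coefficients below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the housing data: 100 houses with
# 3 features (number of rooms, age in years, distance to city centre in km).
rng = np.random.default_rng(42)
X = rng.uniform(low=[3, 0, 1], high=[9, 100, 20], size=(100, 3))

# Assumed true relationship: more rooms raise the price (in $1000s),
# while age and distance lower it, plus a little noise.
y = 50 + 8 * X[:, 0] - 0.1 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 2, 100)

model = LinearRegression()
model.fit(X, y)  # learns one coefficient per feature plus an intercept

print(model.coef_)       # close to the true values [8, -0.1, -0.5]
print(model.intercept_)  # close to the true value 50
```

Because the data really is linear, the fitted coefficients land very close to the ones used to generate it, which is exactly what the line-fitting described in this post is doing.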
Notice that we have a variable called ‘test_array’. This array contains the sample values that I have provided to serve as test input. This piece of code is relatively small and easy to understand. You may also have noticed that in the last line I multiply the prediction by 1000; the dataset records prices in thousands of dollars, so this converts the output into proper cash denominations. I also convert the prediction into a float and apply ‘math.ceil’ to round it off and make it more readable. Now let us see the output, or prediction, that the model makes based on the features that we give it.
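The original snippet is not reproduced here, so the following is a sketch of what the text describes, with a model trained on the same synthetic stand-in data as before (the sample house in ‘test_array’ is hypothetical):

```python
import math
import numpy as np
from sklearn.linear_model import LinearRegression

# Quick model on synthetic housing-style data (stand-in for the
# real dataset): price in $1000s from rooms, age, and distance.
rng = np.random.default_rng(0)
X = rng.uniform(low=[3, 0, 1], high=[9, 100, 20], size=(100, 3))
y = 50 + 8 * X[:, 0] - 0.1 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 2, 100)
model = LinearRegression().fit(X, y)

# 'test_array' holds one sample house: 6 rooms, 30 years old, 5 km out.
test_array = np.array([[6.0, 30.0, 5.0]])
prediction = model.predict(test_array)[0]

# Multiply by 1000 to get full cash denominations, then round up
# with math.ceil for readability, as described in the text.
price = math.ceil(float(prediction) * 1000)
print(price)
```

Running this prints a whole-dollar house price on the order of tens of thousands, which is the "prediction or output" stage discussed next.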
Prediction or Output
Conclusion & More information
So to sum it all up, regression is a technique borrowed from statistics. It involves finding the relationship between the independent variables and the dependent variable, and it does this using simple high-school coordinate geometry. It creates a line using the formula y = mx + b, where y is the prediction, m is the slope or gradient of the line, x is the input value, and b is the y-intercept. This line is known as the line of best fit. The distance between the data points and the line is the error, and it is measured using the Mean Squared Error. For each point we take the difference between the predicted value and the actual value, square it, i.e. (predicted value − actual value)², and then average these squared differences over all the points. That average is the Mean Squared Error.

Regression has many types, such as polynomial regression, but we have gone with linear regression as it is an easier and widely used concept. Using regression we can find the values of the dependent variable by manipulating the independent variables, and it is also used in cases like weather prediction and stock-price prediction. This algorithm comes under supervised learning and is very easy to learn. The line tries to fit the data in such a way that any further predictions will be made according to the line, so the algorithm is most useful for data that has a linear relationship between its variables.
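The line formula and the error measure above can be sketched in a few lines of Python (the slope, intercept, and data values are illustrative, not taken from the housing dataset):

```python
import numpy as np

# Line of best fit: y = m*x + b, with an assumed slope and intercept.
m, b = 2.0, 1.0
x = np.array([1.0, 2.0, 3.0])
predicted = m * x + b            # [3.0, 5.0, 7.0]

# Hypothetical actual values observed at those x positions.
actual = np.array([3.0, 4.5, 8.0])

# Mean Squared Error: average of (predicted - actual)^2 over all points.
mse = np.mean((predicted - actual) ** 2)
print(mse)  # (0.0 + 0.25 + 1.0) / 3 ≈ 0.4167
```

Training a linear regression amounts to choosing the m and b that make this MSE as small as possible over the whole dataset.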
Hope you guys enjoyed this session. In the next blog we will be talking about the Support Vector Machine. Until then, have a nice day and enjoy Deep Learning! 🙂