Regression tutorial

Simple example —
Deducing the value of a house based on the sampled prices of the market

The idea is to find the model which links an important characteristic with the price, and deduce the value of any house.
We choose a polynomial model of order 1 ( y = a*x + b ), which we will fit by linear least squares regression.

This is our sample of the market :

 

House Market Samples
surface area (m2) price (Thousand USD)
45 275
32 190
85 512
69 414

 

Simply copy and paste the table values in the variable editor:

house-market-data-entry

Plot the values as follow:

--> plot(B(:,1),B(:,2),"+")

house-market

If there was a perfect polynomial model of order 1 of this market,
we would have the following :

275 = 45 * a + b
190 = 32 * a + b
512 = 85 * a + b
414 = 69 * a + b

What we are looking for are the best a and b — according to a chosen criteria : this is why we talked about least squares. We can grade how good a model is by looking at how far each sample’s estimation is to the actual value.
The distance we choose to calculate is the difference between these values, squared (so that it is always positive and there is only one solution1,2).

To compute the linear regression of the system, do as follow:

--> x=B(:,1)'
--> y=B(:,2)'
--> [a, b] = reglin(x, y);
--> plot(x, a*x+b, "red")

house-market-reglin

Next episode: the cost function

For the samples we have given, this is the formula to calculate the
sum of the squared distances (the grade of a model, or cost function) :

S(a, b) = [275 – (a * 45 + b)]2 + [190 – (a * 32 + b)]2 + [512 – (a * 85 + b)]2 + [414 – (a * 69 + b)]2
--> deff("z=S(a,b)",["z = [275 - (a * 45 + b)].^2 + [190 - (a * 32 + b)].^2 + [512 - (a * 85 + b)].^2 + [414 - (a * 69 + b)].^2"])