Supervised Learning-Regression-Part One


           My goal for this post is to explain the mathematical concepts behind supervised learning. For this post, I will be borrowing heavily from Andrew Ng’s (very well written and clear) notes as well as his Coursera course.
           Supervised learning is a technique in which a program finds patterns in a well-defined, labeled data set. It can be split into two types: regression and classification. Supervised learning seeks some function that receives input(s) X in order to predict, with reasonable accuracy, some output Y (think y = f(x)). With classification, Y is discrete. Discrete values come from a set of separate values that you can count (e.g. the number of coins in your pocket or the sex you were assigned at birth). Regression, on the other hand, seeks to find a continuous output. One way to think of continuous values is as decimals. The temperature could be 50 degrees, 50.01 degrees, or 50.001 degrees. You could never finish counting all of the possible values.
As stated before, our goal is to use a set of past data to create a function that can accurately predict the output Y for new data. This set of past data is called a “training set.” We build our model of the data from it, and if we have done our math correctly we will be able to predict Y for a new, unseen set of data called a “test set.” Let’s look at a simple data set that I have borrowed from Dr. Ng: house size and house price.

With this data the living area is the input X and the price is the output Y. We use the variable m to denote how many training examples there are; in this case m = 5. I have plotted the above data with the matplotlib and NumPy Python libraries.
With this data we could make some hypothesis about the Y output. A simple approach is to use a function of the form y = ax + b. Here, a is the weight (or parameter) that determines how strongly x affects the output, and b is the intercept, the predicted output when x is 0. We need to find the best fit for this data, that is, the a and b values that give us the most accurate predictions. We can draw a line to represent this fit. Such a line would look like the one below:





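The hypothesis y = ax + b described above can be sketched in a few lines of Python. This is only an illustration: the function name `predict`, the values of a and b, and the living areas are all hypothetical and are not fitted to (or taken from) the data set in this post.

```python
def predict(x, a, b):
    """Return the hypothesis y = a*x + b for a single input x."""
    return a * x + b


# Hypothetical parameter values, chosen only for illustration.
a = 0.5   # weight: how much y changes for each unit of x
b = 10.0  # intercept: the predicted y when x is 0

# Hypothetical living areas (not the data set from this post).
living_areas = [1000, 1500, 2000]

# Apply the hypothesis to every input.
predictions = [predict(x, a, b) for x in living_areas]
print(predictions)
```

Changing a tilts the line, while changing b shifts it up or down. Finding the best fit means choosing the pair that predicts the training data most accurately.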
Does this line give us an accurate prediction of the data? Not really, because we assume that y will always move in a positive direction. We can reach a more accurate estimate by adjusting our function until it has the least amount of error. More explicitly, which values of a and b should we aim for? We can find them with something called the cost function. Given the length of this post, I will explain this concept in the next post.
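To make "least amount of error" a little more concrete, here is a rough sketch that compares two candidate lines on a tiny made-up training set. It measures error as the sum of squared differences between predicted and actual outputs. That is one common choice and an assumption on my part here, not necessarily the exact cost function covered in the next post. The data and the function names are hypothetical.

```python
def predict(x, a, b):
    """Return the hypothesis y = a*x + b for a single input x."""
    return a * x + b


def total_squared_error(data, a, b):
    """Sum the squared differences between predicted and actual y."""
    return sum((predict(x, a, b) - y) ** 2 for x, y in data)


# Made-up (x, y) pairs that lie exactly on y = 2x + 1.
# These are not the data set from this post.
training_set = [(1, 3), (2, 5), (3, 7), (4, 9)]

# Two candidate lines: one matches the data, one does not.
error_good = total_squared_error(training_set, a=2, b=1)
error_poor = total_squared_error(training_set, a=1, b=1)

print(error_good, error_poor)
```

The candidate with the smaller total error is the better fit, so the search for a and b becomes a search for the pair that makes this number as small as possible.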
