All good code starts on paper:
- Gather our data & determine our starting conditions
- Compute our cost function (or how much error our hypothesis function has)
- Perform gradient descent to minimize the cost function
- Return the proper a values for the most accurate predictive function
I'll be using the same data points from the last post, so the function to load the data will remain the same. I want to set up my hypothesis function with some guess as to what my a values are. If I simply set them to zero, the computer will do the heavy lifting and determine the proper values with gradient descent. I will also need to set a learning rate and a number of iterations. The number of iterations determines how many times we will run the update step, or how much the model will be "trained." We have 97 data points, so we don't want to train our model too heavily, but we also want to make sure we get a good amount of training. I chose 1000 iterations (mostly because that's what we use in class), but also because it is a good middle ground for a small dataset. Remember, our learning rate is the size of the steps we take towards a minimum. Similarly, the iterations are the number of steps we take, and gradient descent is the direction. I wrapped all of these attributes into a "main" function.
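A minimal sketch of that setup might look like the following (the data.csv filename, the 0.0001 learning rate, and the variable names are placeholder assumptions on my part; the zero starting values and the 1000 iterations are the choices described above):

```python
import numpy as np

def load_data(path="data.csv"):
    # Same loader as the last post: 97 comma-separated (x, y) points.
    return np.genfromtxt(path, delimiter=",")

def main():
    points = load_data()

    # Starting guesses for the a values -- zero, so gradient descent does the work.
    a0_initial = 0
    a1_initial = 0

    # The learning rate here is just a placeholder; 1000 iterations as discussed above.
    learning_rate = 0.0001
    num_iterations = 1000

    # The cost_function and gradient_descent_runner calls get added in the sections below.
```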
Compute our cost function
If you recall, we defined our cost function as:
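Written out for the single-variable hypothesis h(x) = a0 + a1·x over m data points (I'm using the plain mean squared error here; some courses add a factor of 1/2, which doesn't change where the minimum is):

$$J(a_0, a_1) = \frac{1}{m} \sum_{i=1}^{m} \big( (a_0 + a_1 x_i) - y_i \big)^2$$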
This function returns the mean squared error for the supplied values of a. I wrapped that code up into a function called cost_function. The computation itself is straightforward: we iterate through all the data points with a for loop.
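A sketch of that cost_function, assuming points is the array of (x, y) pairs returned by the loader above:

```python
def cost_function(a0, a1, points):
    # Mean squared error of the hypothesis h(x) = a0 + a1 * x over every data point.
    total_error = 0.0
    for x, y in points:
        total_error += ((a0 + a1 * x) - y) ** 2
    return total_error / float(len(points))
```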
Perform gradient descent to minimize the cost function
The code for gradient descent is broken down into two parts: the gradient descent runner, which loops for the chosen number of iterations and applies the updates, and the step gradient function, which calculates the step for each a value and passes it back. Computing both gradients before applying either update ensures that the a values are updated simultaneously. Sketches of both functions are below:
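(The function names follow the descriptions above; the 2/m scaling assumes the plain mean-squared-error cost from the previous section.)

```python
def step_gradient(a0_current, a1_current, points, learning_rate):
    # Partial derivatives of the mean squared error with respect to a0 and a1.
    a0_gradient = 0.0
    a1_gradient = 0.0
    m = float(len(points))
    for x, y in points:
        error = (a0_current + a1_current * x) - y
        a0_gradient += (2.0 / m) * error
        a1_gradient += (2.0 / m) * error * x
    # Both gradients are computed before either a value changes,
    # which is what keeps the update simultaneous.
    new_a0 = a0_current - learning_rate * a0_gradient
    new_a1 = a1_current - learning_rate * a1_gradient
    return new_a0, new_a1

def gradient_descent_runner(points, a0_start, a1_start, learning_rate, num_iterations):
    # Take num_iterations steps downhill, starting from the initial guesses.
    a0, a1 = a0_start, a1_start
    for _ in range(num_iterations):
        a0, a1 = step_gradient(a0, a1, points, learning_rate)
    return a0, a1
```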
Now we have a set of functions that should return the values of a that minimize the cost function and give us the most accurate prediction possible. How do we measure whether we have improved? We compare our original error value to the error value after gradient descent. I added some code that prints both to the command line.
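Pulled together, a main along those lines could look like this (the exact print formatting is my own):

```python
def main():
    points = load_data()
    a0_initial, a1_initial = 0, 0
    learning_rate = 0.0001   # placeholder value
    num_iterations = 1000

    # Error before any training, with both a values still at zero.
    print("Starting error: {}".format(cost_function(a0_initial, a1_initial, points)))

    a0, a1 = gradient_descent_runner(points, a0_initial, a1_initial,
                                     learning_rate, num_iterations)

    # Error after gradient descent, for comparison.
    print("After {} iterations: a0 = {}, a1 = {}, error = {}".format(
        num_iterations, a0, a1, cost_function(a0, a1, points)))

if __name__ == "__main__":
    main()
```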
When I run my program, it finishes extremely quickly (less than 1 second), and we end up with good values for a!
When we started, our average error was approximately 64, but after we ran our program it was close to 11. It seems that our model has improved by a significant amount!
Soon, I hope to redo this same exercise with the scikit-learn library, which could reduce the code to fewer than 15 lines.
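As a rough preview (assuming the same two-column CSV as above), the heart of that version would just be a fit call:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

points = np.genfromtxt("data.csv", delimiter=",")  # same placeholder file as above
X = points[:, 0].reshape(-1, 1)  # feature column, as a 2-D array for scikit-learn
y = points[:, 1]                 # target column

model = LinearRegression().fit(X, y)
print("a0 (intercept):", model.intercept_)
print("a1 (slope):", model.coef_[0])
```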