OOP & FP: Data Transportation and Transformation

I haven’t been blogging as much recently, but it hasn’t been due to laziness (I promise!). In my spare time, I’ve been working on a few things. Primarily, my time is devoted to my side hustles. My goal isn’t to win the lottery, but to have money working for me when I’m not working. It has been a struggle and is very time consuming, but very worth it. I haven’t stopped learning how to program, though! The rest of my time has been dedicated to continued learning: Node, Angular, Gosu, and Elixir.

When you’re learning a new language, the journey tends to be easier when the concepts translate well from languages you already know. Until a few months ago, my skillset only involved object-oriented languages, but I recently decided to add Elixir to the mix and delved into the world of functional programming.
The question is: what does it mean to be object-oriented or functional? First, it is important to ask: what is computing? I believe computing can be summed up as data transformation and transportation. Every encounter with code I’ve had, including machine learning algorithms, boils down to taking some input and turning it into output. Perhaps you need to get data from a database and render it in a field on your webpage. That is data transportation. Or perhaps you’ve made a time clock and need to automatically compute an employee’s wages for the day. That is data transformation.

We are instructing our computers to transport and transform data all day. So how are we transforming and transporting our data? With object-oriented programming, this process involves objects, which Zed Shaw likens to nouns. Our objects are abstractions of our data and, like nouns, they have properties (adjectives) and methods (verbs). We’ve taken our data and turned it into a digital noun, and many applications and architectures now revolve around this concept.

To be sure, this concept has served us well for years, but what if our architecture revolved around data and what we can do with that data? This is where functional programming comes into play. On the surface, a language like Elixir looks like any other language, but it organizes behavior into modules, which essentially say: here are some things I can do to this data. The data is left alone, and the things that can be done to that data are separated into their own module.

To better understand this, let's look at some code. This is a simple Ruby script representing a bank account.
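Something along these lines (a minimal sketch; the class and method names are my own):

```ruby
class BankAccount
  attr_reader :account_holder, :balance

  def initialize(account_holder, balance = 0)
    @account_holder = account_holder
    @balance = balance
  end

  # Add to the account
  def deposit(amount)
    @balance += amount
  end

  # Subtract from the account
  def withdraw(amount)
    @balance -= amount
  end

  # Describe the account to whomever asks
  def describe
    puts "#{account_holder} has a balance of #{balance}"
  end
end
```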




 
We turn our bank account into a noun. It has properties: an account holder and a balance. It has actions: we can add to the account, we can subtract from the account, and we can describe our account to whomever asks. How might we do this in functional programming? I’ll use Elixir here!
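A sketch of the same idea in Elixir (again, the module layout and names are my own):

```elixir
defmodule BankAccount do
  # Add to the account; returns a new map rather than mutating the old one
  def deposit(account, amount) do
    %{account | balance: account.balance + amount}
  end

  # Subtract from the account
  def withdraw(account, amount) do
    %{account | balance: account.balance - amount}
  end

  # Describe the account
  def describe(account) do
    IO.puts("#{account.holder} has a balance of #{account.balance}")
  end
end
```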



Functional programming offers the advantage of using fewer lines of code. Our code is also kept separate from our data. Our data looks like this:
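For example, the data can live in a plain map, completely separate from the module above (the holder's name and numbers here are made up):

```elixir
account = %{holder: "Ada", balance: 100}

account = BankAccount.deposit(account, 50)
BankAccount.describe(account)   # prints "Ada has a balance of 150"
```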



Admittedly, now that I’m used to object-oriented programming, I feel that it is more intuitive, but were I a new programmer, I’d think the functional approach makes more sense. The big question right now: which is better? I don’t know the answer to that, but I have had far more luck using frameworks like Phoenix than I have with Rails or Sails.js. I believe both paradigms are here to stay and offer distinct advantages. This leads me to a new development in my programming journey: the realization that certain tools are good for certain things and that it is good to be a polyglot.


One note before I end this post: you don't need a functional language in order to do functional programming! You can do functional programming in almost every object-oriented language. In fact, Java 8 supports functional programming. You can learn about that here! I hope to be blogging about Elixir and Phoenix in the future. I believe that these technologies will be in serious use in the future! 

The Page Object Framework in Less than 3 minutes (Hopefully)

If you want to work in test automation, you have to understand the page object model. This model offers two advantages: it organizes your test code in a clear manner, and it saves you from having to rewrite test code whenever the UI changes.

What is a Page Object?
An object, as Zed Shaw explains it, is essentially a noun. In terms of computer science, it is a noun that represents a specific thing. For instance, an object could represent a car or a house. This object has certain properties attached to it, and it can do certain things (the methods it can perform). For automated testing, when we are testing a page, we can turn the page into an object by assigning it to a class. For instance, let's say we have a login page with two fields for a username and a password. These fields are properties of the login page. Additionally, there are certain actions you can perform on this page. For the purposes of testing, you could say that you can log in correctly, log in incorrectly (wrong username, wrong password, both wrong, etc.), or perform any other actions you may need to test. With a page object, our login page becomes a class with its HTML elements as properties and its testing actions as methods. So what would this look like?
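Here's one way it might look, sketched in Python with Selenium (the element IDs and names are illustrative):

```python
from selenium.webdriver.common.by import By


class LoginPage:
    # Three HTML elements, each identified by its ID
    USERNAME_FIELD = (By.ID, "username")
    PASSWORD_FIELD = (By.ID, "password")
    LOGIN_BUTTON = (By.ID, "login-button")

    def __init__(self, driver):
        self.driver = driver

    def login(self, username, password):
        # Works for any username/password combination we pass in
        self.driver.find_element(*self.USERNAME_FIELD).send_keys(username)
        self.driver.find_element(*self.PASSWORD_FIELD).send_keys(password)
        self.driver.find_element(*self.LOGIN_BUTTON).click()
```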

Our LoginPage class has three HTML elements, and we have identified each element by its ID (the best way to select an element in testing). We have created a method that can take any username and password we give it and log in. We can then use this method in our step definitions and, more importantly, use it in more than one step definition.

I haven't had much time to write blog posts lately, but I wanted to get one out there! Stay tuned for a larger explanation of the page object format as well as things I've learned about machine learning!

Coding Your First Learning Algorithm - Part Two - The Code without ML Libraries

Steps to Coding Our Linear Regression Algorithm
All good code starts on paper:

  1. Gather our data & determine our starting conditions
  2. Compute our cost function (or how much error our hypothesis function has)
  3. Perform gradient descent to minimize the cost function
  4. Return the proper a values for the most accurate predictive function
Gather our data & determine our starting conditions
I'll be using the same data points from the last post, so the function to load the data will remain the same. I want to set up my hypothesis function with some guess as to what my a values are. If I simply set them to zero, the computer will do the heavy lifting and determine the proper values with gradient descent. I will also need to set up a learning rate and the number of iterations. The number of iterations determines how many times we will run the update step, or how much the model will be "trained." We have 97 data points, so we don't want to train our model too heavily, but we also want to make sure we have a good amount of training. I chose 1000 (mostly because that's what we use in class), but also because it is a good middle ground for a small dataset. Remember, our learning rate is the size of the steps we take towards a minimum. Similarly, the iterations are the number of steps we take, and gradient descent is the direction. I wrapped all of these attributes into a "main" function and the code looks like this:
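Roughly like this (the file name comes from the last post; the helper functions are sketched in the sections below, and the learning rate value here is just a reasonable guess on my part):

```python
import numpy as np

def main():
    # Step 1: gather the data and set the starting conditions
    data = np.loadtxt('ex1data.txt', delimiter=',')
    initial_a0 = 0          # starting guess for the intercept
    initial_a1 = 0          # starting guess for the slope
    learning_rate = 0.01    # the size of each gradient descent step
    num_iterations = 1000   # how many steps ("training") we take

    # cost_function and gradient_descent_runner are defined further down
    a0, a1 = gradient_descent_runner(data, initial_a0, initial_a1,
                                     learning_rate, num_iterations)

if __name__ == '__main__':
    main()
```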



Compute our cost function
If you recall, we defined our cost function as:

J(a0, a1) = (1/2m) · Σ from i = 1 to m of ((a0 + a1·x(i)) - y(i))²

This function will return the mean squared error for the supplied values of a. That code has been wrapped up into a function called cost_function. The computation requires iterating through all the data with a for loop.
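A sketch of what that function might look like, following the 1/(2m) form of the cost function from the earlier posts (the function and variable names are my own):

```python
def cost_function(data, a0, a1):
    # Sum the squared error of the hypothesis f(x) = a0 + a1 * x over every point,
    # then divide by 2m
    total_error = 0
    m = float(len(data))
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        total_error += ((a0 + a1 * x) - y) ** 2
    return total_error / (2 * m)
```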



Perform Gradient Descent to minimize the cost function
The code for gradient descent will be broken into two parts: the gradient descent runner, which applies the update step over and over, and the step gradient function, which calculates each step. Computing both new parameter values from the old ones before assigning them ensures that each value is updated simultaneously. The code for both functions is below:
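A sketch of both functions (the names and exact layout are my own):

```python
def step_gradient(data, a0, a1, learning_rate):
    # Partial derivatives of the cost function with respect to a0 and a1
    a0_gradient = 0
    a1_gradient = 0
    m = float(len(data))
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        a0_gradient += ((a0 + a1 * x) - y) / m
        a1_gradient += ((a0 + a1 * x) - y) * x / m
    # Both parameters step downhill from the same old values (a simultaneous update)
    new_a0 = a0 - learning_rate * a0_gradient
    new_a1 = a1 - learning_rate * a1_gradient
    return new_a0, new_a1


def gradient_descent_runner(data, a0, a1, learning_rate, num_iterations):
    # Take num_iterations steps towards the minimum of the cost function
    for _ in range(num_iterations):
        a0, a1 = step_gradient(data, a0, a1, learning_rate)
    return a0, a1
```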


Return the proper a values for the most accurate predictive function
Now we have a set of functions that should return the values of a that minimize the cost and give us the most accurate prediction possible. How do we measure whether we have improved? We compare our original error value with the error value after gradient descent. I added some code that prints this to the command line.
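Something like the following inside main (the wording of the messages is mine):

```python
    # Inside main(), before and after running gradient descent:
    print("Starting error: {}".format(cost_function(data, initial_a0, initial_a1)))
    a0, a1 = gradient_descent_runner(data, initial_a0, initial_a1,
                                     learning_rate, num_iterations)
    print("After {} iterations: a0 = {}, a1 = {}, error = {}".format(
        num_iterations, a0, a1, cost_function(data, a0, a1)))
```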


When I run my program, it is extremely quick (less than 1 second) and we have good values for a! Here are the results:


When we started, our average error was approximately 64, but after we ran our program it was close to 11. It seems that our program has improved by a significant amount!

Soon, I hope to do this same exercise with the scikit-learn library, which could reduce this code to fewer than 15 lines.

Coding Your First Supervised Learning Program - Part One - Describing Our Dataset

Special Thanks:

  • Siraj Raval
  • Dr. Andrew Ng
  • Dr. Jason Brownlee
  • Harrison from Sentdex
--------------------------------------------------------------------------------------------------------------------------
For my first demo of a supervised linear regression learning program, I will be using the dataset from Andrew Ng's Machine Learning course and doing some descriptive work. This dataset explores the relationship between a city's population and the profitability of a food truck. In Dr. Ng's first programming assignment, he provides this data in a .txt format with our x and y values separated by a comma. Dr. Ng's data file is labeled 'ex1data.txt.' To get started, I created a directory labeled SimpleLinearRegession and I placed the data file inside this directory.

Let's Describe Our Data
I want to get an understanding of what's going on with our data. We're going to write our program in Python (it is the easiest programming language you'll ever learn). In order to access our data in the .txt file, Python needs a library called numpy, which provides a command called loadtxt. To install packages like this, I'd recommend getting Python's package installer, pip. We will be using the numpy library for a lot of other cool stuff later. I will import this functionality with the following script:
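The import might look like this (I'm using the conventional np alias):

```python
# numpy, installed with: pip install numpy
import numpy as np
```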



What's cool about Python is how human-readable it is (much like Ruby). As described before, we have a dataset with x and y values separated by commas, which looks like this:



We can make this data usable in our Python program by creating a variable and loading the data from the .txt file, like so:
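Something like this, using the file name from earlier:

```python
# Read the comma-separated x,y pairs into a numpy array called data
data = np.loadtxt('ex1data.txt', delimiter=',')
```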




What does our data look like now? In Python, we can see what our data variable looks like by adding 'print data' to our Python file and then executing the program (python your_file.py). You should get something like this:




At this point, our program returns an array of arrays. Each array contains an x and a y value. From this, we can determine our m value, or the amount of data we have in our dataset. If you type 'print len(data)', then your program should return 97, so m = 97. What I would like to do now is plot my data. We can do this in Python with a library called matplotlib and its pyplot module, using the code below:
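The import might look like this:

```python
# pyplot, imported under the usual plt alias
import matplotlib.pyplot as plt
```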



In this case, plt acts as an alias for the package so that we don't have to type it out over and over again. We can create a scatter plot for our data with the command 'plt.scatter()'. This command requires two arguments: an x variable and a y variable. However, we have an array of arrays, with each inner array containing an x and a y value, and each inner array would only pass as one argument. As such, we need to split these values up into a group of x coordinates and a group of y coordinates to pass into our scatter method. We can do so by initializing two empty lists, iterating through our data array, and appending the first element of each pair to the x list and the second to the y list. That code looks like this:
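A sketch of that splitting step:

```python
# Split the array of [x, y] pairs into a list of x values and a list of y values
x = []
y = []
for point in data:
    x.append(point[0])
    y.append(point[1])
```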





We can now plot our data with the scatter method. We can add labels to the x and y axes with the '.xlabel' and '.ylabel' methods. Keep in mind that the scatter plot needs the '.show' method for the user to actually see it. The code to create and show our scatter plot looks like this:
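Roughly like this (the axis label text is my own):

```python
plt.scatter(x, y)                # create the scatter plot
plt.xlabel('City Population')    # label the x axis
plt.ylabel('Food Truck Profit')  # label the y axis
plt.show()                       # display the plot window
```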



This code should produce a scatter plot, like the one below, whenever the program is executed.





Now we have an idea of what our data looks like. In the next post, we'll code our linear regression algorithm and determine how city population (x) affects our profits (y).

The code for what I have so far is here.







Supervised Learning - Linear Regression - Part Two

Recap:
In my last post, I explained how, given a set of data, we can create a line that tries to predict outputs. This prediction can be presented as a function, F(x) = y = ax + b. We also discussed how this line could be inaccurate without some analysis of errors.


A Couple Notes:
If anyone is taking Professor Ng’s Coursera course, there may be some confusion between the notation I use and the notation he uses. For the field of machine learning, it is more proper to use Professor Ng’s notation, but I decided to start with y = ax + b as it is a more common starting point and less confusing when starting out.

Also, the notation we use for our data points will come in handy as we progress. Let’s look at our data points and get an understanding of our notation:


| X Notation | X Value (Living Area) | Y Notation | Y Value (Price) |
|------------|-----------------------|------------|-----------------|
| x1         | 2104                  | y1         | 400             |
| x2         | 1600                  | y2         | 330             |
| x3         | 2400                  | y3         | 369             |
| x4         | 1416                  | y4         | 232             |
| x5         | 3000                  | y5         | 540             |

Also keep in mind, the term for “any given x” or “any given y” would be: x(i) and y(i) respectively.

What is an error?
An error is rather intuitive. Given a predictive function, F(x), we measure how far off the predicted outputs are from the known outputs: simply predicted minus actual. Imagine I took F(2104) and determined that y = 500. My error in this case would be 100 (500 - 400). If I took F(1600) and determined that y = 300, then my error would be -30. You may ask, how does one get a negative amount of error? Would you be saying you’re so wrong that you’re right? This happens because at this point we are measuring a signed difference rather than a magnitude of error. We handle this by squaring our errors. In case you’re curious why we don’t just use the absolute value, read about Jensen's Inequality.

This can be represented as:

error(i) = F(x(i)) - y(i)

Mean Squared Error and the Cost Function
We can take our function and check the error with each x and y value, but what do we do after that? The error data is not really meaningful to us yet, as it is simply a collection of error values. What we’re going to do is very similar to taking the average of the collected errors. The sum of all our squared errors can be represented as:

Σ from i = 1 to m of (F(x(i)) - y(i))²

where the preceding term, sigma (Σ), is just a fancy way of representing a sum. Remember that our m value is the number of data points in our set. For our purposes, instead of dividing the above term by m, we will divide it by 2m, keeping (F(x(i)) - y(i))² as our numerator. Our end equation looks like this, and we set our error to a variable J:

J = (1/2m) · Σ from i = 1 to m of (F(x(i)) - y(i))²

In machine learning, J is called the cost function. The one I have provided is different in notation, but fundamentally the same as one you might find in a machine learning textbook. Now you might notice a problem: our original equation is y = ax + b. What do we do with the a and b in this case? We calculate the mean squared error for different values of those parameters, which makes our job easier to state. Think about it this way: we always know what our x and y values are, since they come from our data, but we need to choose the values of a and b to make our algorithm work. Our cost for any given a and b is noted as follows:

J(a, b) = (1/2m) · Σ from i = 1 to m of ((a·x(i) + b) - y(i))²

A Change in Notation
It is at this point that I’d like to change the notation in order to align more closely with Professor Ng’s class. We will recognize that our b term is a parameter just like a, and notate each parameter as a with a subscript. So,

F(x) = a0 + a1·x    (a0 is our old b, and a1 is our old a)

This will help us with the calculus that happens later on. Similarly, the cost function will look like this:

J(a0, a1) = (1/2m) · Σ from i = 1 to m of ((a0 + a1·x(i)) - y(i))²

Finding the right parameters
Our goal now is to find the values of a that produce the lowest cost, J. By obtaining the minimal value of J, we will then have the most accurate function possible. How do we find the lowest value of J? To do so, we need to find the relationship of a to the cost function. More specifically, we need to see how J changes, and at what rate it changes, with regard to our parameters. In calculus, the rate of change is found with the derivative. To understand this a little better, let’s say we had the function:

f(x) = 2x³ + 3

The way we find the derivative is by eliminating the constant, 3, multiplying the coefficient 2 by the exponent, and reducing the exponent by one. With this we get 6x², so we know that the above function changes at a rate of 6x². The derivative of the above function (read as “f prime of x”) is:

f'(x) = 6x²

However, the cost function takes two parameters, and as such we must evaluate the rate of change with respect to one variable while treating the other as a constant. We do this with a partial derivative. For example, take the following function:

f(x, y) = x²y

We can take the derivative with respect to each variable, x and y, and find out how the function changes with regard to each one. The partial derivative with respect to y is denoted as such:

∂f/∂y

Essentially, what we’re doing is seeing how the function changes with regard to y while holding x constant. In the above example, the derivative of y evaluates to 1 and x² is treated as a constant, so the function changes at a rate of x² with regard to y: ∂f/∂y = x².

For our cost function, we want to see how the cost changes as a parameter a changes, with our inputs and outputs given. Our inputs and outputs are constants when evaluating the function. The derivative term with respect to a parameter of our cost function looks like this:

∂J(a0, a1) / ∂aj

We could then multiply the derivative term by some value, which we will call α (alpha), and subtract the product of α and the derivative term from our original proposed parameter a. So the algorithm to find the minimum would look like this:

aj := aj - α · ∂J(a0, a1) / ∂aj    (for j = 0 and j = 1, updating both at the same time)

We would then assign these results to the parameters, plug them back into the equation, and repeat the process until we converge to the minimum value of our function. This process is called gradient descent.

Gradient Descent Visualized
While it might be difficult to follow that algorithm, it is easy to visualize what gradient descent is trying to accomplish. Imagine we expanded our plot to a third dimension: we represent the value of our cost function on the z axis and plot the various values of our parameters on the other two axes. A plot might look like this:
You can imagine the function as someone navigating a landscape, standing on a high point and seeking the lowest point in the valley. Each dot above represents a step towards the lowest point in the valley, or the minimum. This step is our α term, and it represents our learning rate. The value we set for the learning rate determines the size of the step we take towards the minimum. As such, it is important to select the right learning rate. If the learning rate is too large, the steps can overshoot the minimum and the a values may never settle there. If α is too small, then we may never reach the minimum on a practical timescale.

Conclusion
We have taken a function that we believe can predict the price of a house given the size of the house. We tested the predictive power of the function by plugging in the size of a house and seeing how the predicted price differed from each actual price we knew. We then added those errors up and examined how the mean squared error changes with our parameters to find the values that make it as small as possible.

If you really think about it, through this long process we were able to learn how to predict the price of a house. Were we to place this process into code and run it, we would see that our code would take in past data and perform a predicting task. It would then learn to improve the predicting task with time. This is the very definition of a learning machine!

In the next post, I will be using Python to make a machine learn.





Supervised Learning - Regression - Part One


My goal for this post is to explain the mathematical concepts behind supervised learning. I will be borrowing heavily from Andrew Ng’s (very well written and clear) notes as well as his Coursera course.
Supervised learning is when a program finds patterns in a well-defined data set, and it can be split into two types: regression and classification. Supervised learning seeks some function that receives input(s) X in order to predict, with reasonable accuracy, some output Y (think y = f(x)). With classification, Y is discrete. Discrete values are countable within a finite measure of time (e.g. the change in your pocket or your sexual assignment). Regression, on the other hand, seeks to find a continuous output. One way to think of continuous values is decimals: the temperature could be 50 degrees or 50.00 degrees or 50.01 degrees. These values could not be counted within a finite time.
As stated before, our goal is to take a set of past data and create a function that can accurately predict the Y output values of a new data set. This set of past data is called a “training set.” We create our view, or model, of the data from this, and if we have done our math correctly we will be able to predict Y from any new incoming set of data, called a “test set.” Let’s look at a simple data set that I have borrowed from Dr. Ng: house size and house price.

| Living Area (X) | Price (Y) |
|-----------------|-----------|
| 2104            | 400       |
| 1600            | 330       |
| 2400            | 369       |
| 1416            | 232       |
| 3000            | 540       |
With this data, the living area is the input X and the price is the output Y. We use the variable m to denote how many training examples there are; in this case m = 5. I have plotted the above data with the matplotlib and numpy Python libraries.
With this data we could make a hypothesis about the Y output. The best approach for this is to propose something similar to y = ax + b. In this case, a stands for the weight, or parameter, that affects how much x influences the output. We need to find the best fit for this data, or the a and b values that give us the most accurate predictions. We can draw a line to represent this fit. Such a line would look like the one below:





Does this line give us an accurate prediction of the data? Not really, because it assumes that y will always move in a positive direction. We can move towards a more accurate estimate by adjusting our function towards the least amount of error. More explicitly, what values of a and b should we aim for? We can find them with something called the cost function. Given the length of this post, I will explain this concept in the next post.

What is Machine Learning?

In the beginning of January, I decided to take some coursework via MOOCs on Machine Learning and Artificial Intelligence. One constant theme we hear about in the media is how much AI is going to change the world, for better and for worse. Many of the staple jobs in the US are projected to be done by intelligent machines in the coming decade, causing massive structural unemployment. Some even worry AI will take over like we have seen in Hollywood (unlikely, I think). Regardless, we must acknowledge and push forward in this area because the benefits (as well as the risk of not pursuing AI) are too great to ignore.

I've been taking Andrew Ng's Intro to Machine Learning via Coursera and Udacity's Intro to AI and Linear Algebra refresher, and watching a ton of Siraj Raval videos on YouTube. Machine learning is one of the hardest things I've tried to learn in my life. It involves a ton of math, probability, and programming, but this is precisely why I like it. I love the challenge. I am, however, a lazy student. I am trying to get into GA Tech's Online Masters in Computer Science so I can do machine learning full time, but I also have to make up for mediocre undergraduate performance in a non-CS field. I have learned, since teaching myself to program, that this blog and the Feynman method (teaching what you've only just learned in order to grasp it fully) will keep me accountable and push me to a deep understanding. Regardless of what my academic career looks like, I will learn machine learning because I like it.

So, the fun part: what is machine learning? Tom Mitchell, a computer science professor at Carnegie Mellon, defines machine learning as this:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Makes perfect sense, right? The definition is not as intimidating as it looks at first glance. All Dr. Mitchell is saying is: if you give a computer data about the past regarding some task and it is able to improve how that task is done, then you can say that the program has learned.

Why is this important? Is Machine Learning AI?
This is important because machine learning is our first big step toward the ultimate goal of artificial intelligence: making a machine as intelligent as a human being. It is important to remember that machine learning is a subset of artificial intelligence, not the field as a whole. Learning is only a small part of what we want artificial minds to do. Some researchers, like Monica Anderson, say machine learning is the only kind of artificial intelligence we have, though the goal is to incorporate things like abstract thinking into our machines.

So how does machine learning differ from conventional programming?

Conventional programming forces a programmer to define all the parameters and data needed to perform a task. Machine learning takes a set of rules about a task, data about the past, and a desired output, and improves by repeatedly attempting the task until it reaches that output. For example, imagine trying to make a winning chess program. You could program all the moves and countermoves, but chess strategy has been studied for centuries, so the programming task would be monumental. However, if you supply the rules for how pieces move and what a winning scenario is, then with machine learning a program can learn to play chess and win.

How does a computer learn to play chess? Or any task for that matter?
Well, this is where the math comes in. Machine learning has also been called "the extraction of knowledge from data." Machine learning uses advanced concepts from statistics and probability (that I will be covering in future sections) to determine patterns in datasets. Just hang in there! I will teach you this in detail. I do want to explain conceptually the different ways machines learn from data.

Supervised Learning
This is probably the most intuitive learning method. A program is supplied a well-labeled dataset with correct outcomes, for instance, supplying a program with images and labelling them (e.g. "this is a cat"). With this method, the more data the program is supplied, the more it is able to learn, and when it encounters unlabeled data, it should be able to classify that data correctly. My next couple of posts will revolve around this, walking through a "hello world" style machine learning program. This is the most common type of machine learning.

Unsupervised Learning
This kind of learning is when a computer is given an unlabelled set of data and we tell the computer: "here is an unstructured dataset, can you give it structure?" This is a powerful approach, and one way to think of it is how we learn something completely new. If you watch one or two YouTube videos on a complex subject, then you aren't likely to learn much. However, if you watch thousands of videos, read a hundred books, and network with others learning the same thing, then you will learn much more.

Reinforcement Learning
This kind of learning works by trial and error. Instead of being handed labeled answers, the program takes actions, receives rewards or penalties based on the outcomes, and adjusts its behavior over time to maximize its reward. Think of a program learning a game by playing it over and over and being scored on how well it does.

Neural Nets
This is a really cool buzzword and a really cool technique! This method uses a model of the human brain, which would need its own post to truly explain. The technique uses matrix multiplication and activation functions to allow a machine to interpret inputs and propagate them through a network. Neural nets have lost popularity at times in the past, but have regained it in recent years. This is my personal favorite type of learning!

Conclusion
My hope for my future blog posts is to go into detail on how math can help a machine learn. As always, if you are kind enough to read this post, let me know what you think. I welcome constructive criticism. I want to learn how to design learning machines and be the best at it; I don't want to just seem like I know what I'm talking about. If there's a concept I didn't explain well, or if you want to understand something more deeply, then let me know!