What is Linear Regression?
Given a dataset including a dependent and an explanatory variable, linear regression is a statistical technique used to create a model (equation) that best represents the relationship between the variables. This model can then be used to predict additional values that will follow the same pattern. It is a technique commonly used to add trend lines to charts of time series data and in Machine Learning to predict future values from a given training set of data.
For example, given the following set of noisy time series data:
It might be hard to tell from this sample whether the values are increasing or decreasing over time. Applying linear regression can yield a trend line that makes the pattern clear, such as the following (in green):
Typically, linear regression is implemented using an optimization approach such as Gradient Descent, which starts with a rough approximation and improves its accuracy over a large number of iterations. While such an approach will optimize the model, it can be slow when many iterations are required. In some cases the problem can be greatly simplified and solved in closed form using a derivation called the Normal Equation.
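For reference, the Normal Equation is usually written as shown below. The symbols are the conventional ones and are an assumption here, not taken from Lyric: X is the matrix of explanatory values (one row per observation), y is the vector of dependent values, and theta is the vector of model coefficients. The formula applies when X^T X is invertible.

```latex
\theta = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} y
```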
Lyric uses the Normal Equation because it is fast and efficient and should work for most applications.
First, make sure your data is represented in the form of a 2xN array made up of elements with an ‘x’ and a ‘y’ value. The x values should be the explanatory variable and the y values the dependent variable.
Then you need to have Lyric determine the best equation to represent this data. The equation is known as the model and you can build it using the following:
Now that you have your model, you will want to apply it to a set of inputs. The newInput should be a 1xN array containing only the explanatory variable values for which you would like to calculate the dependent values. This will result in a new 2xN array which will include the resulting series.
The following is a complete example which, given dependent values for the explanatory values 1 through 5, estimates the dependent values for 6 through 8:
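As a library-independent illustration of the same task, here is a minimal sketch in plain JavaScript. It does not use Lyric: the function names fitLine and predict, the object layout, and the sample values are all hypothetical, and it only fits a straight line using the closed-form least squares formulas.

```javascript
// Hypothetical helpers for illustration only; this is NOT Lyric's API.

// Fit y = intercept + slope * x by ordinary least squares (closed form).
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { intercept: meanY - slope * meanX, slope };
}

// Apply the fitted line to new explanatory values.
function predict(model, xs) {
  return xs.map((x) => ({ x, y: model.intercept + model.slope * x }));
}

// Made-up, noise-free sample values for x = 1..5 (they lie on y = 2x + 1).
const knownX = [1, 2, 3, 4, 5];
const knownY = [3, 5, 7, 9, 11];

const model = fitLine(knownX, knownY);
const estimates = predict(model, [6, 7, 8]);
console.log(estimates); // y values of 13, 15 and 17 for x = 6, 7 and 8
```

Because the sample values are noise-free, the fitted line reproduces them exactly; with noisy data the estimates would instead follow the best-fit line.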
If you wanted to create a trend line, such as in the example above, you would simply apply the model to the same x values you provided in the input to build the model.
For more information on using Lyric (and more advanced options) please refer to the Lyric README.
How is Lyric implemented?
Lyric implements the normal equation using a series of matrix operations provided by the Sylvester library. However, before the matrix operations can be applied, the input series x (explanatory values) must be modified to represent the degree of polynomial we want to use to fit the data. To do this, we simply take the vector of x values and create a matrix where each row is the values of x raised to a given power. For example, given a power of 3 (for a 3rd degree polynomial) the output O will be of the form:
O[0] = x^0, i.e. all ones
O[1] = x^1, i.e. the same as x
O[2] = x^2, i.e. all elements of x squared
O[3] = x^3, i.e. all elements of x cubed
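The construction of O can be sketched in a few lines of plain JavaScript. The helper name buildPowerRows is hypothetical and is not part of Lyric or Sylvester; it simply follows the description above, producing one row per power.

```javascript
// Hypothetical helper for illustration; not part of Lyric or Sylvester.
// Returns a (degree + 1) x N matrix whose row p holds every x raised to the power p.
function buildPowerRows(xs, degree) {
  const rows = [];
  for (let p = 0; p <= degree; p++) {
    rows.push(xs.map((x) => Math.pow(x, p)));
  }
  return rows;
}

const O = buildPowerRows([1, 2, 3], 3);
// O is [[1, 1, 1], [1, 2, 3], [1, 4, 9], [1, 8, 27]]
console.log(O);
```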
If you are familiar with linear algebra, you’ll recognize that this represents an equation of the form:
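Written out for a 3rd degree polynomial, such an equation would look like the following. The assignment of the constants A through D to increasing powers of x is an assumption made for illustration; it matches the row order of O.

```latex
y = A + Bx + Cx^{2} + Dx^{3}
```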
Once the input is in this form, the actual matrix operations are fairly simple, following the normal equation steps.
The resulting theta matrix holds the values of the constants A, B, C and D from the above equation. Then, by multiplying future values of x by the same theta matrix we can predict y values.
If you’d like to learn more about Linear Regression, the Machine Learning class offered by Coursera reviews it in great detail (as well as many other machine learning topics).
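To make the steps concrete, here is a minimal sketch of the normal equation in plain JavaScript. It is not Lyric's code: Lyric delegates the matrix work to Sylvester, whereas the helpers below (transpose, multiply, solve, normalEquation, predictPoly) are hypothetical stand-ins. The sketch also assumes the conventional layout with one row per observation, which is the transpose of the power-row matrix described earlier, and it solves (X^T X) theta = X^T y by elimination rather than forming an explicit inverse.

```javascript
// Hypothetical helpers for illustration; Lyric itself relies on Sylvester for these steps.

// Transpose a matrix stored as an array of rows.
function transpose(A) {
  return A[0].map((_, j) => A.map((row) => row[j]));
}

// Multiply two matrices stored as arrays of rows.
function multiply(A, B) {
  return A.map((row) =>
    B[0].map((_, j) => row.reduce((sum, a, k) => sum + a * B[k][j], 0))
  );
}

// Solve the square system M * theta = b by Gauss-Jordan elimination with partial pivoting.
function solve(M, b) {
  const n = M.length;
  const aug = M.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) pivot = r;
    }
    if (Math.abs(aug[pivot][col]) < 1e-12) throw new Error("X^T X is singular");
    [aug[col], aug[pivot]] = [aug[pivot], aug[col]];
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const factor = aug[r][col] / aug[col][col];
      for (let c = col; c <= n; c++) aug[r][c] -= factor * aug[col][c];
    }
  }
  return aug.map((row, i) => row[n] / row[i]);
}

// theta = (X^T X)^{-1} X^T y, computed by solving (X^T X) theta = X^T y.
function normalEquation(X, y) {
  const Xt = transpose(X);
  const XtX = multiply(Xt, X);
  const Xty = multiply(Xt, y.map((v) => [v])).map((row) => row[0]);
  return solve(XtX, Xty);
}

// Evaluate the fitted polynomial at a new x: theta[0] + theta[1] * x + theta[2] * x^2 + ...
function predictPoly(theta, x) {
  return theta.reduce((sum, t, p) => sum + t * Math.pow(x, p), 0);
}

// Made-up data lying exactly on y = 1 + 2x + 3x^2.
const xs = [0, 1, 2, 3, 4];
const ys = xs.map((x) => 1 + 2 * x + 3 * x * x);
const X = xs.map((x) => [1, x, x * x]); // N x 3: one row per observation
const theta = normalEquation(X, ys); // approximately [1, 2, 3]
const next = predictPoly(theta, 5); // approximately 86
console.log(theta, next);
```

Solving the linear system directly is a design choice of this sketch; it is generally more numerically stable than inverting X^T X and then multiplying.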