Hello again, where were we? … Oh yes, we have been discussing CoreML and have even set up an appropriate Python 2 environment to work with it. In this post we are going to cover some of the most basic aspects of the workhorse of machine learning: the dependable linear regression model.
We are all familiar with a line of best fit, and I am sure that many of us remember fitting one by hand (you know who you are). And who hasn’t played with Excel’s capabilities? In a nutshell, a linear regression is a model that relates a variable [latex] y[/latex] to one or more explanatory (or independent) variables [latex] X[/latex]. The parameters that define the model are estimated from the available data. There are a number of assumptions about the explanatory variables, and you can find more information in my Data Science and Analytics with Python book. We can think of the goal of a linear regression model as drawing a line through the data, as exemplified in the plot below:
Let us take the case of 2 independent variables [latex] x_1[/latex] and [latex]x_2[/latex]. The linear regression model to predict our target variable [latex] y[/latex] is given by:
[latex] y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon[/latex],
where [latex] \alpha[/latex] and [latex] \beta_i[/latex] are the parameters to be estimated to help us generate predictions, and [latex] \epsilon[/latex] is the error term. With the aid of techniques such as least squares we can estimate the parameters [latex] \alpha, \beta_1[/latex] and [latex] \beta_2[/latex] by minimising the sum of the squares of the residuals, i.e. the differences between the observed values and the fitted values provided by the model. Once we have determined the parameters, we are able to score new (unseen) data for [latex] x_1[/latex] and [latex] x_2[/latex] to predict the value of [latex] y[/latex].
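To make the estimation step a bit more concrete, here is a minimal sketch of a least squares fit for the two-variable model using NumPy. The data is synthetic and made up purely for illustration, and the function names `fit_linear_regression` and `predict` are my own and not part of any library. Treat it as one possible way of doing the estimation rather than the approach we will use with CoreML.

```python
import numpy as np

# Synthetic data, invented for illustration only (an assumption, not real data)
rng = np.random.RandomState(42)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
true_alpha, true_beta1, true_beta2 = 1.5, 2.0, -0.7
epsilon = rng.normal(0, 0.1, n)  # noise term
y = true_alpha + true_beta1 * x1 + true_beta2 * x2 + epsilon


def fit_linear_regression(x1, x2, y):
    """Estimate alpha, beta_1 and beta_2 by ordinary least squares.

    Hypothetical helper: it minimises the sum of squared residuals
    for the model y = alpha + beta_1 * x1 + beta_2 * x2.
    """
    # Design matrix: a column of ones for the intercept, then x1 and x2
    A = np.column_stack([np.ones(len(y)), x1, x2])
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1], coef[2]


def predict(alpha, beta1, beta2, x1_new, x2_new):
    """Score new (unseen) values of x1 and x2 with the fitted parameters."""
    return alpha + beta1 * x1_new + beta2 * x2_new


alpha, beta1, beta2 = fit_linear_regression(x1, x2, y)

# Residuals: observed values minus fitted values
residuals = y - predict(alpha, beta1, beta2, x1, x2)
rss = np.sum(residuals ** 2)

# Prediction for a new observation
y_new = predict(alpha, beta1, beta2, 3.0, 1.0)
```

With this little noise, the estimated parameters should land close to the values used to generate the data, and `rss` holds the quantity that least squares is minimising.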
In the next post we will show how we can do this for the Boston House Prices dataset using a couple of variables, such as the number of bedrooms in the property and a crime index for the area. Remember that the aim will be to show how to build a model to be used with CoreML, not to build a perfect model for the prediction.
Keep in touch.
Also published on Medium.