CoreML – Linear Regression

Hello again, where were we? … Oh yes, we have been discussing CoreML and have even set up an appropriate Python 2 environment to work with it. In this post we are going to cover some of the most basic aspects of a workhorse of machine learning: the dependable linear regression model.

We are all familiar with a line of best fit. I am sure many of us remember fitting one by hand (you know who you are), and who hasn’t played with Excel’s capabilities? In a nutshell, a linear regression is a model that relates a variable y to one or more explanatory (or independent) variables X. The parameters that define the model are estimated from the available data. There are a number of assumptions about the explanatory variables, and you can find more information in my Data Science and Analytics with Python book. We can think of the goal of a linear regression model as drawing a line through the data, as exemplified in the plot below:

Let us take the case of two independent variables x_1 and x_2. The linear regression model to predict our target variable y is given by:

 y=\alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon,

where \alpha and \beta_i are the parameters to be estimated to help us generate predictions, and \epsilon is the error term. With the aid of techniques such as least squares we can estimate the parameters \alpha, \beta_1 and \beta_2 by minimising the sum of the squares of the residuals, i.e. the differences between the observed values and the fitted values provided by the model. Once we have determined the parameters, we are able to score new (unseen) data for x_1 and x_2 to predict the value of y.
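To make the estimation step a bit more concrete, here is a minimal sketch in plain Python of a least squares fit for the two-variable model above. It solves the normal equations, which is one standard way of minimising the sum of squared residuals. The function names fit_linear_regression and predict, as well as the toy data, are made up for illustration; in practice we would most likely reach for a library such as NumPy or scikit-learn rather than write this by hand.

```python
def fit_linear_regression(x1, x2, y):
    """Estimate alpha, beta_1 and beta_2 by ordinary least squares."""
    # Each row of the design matrix is [1, x_1, x_2]; the 1 carries alpha.
    rows = [[1.0, float(a), float(b)] for a, b in zip(x1, x2)]
    targets = [float(t) for t in y]
    k = 3

    # Normal equations: (X^T X) params = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]

    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]

    alpha, beta_1, beta_2 = (aug[i][k] for i in range(k))
    return alpha, beta_1, beta_2


def predict(params, x1, x2):
    """Score a new observation: y = alpha + beta_1 * x_1 + beta_2 * x_2."""
    alpha, beta_1, beta_2 = params
    return alpha + beta_1 * x1 + beta_2 * x2


# Toy, noise-free data generated from y = 1 + 2 * x_1 - 3 * x_2,
# so the fit should recover parameters close to (1, 2, -3).
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [-2, 3, -1, 4, 0]

params = fit_linear_regression(x1, x2, y)
prediction = predict(params, 5.0, 2.0)
```

With real data the residuals will not vanish as they do in this toy example, but the procedure is the same: estimate the parameters once, then reuse them to score unseen values of x_1 and x_2.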

In the next post we will show how we can do this for the Boston House Prices dataset using a couple of variables such as number of bedrooms in the property and a crime index for the area. Remember that the aim will be to show how to build the model to be used with CoreML and not a perfect model for the prediction.

Keep in touch.


Also published on Medium.