Quantum Tunnel

Random thoughts about random subjects... From science to literature and between manga and watercolours, passing by data science and rugby; including film, physics and fiction, programming, pictures and puns.

CoreML – Linear Regression

Hello again, where were we? … Oh yes, we have been discussing CoreML and have even set up an appropriate Python 2 environment to work with it. In this post we are going to cover some of the most basic aspects of the workhorse of machine learning: the dependable linear regression model.

We are indeed all familiar with a line of best fit, and I am sure that many of us remember fitting one by hand (you know who you are). And who hasn’t played with Excel’s capabilities? In a nutshell, a linear regression is a model that relates a variable [latex]y[/latex] to one or more explanatory (or independent) variables [latex]X[/latex]. The parameters that define the model are estimated from the available data. There are a number of assumptions about the explanatory variables; you can find more information in my Data Science and Analytics with Python book. We can think of the goal of a linear regression model as drawing a line through the data, as exemplified in the plot below:

Let us take the case of 2 independent variables [latex] x_1[/latex] and [latex]x_2[/latex]. The linear regression model to predict our target variable [latex] y[/latex] is given by:

[latex] y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon[/latex],

where [latex]\alpha[/latex] and [latex]\beta_i[/latex] are the parameters to be estimated to help us generate predictions, and [latex]\epsilon[/latex] is the error term. With the aid of techniques such as least squares we can estimate the parameters [latex]\alpha, \beta_1[/latex] and [latex]\beta_2[/latex] by minimising the sum of the squares of the residuals, i.e. the differences between the observed values and the fitted values provided by the model. Once we have determined the parameters, we are able to score new (unseen) data for [latex]x_1[/latex] and [latex]x_2[/latex] to predict the value of [latex]y[/latex].
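To make the estimation step a bit more concrete, here is a minimal sketch of a least squares fit with two explanatory variables using NumPy. The data is synthetic and made up purely for illustration; the "true" parameter values, the noise level and the `predict` helper are assumptions of this sketch and not part of any CoreML workflow.

```python
import numpy as np

# Synthetic data, invented for illustration only.
rng = np.random.RandomState(42)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)

# Assumed "true" parameters used to generate y, plus a little noise (epsilon).
true_alpha, true_beta1, true_beta2 = 1.5, 2.0, -0.5
y = true_alpha + true_beta1 * x1 + true_beta2 * x2 + rng.normal(0, 0.1, n)

# Design matrix: a column of ones for the intercept alpha, then x1 and x2.
A = np.column_stack([np.ones(n), x1, x2])

# Least squares: find the coefficients that minimise the sum of squared residuals.
coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
alpha, beta1, beta2 = coeffs


def predict(x1_new, x2_new):
    """Score new (unseen) values of x1 and x2 with the fitted parameters."""
    return alpha + beta1 * x1_new + beta2 * x2_new


print("alpha = %.3f, beta_1 = %.3f, beta_2 = %.3f" % (alpha, beta1, beta2))
print("prediction for x1=3, x2=1: %.3f" % predict(3.0, 1.0))
```

With a small amount of noise the estimated parameters should land close to the values used to generate the data, and `predict` then plays the role of scoring new observations.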

In the next post we will show how we can do this for the Boston House Prices dataset using a couple of variables such as number of bedrooms in the property and a crime index for the area. Remember that the aim will be to show how to build the model to be used with CoreML and not a perfect model for the prediction.

Keep in touch.

-j


Also published on Medium.

© 2021 Quantum Tunnel