## Advanced Data Science and Analytics with Python – Submitted!

There you go, the first checkpoint is completed: I have officially submitted the completed version of “Advanced Data Science and Analytics with Python”.

The book has been some time in the making (and in the thinking…). It is a follow up from my previous book, imaginatively called “Data Science and Analytics with Python” . The book covers aspects that were necessarily left out in the previous volume; however, the readers in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns.

Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary and a follow up from the topics discuss in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as SciKit-learn, Pandas, Numpy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and MacOS applications.

The book can be read independently form the previous volume and each of the chapters in this volume is sufficiently independent from the others proving flexibiity for the reader. Each of the topics adressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book

Time series analysis, natural language processing, topic modelling, social network analysis, neural networds and deep learning are comprehensively covrered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences. In this case literally to the users fingertips in the form of an iPhone app.

While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:

Furthermore you can see my Author profile here.

## “Advanced Data Science And Analytics” is finished!

It has been a few months of writing, testing, re-writing and starting again, and I am pleased to say that the first complete draft of “Advanced Data Science and Analytics with Python” is ready. Last chapter is done and starting revisions now. Yay!

## Apple Developer Support

It is great to see all the support that Apple Developers get in terms of tools, ecosystem, community and more.

For starters the Developer Support portal has a ton of information for the new comer as well as for the more expert of experts. Including guides and documentation for tools such as Xcode as well as information for developing software for MacOS and iOS.

Information about Design is available in the same place, including Human Interface Guidelines, Fonts (including downloads for San Francisco!) and information about accessibility and localisation.

Information about new tools and updates such as the latest about Swift, and SwiftUI can be easily found. And testing your apps with the help of tools such as TestFlight makes things so much easier.

## CoreML – Boston Model: The Complete App

Look how far we have come… We started this series by looking at what CoreML is and made sure that our environment was suitable. We decided to use linear regression as our model, and chose to use the Boston Price dataset in our exploration for this implementation. We built our model using Python and created our .mlmodel object and had a quick exploration of the model’s properties. We then started to build our app using Xcode (see Part 1, Part 2 and Part 3). In this final part we are going to take the .mlmodel and include it in out Xcode project, we will then use the inputs selected from out picker and calculate a prediction (based on our model) to be displayed to the user. Are you ready? Nu kör vi!

Let us start by adding the .mlmodel we created earlier on so that it is an available resource in our project. Open your Xcode project and locate your PriceBoston.mlmodel file. From the menu on the left-hand side select the “BostonPricer” folder. At the bottom of the window you will see a + sign, click on it and select “New Groups”. This will create a sub-folder within “BostonPricer”. Select the new folder and hit the return key, this will let you rename the folder to something more useful. In this case I am going to call this folder “Resources”.

Open Finder and navigate to the location of your BostonPricer.mlmodel. Click and drag the file inside the “Resources” folder we just created. This will open a dialogue box asking for some options for adding this file to your project. I selected the “Create Folder References” and left the rest as it was shown by default. After hitting “Finish” you will see your model now being part of your project. Let’s now go the code in ViewController and make some needed changes.  The first one is to tell our project that we are going to need the powers of the CoreML framework. At the top of the file, locate a line of code that imports UIKit, right below it type the following:

import CoreML

Inside the definition of the ViewController class, let us define a constant to reference the model. Look for the definitions of the crimeData and roomData constants and nearby them type the following:

let model = PriceBoston()

You will see that when you start typing the name of the model, Xcode will suggest the right name as it knows about the existence of the model as part of its resources, neat!

We need to make some changes to the getPrediction()function we created in the last post. Go to the function and look for place where we pick the values of crime and rooms and right after that write the following:

guard let priceBostonOutput = try? model.prediction(
crime:crime,
rooms: Double(rooms)
) else {
fatalError("Unexpected runtime error.")
}

You may get a warning telling you that the constant priceBostonOutput was defined but not used. Don’t worry, we will indeed use it in a little while. Just a couple of words about this piece of code, you will see that we are using the prediction method defined in the model and that we are passing the two input parameters that the model expects, namely crime and rooms. We are wrapping this call to the prediction method around a try statement so that we can catch any exceptions. This is where we are implementing our CoreML mode!!! Isn’t that cool‽

We are not done yet though; remember that we have that warning from Xcode about using the model. Looking at the properties of the model, we can see that we also have an output attribute called price. This is the prediction we are looking for and the one we would like to display. Out of the box it may have a lot of decimal figures, and it is never a good practice to display those to the user (although they are important in precision terms…). Also, with Swift’s strong typing we would have to typecast the double returned by the model into a string that can be printed. So, let us prepare some code to format the predicted price. At the top of the ViewController class, find the place where we defined the constants crimeData and roomData. Below them type the following code:

let priceFormat: NumberFormatter = {
let formatting = NumberFormatter()
formatting.numberStyle = .currency
formatting.maximumFractionDigits = 2
formatting.locale = Locale(identifier: "en_US")
return formatting
}()

We are defining a format that will show a number as currency in US dollars with two decimal figures. We can now pass our predicted price to this formatter and assign it to a new constant for future reference. Below the code where the getPrediction function was defined, write the following:

let priceText = priceFormat.string(from: NSNumber(value:
priceBostonOutput.price))

Now we have a nicely formatted string that can be used in the display. Let us change the message that we are asking our app to show when pressing the button:

let message = "The predicted price (in $1,000s) is " + priceText! We are done! Launch your app simulator, select a couple of values from the picker and hit the “Calculate Prediction” button… Et voilà, we have completed our first implementation of a CoreML model in a working app. There are many more things that we can do to improve the app. For instance, we can impose some constraints on the position of the different elements shown in the screen so that we can deploy the application in the various screen sizes offered by Apple devices. Improve the design and usability of the app and designing appropriate icons for the app (in various sizes). For the time being, I will leave some of those tasks for later. In the meantime you can take a look at the final code in my github site here. Enjoy and do keep in touch, I would love to hear if you have found this series useful. ## CoreML – iOS Implementation for the Boston Model (part 3) – Button We are very close at getting a functioning app for our Boston Model. In the last post we were able to put together the code that fills in the values in the picker and were able to “pick” the values shown for crime rate and number of rooms respectively. These values are fed to the model we built in one of the earlier posts of this series and the idea is that we will action this via a button that triggers the calculation of the prediction. In turn the prediction will be shown in a floating dialogue box. In this post we are going to activate the functionality of the button and show the user the values that have been picked. With this we will be ready to weave in the CoreML model in the final post of this series. So, what are we waiting for? Let us launch Xcode and get working. We have already done a bit of work for the button in the previous post where we connected the button to the ViewController generating a line of code that read as follows: @IBOutlet weak var predictButton: UIButton! If we launch the application and click on the button, sadly, nothing will happen. Let’s change that: in the definition of the UIViewController class, after the didReceiveMemoryWarning function write the following piece of code: @IBAction func getPrediction() { let selectedCrimeRow = inputPicker.selectedRow(inComponent: inputPredictor.crime.rawValue) let crime = crimeData[selectedCrimeRow] let selectedRoomRow = inputPicker.selectedRow(inComponent: inputPredictor.rooms.rawValue) let rooms = roomData[selectedRoomRow] let message = "The picked values are Crime: \(crime) and Rooms: \(rooms)" let alert = UIAlertController(title: "Values Picked", message: message, preferredStyle: .alert) let action = UIAlertAction(title: "OK", style: .default, handler: nil) alert.addAction(action) present(alert, animated: true, completion: nil) } The first four lines of the getPrediction function takes the values from the picker and creates some constants for crime and rooms that will then be used in a message to be displayed in the application. We are telling Xcode to treat this message as an alert and ask it to present it to the user (last line in the code above). What we need to do now is tell Xcode that this function is to be triggered when we click on the button. There are several way we can connect the button with the code above. In this case we are going to go to the Main.storyboard, control+click on the button and drag. This will show an arrow, we need to connect that arrow with the View Controller icon (a yellow circle with a white square inside) at the top of the view controller window we are putting together. When you let go, you will see a drop-down menu. From there, under “Sent Events” select the function we created above, namely getPrediction. See the screenshots below: You can now run the application. Select a number from each of the columns in the picker, and when ready, prepare to be amazed: Click on the “Calculate Prediction” button, et voilà – you will see a new window telling you the values you have just picked. Tap “OK” and start again! In the next post we will add the CoreML model, and modify the event for the button to take the two values picked and calculate a prediction which in turn will be shown in the floating window. Stay tuned. You can look at the code (in development) in my github site here. ## CoreML – iOS Implementation for the Boston Model (part 2) – Filling the Picker Right! Where were we? Yes, last time we put together a skeleton for the CoreML Boston Model application that will take two inputs (crime rate and number of rooms) and provide a prediction of the price of a Boston property (yes, based on somewhat all prices…). We are making use of three three labels, one picker and one button. Let us start creating variables to hold the potential values for the input variables. We will do this in the ViewController by selecting this file from the left-hand side menu: Inside the ViewController class definition enter the following variable assignments: let crimeData = Array(stride(from: 0.1, through: 0.3, by: 0.01)) let roomData = Array(4...9) These values are informed by the data exploration we carried out in an earlier post. We are going to use the arrays defined above to populate the values that will be shown in our picker. For this we need to define a data source for the picker and make sure that there are two components to choose values from. Before we do any of that we need to connect the view from our storyboard to the code, in particular we need to create outlets for the picker and for the button. Select the Main.storyboard from the menu in the left-hand side. With the Main.storyboard in view, in the top right-hand corner of Xcode you will see a button with an icon that has two intersecting circles, click on that icon. you will now see the storyboard side-by-side with the code. While pressing the Control key, select the picker by clicking on it; without letting go drag into the code window (you will see an arrow appear as you drag): You will se a dialogue window where you can now enter a name for the element in your Storyboard. In this case I am calling my picker inputPicker, as shown in the figure on the left. After pressing the “connect” button a new line of code appears and you will see a small circle on top of the code line number indicating that a connection with the Storyboard has been made. Do the same for the button and call it predictButton. In order to make our life a little bit easier, we are going to bundle together the input values. At the bottom of the ViewController code write the following: enum inputPredictor: Int { case crime = 0 case rooms } We have define an object called inputPredictor that will hold the values of for crime and rooms. In turn we will use this object to populate the picker as follows: In the same ViewController file, after the class definition that is provided in the project by default we are going to write an extension for the data source. Write the following code: extension ViewController: UIPickerViewDataSource { func numberOfComponents(in pickerView: UIPickerView) -> Int { return 2 } func pickerView(_ pickerView: UIPickerView, numberOfRowsInComponent component: Int) -> Int { guard let inputVals = inputPredictor(rawValue: component) else { fatalError("No predictor for component") } switch inputVals { case .crime: return crimeData.count case .rooms: return roomData.count } } } With the function numberOfComponents we are indicating that we want to have 2 components in this view. Notice that inside the pickerView function we are creating a constant inputVals defined by the values from inputPredictor. So far we have indicated where the values for the picker come from, but we have not delegated the actions that can be taken with those values, namely displaying them and picking them (after all, this element is a picker!) so that we can use the values elsewhere. If you were to execute this app, you will see an empty picker… OK, so what we need to do is create the UIPickerViewDelegate, and we do this by entering the following code right under the previous snippet: extension ViewController: UIPickerViewDelegate { func pickerView(_ pickerView: UIPickerView, titleForRow row: Int, forComponent component: Int) -> String? { guard let inputVals = inputPredictor(rawValue: component) else { fatalError("No predictor for component") } switch inputVals { case .crime: return String(crimeData[row]) case .rooms: return String(roomData[row]) } } func pickerView(_ pickerView: UIPickerView, didSelectRow row: Int, inComponent component: Int) { guard let inputVals = inputPredictor(rawValue: component) else { fatalError("No predictor for component") } switch inputVals { case .crime: print(String(crimeData[row])) case .rooms: print(String(roomData[row])) } } }  In the first function we are defining what values are supposed to be shown for the titleForRow in the picker, and we do this for each of the two elements we have, i.e. crime and rooms. In the second function we are defining what happens when we didSelectRow, in other words select the value that is being shown by each of the two elements in the picker. Not too bad, right? Well, if you were to run this application you will still see no change in the picker… Why is that? The answer is that we need to let the application know what needs to be show when the elements load. Go back to the top of the code (around line 20 or so) below the code lines that defined the outlets for the picker and the button. There write the following code: override func viewDidLoad() { super.viewDidLoad() // Picker data source and delegate inputPicker.dataSource = self inputPicker.delegate = self } OK, we can now run the application: On the top left-hand side of the Xcode window you will see a play button; clicking on it will launch the Simulator and you will be able to see your picker working. Go on, select a few values from each of the elements: In the next post we will write code to activate the button to run a prediction using our CoreML model with the values selected from the picker and show the result to the user. Stay tuned! You can look at the code (in development) in my github site here. ## CoreML – Model properties If you have been following the posts in this open notebook, you may know that by now we have managed to create a linear regression model for the Boston Price dataset based on two predictors, namely crime rate and average number of rooms. It is by no means the best model out there ad our aim is to explore the creation of a model (in this case with Python) and convert it to a Core ML model that can be deployed in an iOS app. Before move on to the development of the app, I thought it would be good to take a look at the properties of the converted model. If we open the PriceBoston.mlmodel we saved in the previous post (in Xcode of course) we will see the following information: We can see the name of the model (PriceBoston) and the fact that it is a “Pipeline Regressor”. The model can be given various attributes such as Author, Description, License, etc. We can also see the listing of the Model Evaluation Parameters in the form of Inputs (crime rate and number of rooms) and Outputs (price). There is also an entry to describe the Model Class (PriceBoston) and without attaching this model to a target the class is actually not present. Once we make this model part of a target inside an app, Xcode will generate the appropriate code Just to give you a flavour of the code that will be generated when we attach this model to a target, please take a look at the screenshot below: You can see that the code was generated automatically (see the comment at the beginning of the Swift file). The code defines the input variables and feature names, defines a way to extract values out of the input strings, sets up the model output and other bits and pieces such as defining the class for model loading and prediction (not shown). All this is taken care of by Xcode, making it very easy for us to use the model in our app. We will start building that app in the following posts (bear with me, I promise we will get there). Enjoy! ## CoreML – Building the model for Boston Prices In the last post we have taken a look at the Boston Prices dataset loaded directly from Scikit-learn. In this post we are going to build a linear regression model and convert it to a .mlmodel to be used in an iOS app. We are going to need some modules: import coremltools import pandas as pd from sklearn import datasets, linear_model from sklearn.model_selection import train_test_split from sklearn import metrics import numpy as np The cormeltools is the module that will enable the conversion to use our model in iOS. Let us start by defining a main function to load the dataset: def main(): print('Starting up - Loading Boston dataset.') boston = datasets.load_boston() boston_df = pd.DataFrame(boston.data) boston_df.columns = boston.feature_names print(boston_df.columns) In the code above we have loaded the dataset and created a pandas dataframe to hold the data and the names of the columns. As we mentioned in the previous post, we are going to use only the crime rate and the number of rooms to create our model:  print("We now choose the features to be included in our model.") X = boston_df[['CRIM', 'RM']] y = boston.target Please note that we are separating the target variable from the predictor variables. Although this dataset in not too large, we are going to follow best practice and split the data into training and testing sets:  X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=7) We will only use the training set in the creation of the model and will test with the remaining data points.  my_model = glm_boston(X_train, y_train) The line of code above assumes that we have defined the function glm_boston as follows: def glm_boston(X, y): print("Implementing a simple linear regression.") lm = linear_model.LinearRegression() gml = lm.fit(X, y) return gml Notice that we are using the LinearRegression implementation in Scikit-learn. Let us go back to the main function we are building and extract the coefficients for our linear model. Refer to the CoreML – Linear Regression post to remember that type of model that we are building is of the form $y=\alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon$:  coefs = [my_model.intercept_, my_model.coef_] print("The intercept is {0}.".format(coefs[0])) print("The coefficients are {0}.".format(coefs[1])) We can also take a look at some metrics that tell let us evaluate our model against the test data:  # calculate MAE, MSE, RMSE print("The mean absolute error is {0}.".format( metrics.mean_absolute_error(y_test, y_pred))) print("The mean squared error is {0}.".format( metrics.mean_squared_error(y_test, y_pred))) print("The root mean squared error is {0}.".format( np.sqrt(metrics.mean_squared_error(y_test, y_pred)))) ## CoreML conversion And now for the big moment: We are going to convert our model to an .mlmodel object!! Ready?  print("Let us now convert this model into a Core ML object:") # Convert model to Core ML coreml_model = coremltools.converters.sklearn.convert(my_model, input_features=["crime", "rooms"], output_feature_names="price") # Save Core ML Model coreml_model.save("PriceBoston.mlmodel") print("Done!") We are using the sklearn.convert method of coremltools.converters to create the my_model model with the necessary inputs (i.e. crime and rooms) and output (price). Finally we save the model in a file with the name PriceBoston.mlmodel. Et voilà! In the next post we will start creating an iOS app to use the model we have just built. You can look at the code (in development) in my github site here. ## CoreML – Boston Prices exploration In the previous post of this series we described some of the basics of linear regression, one of the most well-known models in machine learning. We saw that we can relate the values of input parameters $x_i$ to the target variable $y$ to be predicted. In this post we are going to create a linear regression model to predict the price of houses in Boston (based on valuations from 1970s). The dataset provides information such as Crime (CRIM), areas of non-retail business in the town (INDUS), the age of people who own the house (AGE), average number of rooms (RM) as well as the median value of homes in$1000s (MEDV) as well as other attributes.

Let us start by exploring the data. We are going to use Scikit-learn and fortunately the dataset comes with the module. The input variables are included in the data method and the price is given by the target. We are going to load the input variables in the dataframe boston_df and the prices in the array y:

from sklearn import datasets
import pandas as pd
boston_df = pd.DataFrame(boston.data)
boston_df.columns = boston.feature_names
y = boston.target

We are going to build our model using only a limited number of inputs. In this case let us pay attention to the average number of rooms and the crime rate:

X = boston_df[['CRIM', 'RM']]
X.columns = ['Crime', 'Rooms']
X.describe()

The description of these two attributes is as follows:

            Crime       Rooms
count  506.000000  506.000000
mean     3.593761    6.284634
std      8.596783    0.702617
min      0.006320    3.561000
25%      0.082045    5.885500
50%      0.256510    6.208500
75%      3.647423    6.623500
max     88.976200    8.780000

As we can see the minimum number of rooms is 3.5 and the maximum is 8.78, whereas for the crime rate the minimum is 0.006 and the maximum value is 88.97, nonetheless the median is 0.25. We will use some of these values to define the ranges that will be provided to our users to find price predictions.

Finally, let us visualise the data:

We shall bear these values in mind when building our regression model in subsequent posts.

You can look at the code (in development) in my github site here.

## CoreML – Linear Regression

Hello again, where were we? … Oh yes, we have been discussing CoreML and have even set up an appropriate python 2 environment to work with CoreML. In this post we are going to cover some of the most basic aspects of the workhorse of machine learning: the dependable linear regression model.

We are indeed all familiar with a line of best fit, and I am sure that many of us remember doing some by hand (you know who you are) and who hasn’t played with Excel’s capabilities? In a nutshell, a linear regression is a model that relates a variable $y$ to one or more explanatory (or independent) variables $X$. The parameters that define the model are estimated from the available data and there are a number of assumptions about the explanatory variables and you can find more information in my Data Science and Analytics with Python book. We can think of the goal of a linear regression model to draw a line though the data as exemplified in the plot below:

Let us take the case of 2 independent variables $x_1$ and $x_2$. The linear regression model to predict our target variable $y$ is given by:

$y=\alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon$,

where $\alpha$and $\beta_i$ are the parameters to be estimated to help us generate predictions. With the aid of techniques such as least squares can estimate the parameters $\alpha, \beta_1$ and $\beta_2$ by minimising the sum of the squares of the residuals, i,.e the difference between an observed value, and the fitted value provided by a model. Once we have determined the parameters, we are able to score new (unseen) data for $x_1$ and $x_2$ to predict the value of $y$.

In the next post we will show how we can do this for the Boston House Prices dataset using a couple of variables such as number of bedrooms in the property and a crime index for the area. Remember that the aim will be to show how to build the model to be used with CoreML and not a perfect model for the prediction.

Keep in touch.

-j