CoreML – iOS Implementation for the Boston Model (part 3) – Button

We are very close to getting a functioning app for our Boston Model. In the last post we put together the code that fills in the picker and lets us “pick” the values shown for crime rate and number of rooms respectively. These values are fed to the model we built in one of the earlier posts of this series, and the idea is that we will action this via a button that triggers the calculation of the prediction. In turn the prediction will be shown in a floating dialogue box.

In this post we are going to activate the functionality of the button and show the user the values that have been picked. With this we will be ready to weave in the CoreML model in the final post of this series. So, what are we waiting for? Let us launch Xcode and get working. We already did a bit of work for the button in the previous post, where we connected it to the ViewController, generating a line of code that reads as follows:

@IBOutlet weak var predictButton: UIButton!

If we launch the application and click on the button, sadly, nothing will happen. Let’s change that: in the definition of the ViewController class, after the didReceiveMemoryWarning function, write the following piece of code:

@IBAction func getPrediction() {
        let selectedCrimeRow = inputPicker.selectedRow(inComponent: inputPredictor.crime.rawValue)
        let crime = crimeData[selectedCrimeRow]

        let selectedRoomRow = inputPicker.selectedRow(inComponent: inputPredictor.rooms.rawValue)
        let rooms = roomData[selectedRoomRow]

        let message = "The picked values are Crime: \(crime) and Rooms: \(rooms)"

        let alert = UIAlertController(title: "Values Picked",
                                      message: message,
                                      preferredStyle: .alert)

        let action = UIAlertAction(title: "OK", style: .default,
                                   handler: nil)

        alert.addAction(action)
        present(alert, animated: true, completion: nil)
    }

The first four lines of the getPrediction function take the values from the picker and create constants for crime and rooms, which are then used in the message to be displayed in the application. We wrap this message in an alert controller with a single “OK” action and ask the view controller to present it (last line in the code above). What we need to do now is tell Xcode that this function is to be triggered when we tap on the button.

There are several ways we can connect the button with the code above. In this case we are going to go to Main.storyboard, control+click on the button and drag. This will show an arrow; we need to connect that arrow with the View Controller icon (a yellow circle with a white square inside) at the top of the view controller scene we are putting together. When you let go, you will see a drop-down menu. From there, under “Sent Events”, select the function we created above, namely getPrediction. See the screenshots below:

You can now run the application. Select a number from each of the columns in the picker, and when ready, prepare to be amazed: Click on the “Calculate Prediction” button, et voilà – you will see a new window telling you the values you have just picked. Tap “OK” and start again!

In the next post we will add the CoreML model, and modify the event for the button to take the two values picked and calculate a prediction which in turn will be shown in the floating window. Stay tuned.

You can look at the code (in development) on my GitHub site here.

JupyterLab is Ready for Users

This is a reblog of this original post.


We are proud to announce the beta release series of JupyterLab, the next-generation web-based interface for Project Jupyter.

Project Jupyter | February 20, 2018
tl;dr: JupyterLab is ready for daily use (documentation, try it with Binder)


JupyterLab is an interactive development environment for working with notebooks, code, and data.

The Evolution of the Jupyter Notebook

Project Jupyter exists to develop open-source software, open standards, and services for interactive and reproducible computing.

Since 2011, the Jupyter Notebook has been our flagship project for creating reproducible computational narratives. The Jupyter Notebook enables users to create and share documents that combine live code with narrative text, mathematical equations, visualizations, interactive controls, and other rich output. It also provides building blocks for interactive computing with data: a file browser, terminals, and a text editor.

The Jupyter Notebook has become ubiquitous with the rapid growth of data science and machine learning and the rising popularity of open-source software in industry and academia:

  • Today there are millions of users of the Jupyter Notebook in many domains, from data science and machine learning to music and education. Our international community comes from almost every country on earth.¹
  • The Jupyter Notebook now supports over 100 programming languages, most of which have been developed by the community.
  • There are over 1.7 million public Jupyter notebooks hosted on GitHub. Authors are publishing Jupyter notebooks in conjunction with scientific research, academic journals, data journalism, educational courses, and books.

At the same time, the community has faced challenges in using various software workflows with the notebook alone, such as running code from text files interactively. The classic Jupyter Notebook, built on web technologies from 2011, is also difficult to customize and extend.

JupyterLab: Ready for Users

JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.


JupyterLab enables you to arrange your work area with notebooks, text files, terminals, and notebook outputs.

JupyterLab provides a high level of integration between notebooks, documents, and activities:

  • Drag-and-drop to reorder notebook cells and copy them between notebooks.
  • Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).
  • Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.
  • Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.

JupyterLab has been over three years in the making, with over 11,000 commits and 2,000 releases of npm and Python packages. Over 100 contributors from the broader community have helped build JupyterLab in addition to our core JupyterLab developers.

To get started, see the JupyterLab documentation for installation instructions and a walk-through, or try JupyterLab with Binder. You can also set up JupyterHub to use JupyterLab.
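For reference, at the time of this release the installation boiled down to the commands below (assuming an up-to-date Jupyter Notebook; do check the documentation for the current instructions):

> pip install jupyterlab

> jupyter lab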

Customize Your JupyterLab Experience

JupyterLab is built on top of an extension system that enables you to customize and enhance JupyterLab by installing additional extensions. In fact, the built-in functionality of JupyterLab itself (notebooks, terminals, file browser, menu system, etc.) is provided by a set of core extensions.


JupyterLab extensions enable you to work with diverse data formats such as GeoJSON, JSON and CSV.²

Among other things, extensions can:

  • Provide new themes, file editors and viewers, or renderers for rich outputs in notebooks;
  • Add menu items, keyboard shortcuts, or advanced settings options;
  • Provide an API for other extensions to use.

Community-developed extensions on GitHub are tagged with the jupyterlab-extension topic, and currently include file viewers (GeoJSON, FASTA, etc.), Google Drive integration, GitHub browsing, and ipywidgets support.
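As an illustration, installing one of these was a one-line command during the beta series; the GeoJSON renderer is shown below (check each extension’s README for its exact package name):

> jupyter labextension install @jupyterlab/geojson-extension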

Develop JupyterLab Extensions

While many JupyterLab users will install additional JupyterLab extensions, some of you will want to develop your own. The extension development API is evolving during the beta release series and will stabilize in JupyterLab 1.0. To start developing a JupyterLab extension, see the JupyterLab Extension Developer Guide and the TypeScript or JavaScript extension templates.

JupyterLab itself is co-developed on top of PhosphorJS, a new JavaScript library for building extensible, high-performance, desktop-style web applications. We use modern JavaScript technologies such as TypeScript, React, Lerna, Yarn, and webpack. Unit tests, documentation, consistent coding standards, and user experience research help us maintain a high-quality application.

JupyterLab 1.0 and Beyond

We plan to release JupyterLab 1.0 later in 2018. The beta releases leading up to 1.0 will focus on stabilizing the extension development API, user interface improvements, and additional core features. All releases in the beta series will be stable enough for daily usage.

JupyterLab 1.0 will eventually replace the classic Jupyter Notebook. Throughout this transition, the same notebook document format will be supported by both the classic Notebook and JupyterLab.

Get Involved

There are many ways you can participate in the JupyterLab effort. We welcome contributions from all members of the Jupyter community:

  • Use our extension development API to make your own JupyterLab extensions. Please add the jupyterlab-extension topic if your extension is hosted on GitHub. We appreciate feedback as we evolve toward a stable API for JupyterLab 1.0.
  • Contribute to the development, documentation, and design of JupyterLab on GitHub. To get started with development, please see our Contributing Guide and Code of Conduct. We label issues that are ideal for new contributors as “good first issue” or “help wanted”.
  • Connect with us on our GitHub Issues page or on our Gitter Channel. If you find a bug, have questions, or want to provide feedback, please join the conversation.

We are thrilled to see how you use and extend JupyterLab.

Sincerely,

The JupyterLab Team and Project Jupyter

We thank Bloomberg and Anaconda for their support and collaboration in developing JupyterLab. We also thank the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, and the Helmsley Charitable Trust for their support.

[1] Based on the 249 country codes listed under ISO 3166–1, recent Google analytics data from 2018 indicates that jupyter.org has hosted visitors from 213 countries.

[2] Data visualized in this screenshot is licensed CC-BY-NC 3.0. See http://datacanvas.org/public-transportation/ for more details.

nteract – a great Notebook experience

I am a supporter of using Jupyter Notebooks for data exploration and code prototyping. It is a great way to start writing code and immediately get interactive feedback. Not only can you document your code there using markdown, but you can also embed images, plots and links, bringing your work to life.

Nonetheless, there are some little annoyances that I have, for instance the fact that I need to launch a kernel to open a file, and having to do that “the long way” – i.e. I cannot simply double-click on the file I am interested in seeing. Some ways to overcome this include looking at GitHub versions of my code, as the notebooks are rendered automatically, or even saving HTML or PDF versions of the notebooks. I am sure some of you may have similar solutions for this.

Last week, while looking for entries on something completely different, I stumbled upon a post that suggested using nteract. It sounded promising and I took a look. It turned out to be related to the Hydrogen package available for Atom, something I have used in the past and loved. nteract was different though, as it offered a desktop version and other goodies such as in-app support for publishing, a terminal-free experience, sticky cells, and input and output hiding… Bring it on!

I just started using it, and so far so good. You may want to give it a try, and maybe even contribute to the git repo.


CoreML – Building the model for Boston Prices

In the last post we took a look at the Boston Prices dataset loaded directly from Scikit-learn. In this post we are going to build a linear regression model and convert it to a .mlmodel file to be used in an iOS app.

We are going to need some modules:

import coremltools
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn import metrics
import numpy as np

coremltools is the module that will enable the conversion so that we can use our model in iOS.

Let us start by defining a main function to load the dataset:

def main():
    print('Starting up - Loading Boston dataset.')
    boston = datasets.load_boston()
    boston_df = pd.DataFrame(boston.data)
    boston_df.columns = boston.feature_names
    print(boston_df.columns)

In the code above we have loaded the dataset and created a pandas dataframe to hold the data and the names of the columns. As we mentioned in the previous post, we are going to use only the crime rate and the number of rooms to create our model:

    print("We now choose the features to be included in our model.")
    X = boston_df[['CRIM', 'RM']]
    y = boston.target

Please note that we are separating the target variable from the predictor variables. Although this dataset is not too large, we are going to follow best practice and split the data into training and testing sets:

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=7)

We will only use the training set in the creation of the model and will test with the remaining data points.

    my_model = glm_boston(X_train, y_train)

The line of code above assumes that we have defined the function glm_boston as follows:

def glm_boston(X, y):
    print("Implementing a simple linear regression.")
    lm = linear_model.LinearRegression()
    gml = lm.fit(X, y)
    return gml

Notice that we are using the LinearRegression implementation in Scikit-learn. Let us go back to the main function we are building and extract the coefficients of our linear model. Refer to the CoreML – Linear Regression post to recall that the type of model we are building is of the form  y=\alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon:

    coefs = [my_model.intercept_, my_model.coef_]
    print("The intercept is {0}.".format(coefs[0]))
    print("The coefficients are {0}.".format(coefs[1]))

We can also take a look at some metrics that let us evaluate our model against the test data; note that we first need to generate predictions for the test set:

    # generate predictions for the held-out test set
    y_pred = my_model.predict(X_test)

    # calculate MAE, MSE, RMSE
    print("The mean absolute error is {0}.".format(
        metrics.mean_absolute_error(y_test, y_pred)))
    print("The mean squared error is {0}.".format(
        metrics.mean_squared_error(y_test, y_pred)))
    print("The root mean squared error is {0}.".format(
        np.sqrt(metrics.mean_squared_error(y_test, y_pred))))

CoreML conversion

And now for the big moment: We are going to convert our model to an .mlmodel object!! Ready?

    print("Let us now convert this model into a Core ML object:")
    # Convert model to Core ML
    coreml_model = coremltools.converters.sklearn.convert(my_model,
                                        input_features=["crime", "rooms"],
                                        output_feature_names="price")
    # Save Core ML Model
    coreml_model.save("PriceBoston.mlmodel")
    print("Done!")

We are using the sklearn.convert method of coremltools.converters to convert my_model, declaring the necessary inputs (i.e. crime and rooms) and output (price). Finally we save the model in a file with the name PriceBoston.mlmodel.
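Optionally, we can also attach some human-readable metadata to the model; Xcode displays this alongside the .mlmodel file. Here is a small sketch using coremltools’ model description fields (the wording of the descriptions is mine), together with the usual entry point so the script runs end to end:

    # optional: human-readable metadata displayed by Xcode
    coreml_model.author = "Your Name"
    coreml_model.short_description = "Predicts the median price of a Boston home"
    coreml_model.input_description["crime"] = "Per capita crime rate"
    coreml_model.input_description["rooms"] = "Average number of rooms per dwelling"
    coreml_model.output_description["price"] = "Predicted median value in $1000s"
    # save again so the metadata is persisted in the .mlmodel file
    coreml_model.save("PriceBoston.mlmodel")


if __name__ == "__main__":
    main()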

Et voilà! In the next post we will start creating an iOS app to use the model we have just built.

You can look at the code (in development) on my GitHub site here.

CoreML – Boston Prices exploration

In the previous post of this series we described some of the basics of linear regression, one of the most well-known models in machine learning. We saw that we can relate the values of input parameters x_i to the target variable y to be predicted. In this post we are going to create a linear regression model to predict the price of houses in Boston (based on valuations from the 1970s). The dataset provides information such as the per capita crime rate (CRIM), the proportion of non-retail business acres in the town (INDUS), the proportion of owner-occupied units built prior to 1940 (AGE) and the average number of rooms (RM), as well as the median value of homes in $1000s (MEDV) and other attributes.

Let us start by exploring the data. We are going to use Scikit-learn, and fortunately the dataset comes with the module. The input variables are held in the data attribute and the prices in target. We are going to load the input variables into the dataframe boston_df and the prices into the array y:

from sklearn import datasets
import pandas as pd 
boston = datasets.load_boston() 
boston_df = pd.DataFrame(boston.data)
boston_df.columns = boston.feature_names
y = boston.target

We are going to build our model using only a limited number of inputs. In this case let us pay attention to the average number of rooms and the crime rate:

X = boston_df[['CRIM', 'RM']]
X.columns = ['Crime', 'Rooms']
X.describe()

The description of these two attributes is as follows:

            Crime       Rooms
count  506.000000  506.000000
mean     3.593761    6.284634
std      8.596783    0.702617
min      0.006320    3.561000
25%      0.082045    5.885500
50%      0.256510    6.208500
75%      3.647423    6.623500
max     88.976200    8.780000

As we can see, the minimum number of rooms is 3.56 and the maximum is 8.78, whereas for the crime rate the minimum is 0.006 and the maximum is 88.98, although the median is only 0.26. We will use some of these values to define the ranges that will be offered to our users to find price predictions.

Finally, let us visualise the data:
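The original post showed the plots here; a minimal sketch of how they might be produced with matplotlib (histograms of the two predictors, plus rooms against price) is:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
# distribution of each predictor
axes[0].hist(X['Crime'], bins=50)
axes[0].set_xlabel('Crime rate')
axes[1].hist(X['Rooms'], bins=20)
axes[1].set_xlabel('Average number of rooms')
# rooms vs price, the clearer of the two relationships
axes[2].scatter(X['Rooms'], y, alpha=0.5)
axes[2].set_xlabel('Average number of rooms')
axes[2].set_ylabel('Median value ($1000s)')
plt.tight_layout()
plt.show()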

We shall bear these values in mind when building our regression model in subsequent posts.

You can look at the code (in development) on my GitHub site here.

CoreML – Linear Regression

Hello again, where were we? … Oh yes, we have been discussing CoreML and have even set up an appropriate Python environment to work with CoreML. In this post we are going to cover some of the most basic aspects of the workhorse of machine learning: the dependable linear regression model.

We are indeed all familiar with a line of best fit; I am sure that many of us remember doing some by hand (you know who you are), and who hasn’t played with Excel’s capabilities? In a nutshell, a linear regression is a model that relates a variable  y to one or more explanatory (or independent) variables  X. The parameters that define the model are estimated from the available data, and there are a number of assumptions about the explanatory variables; you can find more information in my Data Science and Analytics with Python book. We can think of the goal of a linear regression model as drawing a line through the data, as exemplified in the plot below:

Let us take the case of 2 independent variables  x_1 and x_2. The linear regression model to predict our target variable  y is given by:

 y=\alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon,

where  \alpha and  \beta_i are the parameters to be estimated to help us generate predictions. With the aid of techniques such as least squares we can estimate the parameters  \alpha, \beta_1 and  \beta_2 by minimising the sum of the squares of the residuals, i.e. the differences between the observed values and the fitted values provided by the model. Once we have determined the parameters, we are able to score new (unseen) data for  x_1 and  x_2 to predict the value of  y.
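As a quick illustration of that estimation step, here is a minimal sketch using synthetic data: we generate observations from known parameters (chosen arbitrarily) and check that least squares recovers them:

import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(42)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
# true model: y = 2.0 + 1.5*x1 - 0.7*x2 + noise
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(0, 0.5, n)

lm = linear_model.LinearRegression()
lm.fit(np.column_stack([x1, x2]), y)
print(lm.intercept_)  # close to 2.0, our alpha
print(lm.coef_)       # close to [1.5, -0.7], our beta_1 and beta_2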

In the next post we will show how we can do this for the Boston House Prices dataset using a couple of variables such as number of bedrooms in the property and a crime index for the area. Remember that the aim will be to show how to build the model to be used with CoreML and not a perfect model for the prediction.

Keep in touch.

-j

Core ML – Preparing the environment

Hello again! In preparation for training a model to be converted by Core ML for use in an application, I would like to make sure we have a suitable environment to work in. One of the first things that came to my attention when looking at the coremltools module is the fact that it only supported Python 2! Yes, you read correctly: you had to use Python 2.7 to make this work. As you probably know, Python 2 will be retired in 2020, so I hoped Apple was considering this in their development cycles. Update: Python 3 is now finally supported! You can still see the countdown to Python 2’s retirement here, and thanks Python 2 for the many years of service…

Anyway, if you are already a Python 3 user, then you are good to go; otherwise you may need to make appropriate installations. I am using Anaconda (you may use your favourite distro) and I will be creating a conda environment (I’m calling it coreml) with Python 3 and some of the libraries I will be using:

> conda create --name coreml python=3 ipython jupyter scikit-learn

> conda activate coreml

(coreml) 
> pip install coremltools

I am sure there may be other modules that will be needed, and I will make appropriate installations (and additions to this post) as that becomes clearer.

You can get a look at Apple’s coremltools github repo here.

ADDITIONS: As I mentioned, there were indeed other modules that needed installing in the new environment; here is a list (a single command covering them all is sketched after the list):

  • pandas
  • matplotlib
  • pillow
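Assuming the environment above is active, something along these lines should cover them:

(coreml) 
> conda install pandas matplotlib pillow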

Core ML – What is it?

In a previous post I mentioned that I would be sharing some notes about my journey doing data science and machine learning with Apple technology. This is the first of those posts, and here I will go about what Core ML is…

Core ML is a computer framework. So what is a framework? Well, in computing terms it is a software abstraction that enables generic functionality to be modified as required by the user, transforming it into software for a specific purpose and enabling the development of a system or even a humble project.

So Core ML is an Apple-provided framework to speed up the development of apps that use trained machine learning models. Notice that word – trained – is part of the description of the framework. This means that the model has to be developed externally, with appropriate training data for the specific project in mind. For instance, if you are interested in building a classifier that distinguishes cats from cars, then you need to train the model with lots of cat and car images.

As it stands, Core ML supports a variety of machine learning models, from generalised linear models (GLMs for short) to neural nets. Furthermore, it helps with the task of adding the trained machine learning model to your application by automatically creating a custom programmatic interface that supplies an API to your model. All this within the comfort of Xcode!

There is an important point to remember: the model has to be developed externally from Core ML. In other words, you may want to use your favourite machine learning framework (that word again), computer language and environment to cover the different aspects of the data science workflow. You can read more about that in Chapter 3 of my “Data Science and Analytics with Python” book. So whether you use Scikit-learn, Keras or Caffe, the model you develop has to be trained (tested and evaluated) beforehand. Once you are ready, Core ML will support you in bringing it to the masses via your app.

As mentioned in the Core ML documentation:

Core ML is optimized for on-device performance, which minimizes memory footprint and power consumption. Running strictly on the device ensures the privacy of user data and guarantees that your app remains functional and responsive when a network connection is unavailable.

OK, so in the next few posts we will be using Python and coremltools to generate a so-called .mlmodel file that Xcode can use and deploy. Stay tuned!

The incredible growth of Python

A reblog of the post by Dave Robinson | 09/12/2017

We recently explored how wealthy countries (those defined as high-income by the World Bank) tend to visit a different set of technologies than the rest of the world. Among the largest differences we saw was in the programming language Python. When we focus on high-income countries, the growth of Python is even larger than it might appear from tools like Stack Overflow Trends, or in other rankings that consider global software development.

In this post, we’ll explore the extraordinary growth of the Python programming language in the last five years, as seen by Stack Overflow traffic within high-income countries. The term “fastest-growing” can be hard to define precisely, but we make the case that Python has a solid claim to being the fastest-growing major programming language.

All the numbers discussed in this post are for high-income countries; they’re generally representative of trends in the United States, United Kingdom, Germany, Canada, and other such countries, which in combination make up about 64% of Stack Overflow’s traffic. Many other countries such as India, Brazil, Russia, and China also make enormous contributions to the global software development ecosystem, and this post is less descriptive of those economies, though we’ll see that Python has shown growth there as well.

It’s worth emphasizing up front that the number of users of a language isn’t a measure of the language’s quality: we’re describing the languages developers use, but not prescribing anything. (Full disclosure: I used to program primarily in Python, though I have since switched entirely to R.)

Python’s growth in high-income countries

You can see on Stack Overflow Trends that Python has been growing rapidly in the last few years. But for this post we’ll focus on high-income countries, and consider visits to questions rather than questions asked (this tends to give similar results, but has less month-by-month noise, especially for smaller tags).

We have data on Stack Overflow question views going back to late 2011, and in this time period we can consider the growth of Python relative to five other major programming languages. (Note that this is therefore a shorter time scale than the Trends tool, which goes back to 2008). These are currently six of the ten most-visited Stack Overflow tags in high-income countries; the four we didn’t include are CSS, HTML, Android, and JQuery.

June 2017 was the first month that Python was the most visited tag on Stack Overflow within high-income nations. This included being the most visited tag within the US and the UK, and in the top 2 in almost all other high-income nations (next to either Java or JavaScript). This is especially impressive because in 2012 it was less visited than any of the other five languages, and it has grown 2.5-fold in that time.

Part of this is because of the seasonal nature of traffic to Java. Since it’s heavily taught in undergraduate courses, Java traffic tends to rise during the fall and spring and drop during the summer. Will it catch up with Python again by the end of the year? We can try forecasting the next two years of growth with an STL model (see http://otexts.org/fpp2/sec-6-stl.html), which combines growth with seasonal trends to make a prediction about future values.
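The original analysis used Stack Overflow’s internal traffic data; as a rough illustration of the technique, here is a minimal sketch of an STL decomposition on a synthetic monthly series using statsmodels (the trend component is what a forecast would extrapolate):

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# synthetic monthly "traffic share" series: upward trend plus a
# school-year seasonal swing, loosely mimicking the Java pattern
months = pd.date_range('2012-01-01', periods=72, freq='MS')
t = np.arange(72)
rng = np.random.RandomState(0)
share = 0.05 + 0.0004 * t + 0.004 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.0008, 72)
series = pd.Series(share, index=months)

result = STL(series, period=12).fit()
print(result.trend.tail())     # the long-run growth component
print(result.seasonal.tail())  # the repeating seasonal component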

According to this model, Python could either stay in the lead or be overtaken by Java in the fall (it’s roughly within the variation of the model’s predictions), but it’s clearly on track to become the most visited tag in 2018. STL also suggests that JavaScript and Java will remain at similar levels of traffic among high income countries, just as they have for the last two years.

What tags are growing the fastest overall?

The above was looking only at the six most-visited programming languages. Among other notable technologies, which are currently growing the fastest in high-income countries?

We defined the growth rate in terms of the ratio between 2017 and 2016 share of traffic. We decided to consider only programming languages (like Java and Python) and platforms (such as iOS, Android, Windows and Linux) in this analysis, as opposed to frameworks like Angular or libraries like TensorFlow (although many of those showed notable growth that may be examined in a future post).
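In other words, the growth rate here is simply the ratio of a tag’s share of traffic in 2017 to its share in 2016. A tiny illustration with made-up numbers (not real Stack Overflow data):

import pandas as pd

# hypothetical traffic shares by tag and year (illustrative only)
shares = pd.DataFrame({
    2016: {'python': 0.080, 'java': 0.092, 'r': 0.020},
    2017: {'python': 0.102, 'java': 0.091, 'r': 0.025},
})
growth = shares[2017] / shares[2016] - 1
print(growth)  # python comes out around 0.27, i.e. roughly 27% growth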

Because of the challenges in defining “fastest-growing” described in this comic, we compare the growth to the overall average in a mean-difference plot.

With a 27% year-over-year growth rate, Python stands alone as a tag that is both large and growing rapidly; the next-largest tag that shows similar growth is R. We see that traffic to most other large tags has stayed pretty steady within high-income countries, with visits to Android, iOS, and PHP decreasing slightly. (We previously examined some of the shrinking tags, like Objective-C, Perl and Ruby, in our post on the death of Flash.) We can also notice that among functional programming languages, Scala is the largest and growing, while F# and Clojure are smaller and shrinking, with Haskell in between and remaining steady.

There’s an important omission from the above chart: traffic to TypeScript questions grew by an impressive 142% in the last year, enough that we left it off to avoid overwhelming the rest of the scale. You can also see that some other smaller languages are growing similarly or faster than Python (like R, Go and Rust), and there are a number of tags like Swift and Scala that are also showing impressive growth. How does their traffic over time compare to Python’s?

The growth of languages like R and Swift is indeed impressive, and TypeScript has shown especially rapid expansion in an even shorter time. Many of these smaller languages grew from getting almost no question traffic to become notable presences in the software ecosystem. But as this graph shows, it’s easier to show rapid growth when a tag started relatively small.

Note that we’re not saying these languages are in any way “competing” with Python. Rather, we’re explaining why we’d treat their growth in a separate category; these were lower-traffic tags to start with. Python is an unusual case for being both one of the most visited tags on Stack Overflow and one of the fastest-growing ones. (Incidentally, it is also accelerating! Its year-over-year growth has become faster each year since 2013).

Rest of the world

So far in this post we’ve been analyzing the trends in high-income countries. Does Python show a similar growth in the rest of the world, in countries like India, Brazil, Russia and China?

Indeed it does.

Outside of high-income countries Python is still the fastest growing major programming language; it simply started at a lower level and the growth began two years later (in 2014 rather than 2012). In fact, the year-over-year growth rate of Python in non-high-income countries is slightly higher than it is in high-income countries. We don’t examine it here, but R, the other language whose usage is positively correlated with GDP, is growing in these countries as well.

Many of the conclusions in this post about the growth and decline of tags (as opposed to the absolute rankings) in high-income countries hold true for the rest of the world; there’s a 0.979 Spearman correlation between the growth rates in the two segments. In some cases, you can see a “lagging” phenomenon similar to what happened with Python, where a technology was widely adopted within high-income countries a year or two before it expanded in the rest of the world. (This is an interesting phenomenon and may be the subject of a future blog post!)

Next time

We’re not looking to contribute to any “language war.” The number of users of a language doesn’t imply anything about its quality, and certainly can’t tell you which language is more appropriate for a particular situation. With that perspective in mind, however, we believe it’s worth understanding what languages make up the developer ecosystem, and how that ecosystem might be changing.

This post demonstrated that Python has shown a surprising growth in the last five years, especially within high-income countries. In our next post, we’ll start to explore the “why”. We’ll segment the growth by country and by industry, and examine what other technologies tend to be used alongside Python (to estimate, for example, how much of the growth has been due to increased usage of Python for web development versus for data science).

Original Source.
