A collection of posts related to Coding, Programming, Hacking and Computer Tricks. Take a look and enjoy.

Getting Answers for Core ML deployment from my own Book

I was working today on the deployment of a small neural network model prototype, converted to Core ML, for use in an iPhone app.

I was trying to find the best way to get things to work and then it occurred to me I had solved a similar issue before... where‽ when‽ aha!

The answer was actually in my Advanced Data Science and Analytics with Python.

Read me...

Natural Language Processing Talk - Newspaper Article

With the lockdown and social distancing rules forcing all of us to adjust our calendars, events and even lesson plans and lectures, I was not surprised to hear of speaking opportunities that otherwise may not arise.

A great example is the reprise of a talk I gave about a year ago while visiting Mexico. It was a great opportunity to talk to Social Science students at the Political Science Faculty of the Universidad Autónoma del Estado de México. The subject was open but had to cover the use of technology and I thought that talking about the use of natural language processing in terms of digital humanities would be a winner. And it was...

In March this year I was approached by the Faculty to re-run the talk, but this time instead of doing it face to face we would use a teleconference room. Not only was I, the speaker, talking from the comfort of my own living room, but all the attendees would be at home too. Furthermore, some of the students may not have had access to the live presentation (lack of broadband, equipment, etc.), so recording the session for later use was the best option for them.

I didn’t hesitate in saying yes, and I enjoyed the interaction a lot. Today I learnt that the session was the focus of a small note in a local newspaper. The session was run in Spanish and the note in Portal, the local newspaper, is in Spanish too. I really liked that they picked a line I used in the session to convince the students that technology is not just for the natural sciences:

“Hay que hacer ciencias sociales con técnicas del Siglo XXI... El mundo es de los geeks.”

“We should study social sciences applying techniques of the 21st Century. The world today belongs to us, the geeks.”

The point is that although qualitative and quantitative techniques are widely used in social science, new platforms and even programming languages such as Python open up opportunities for social scientists too.

The talk is available in the blog the class uses to share their discussions: The Share Knowledge Network - Follow this link for the talk.

The newspaper article by Ximena Barragán can be found here.

Read me...

Computer Programming Knowledge

I came across the image above in the Slack channel of the University of Hertfordshire Centre for Astrophysics Research. It summarises some of the fundamental knowledge in computer science that was assumed necessary at some point in time: Binary, CPU execution and algorithms.

They refer to 7 algorithms, although these are really classes of algorithms rather than individual ones:

  1. Sort
  2. Search
  3. Hashing
  4. Dynamic Programming
  5. Binary Exponentiation
  6. String Matching and Parsing
  7. Primality Testing
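To make one of these classes concrete, here is a minimal sketch of binary exponentiation (number 5 in the list). The function name `power` is just an illustration of mine and is not taken from the graphic; the idea is to walk through the bits of the exponent, squaring the base at each step.

```python
def power(base: int, exponent: int) -> int:
    """Compute base**exponent by repeated squaring (binary exponentiation)."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    while exponent > 0:
        # If the lowest bit of the exponent is set, fold the current power in.
        if exponent & 1:
            result *= base
        base *= base    # square the base for the next bit
        exponent >>= 1  # move on to the next bit of the exponent
    return result


print(power(2, 10))  # 1024
```

The loop runs once per bit of the exponent, so it needs far fewer multiplications than multiplying the base by itself `exponent` times.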

I like the periodic table shown at the bottom of the graphic, showing some old friends such as Fortran, C, Basic and Cobol, some others that are probably not used all that much, and others that have definitely been rising: Javascript, Java, C++, Lisp. It is great to see Python, number 35, listed as Multi-Paradigm!


Read me...

Cover Draft for “Advanced Data Science and Analytics with Python”

I have received the latest information about the status of my book “Advanced Data Science and Analytics with Python”. This time reviewing the latest cover drafts for the book.

This is currently my favourite one.

Awaiting the proofreading comments, and I hope to update you about that soon.

Read me...

LibreOffice - Dialogue boxes showing blanks

I have been using LibreOffice on and off for a few years now and generally I think it is a great alternative to the MS Office offering. It does the tasks that are required and the improvements over different versions have been steady and useful.

I had however a very strange experience in which dialogue boxes and other windows such as alerts and messages just showed blank text. It was obvious that there was some important information in them, but it was not possible to read them. In some cases it was ok... I mean I knew where the "OK" button was expected to appear, or where "Cancel" should be placed. However, it was an annoying (at best) and limiting (at worst) experience.

After digging in a bit I realised what the problem was. The fonts that were supposed to be showing were at fault. The culprits were as follows:

  • DINRegular.ttf, and
  • DINRegularAlternate.ttf

After removing these two fonts from ~/Library/Fonts/ everything went back to normal. I hope this helps in case you are having a similar issue.

Read me...

Pandas 1.0 is out

If you are interested in #DataScience you have surely heard of #pandas, and you will be pleased to hear that version 1.0 is finally out, with better integration with NumPy and improvements with Numba among others. Take a look!
— Read on www.anaconda.com/pandas-1-0-is-here/

Read me...

MacOS - No Floating Thumbnail when taking a screenshot

Have you tried taking a screenshot on your Mac and been annoyed at having to wait for the floating thumbnail? In other words, you wait for 5 seconds before the screenshot becomes a file. Well, here you can find out how to get rid of that.

Follow these steps:

1) Type CMD + SHIFT + 5
2) Click OPTIONS
3) Uncheck "Show Floating Thumbnail"
4) Et voilà!

See the screenshot above!

Read me...

Advanced Data Science and Analytics with Python - Submitted!

There you go, the first checkpoint is completed: I have officially submitted the completed version of "Advanced Data Science and Analytics with Python".

The book has been some time in the making (and in the thinking...). It is a follow-up to my previous book, imaginatively called "Data Science and Analytics with Python". The book covers aspects that were necessarily left out in the previous volume; however, the readers in mind are still technical people interested in moving into the data science and analytics world. I have tried to keep the same tone as in the first book, peppering the pages with some bits and bobs of popular culture, science fiction and indeed Monty Python puns.

Advanced Data Science and Analytics with Python enables data scientists to continue developing their skills and apply them in business as well as academic settings. The subjects discussed in this book are complementary to, and a follow-up from, the topics discussed in Data Science and Analytics with Python. The aim is to cover important advanced areas in data science using tools developed in Python such as SciKit-learn, Pandas, Numpy, Beautiful Soup, NLTK, NetworkX and others. The development is also supported by the use of frameworks such as Keras, TensorFlow and Core ML, as well as Swift for the development of iOS and MacOS applications.

The book can be read independently from the previous volume and each of the chapters in this volume is sufficiently independent from the others, providing flexibility for the reader. Each of the topics addressed in the book tackles the data science workflow from a practical perspective, concentrating on the process and results obtained. The implementation and deployment of trained models are central to the book.

Time series analysis, natural language processing, topic modelling, social network analysis, neural networks and deep learning are comprehensively covered in the book. The book discusses the need to develop data products and tackles the subject of bringing models to their intended audiences, in this case literally to the users' fingertips in the form of an iPhone app.

While the book is still in the oven, you may want to take a look at the first volume. You can get your copy here:

Furthermore you can see my Author profile here.

Read me...

Natural Language Processing - Talk

Last October I had the great opportunity to come and give a talk at the Facultad de Ciencias Políticas, UAEM, México. The main audience were students of the qualitative analysis methods course, but there were people also from informatics and systems engineering.

It was an opportunity to showcase some of the advances that natural language processing offers to social scientists interested in analysing discourse, from politics through to social interactions.

The talk covered an introduction and brief history of the field. We went through the different stages of the analysis, from reading the data, obtaining tokens and labelling their part of speech (POS), to looking at syntactic and semantic analysis.
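The first of those stages can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the talk; the function names are hypothetical, and part-of-speech tagging would need a dedicated library such as NLTK.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def most_common_tokens(text: str, n: int = 3) -> list:
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


print(tokenize("El mundo es de los geeks."))
# ['el', 'mundo', 'es', 'de', 'los', 'geeks']
```

Since `\w` matches Unicode letters in Python 3, accented characters in Spanish text stay inside their tokens.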

We finished the session with a couple of demos. One looking at speeches of Clinton and Trump during their presidential campaigns; the other one was a simple analysis of a novel in Spanish.

Thanks for the invite.

Read me...

Apple Developer Support

It is great to see all the support that Apple Developers get in terms of tools, ecosystem, community and more.


For starters the Developer Support portal has a ton of information for the newcomer as well as for the most expert of experts, including guides and documentation for tools such as Xcode as well as information for developing software for MacOS and iOS.

Information about Design is available in the same place, including Human Interface Guidelines, Fonts (including downloads for San Francisco!) and information about accessibility and localisation.

Information about new tools and updates such as the latest about Swift, and SwiftUI can be easily found. And testing your apps with the help of tools such as TestFlight makes things so much easier.

Read me...

Adding new conda environment kernel to Jupyter and nteract

I know there are a ton of posts out there covering this very topic. I am writing this post more for my own benefit, so that I have a reliable place to check the commands I need to add a new conda environment to my Jupyter and nteract IDEs.

First, to create an environment that contains, say, TensorFlow, Pillow, Keras and pandas we need to type the following in the command line:

$ conda create -n tensorflow_env tensorflow pillow keras pandas jupyter ipykernel nb_conda

Now, to add this to the list of available environments in either Jupyter or nteract, we type the following:

$ conda activate tensorflow_env

$ python -m ipykernel install --name tensorflow_env

$ conda deactivate

Et voilà, you should now see the environment in the dropdown menu!
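For reference, the ipykernel install step registers the environment by writing a small kernel.json file that tells Jupyter (and nteract) which interpreter to launch. The sketch below builds the essential fields of such a file; the interpreter path is a made-up placeholder, and the real file may carry extra fields depending on your ipykernel version.

```python
import json


def kernel_spec(python_path: str, display_name: str) -> dict:
    """Build the essential fields of a kernel.json for a Python kernel."""
    return {
        # Command used to start the kernel; {connection_file} is filled in by Jupyter.
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


# Hypothetical interpreter path, for illustration only.
spec = kernel_spec("/path/to/envs/tensorflow_env/bin/python", "tensorflow_env")
print(json.dumps(spec, indent=2))
```

If the environment does not show up in the dropdown menu, looking at this file is a good first check: the first entry of argv should point to the Python interpreter of the conda environment.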

Read me...

Data Science and Analytics with Python - Social Network Analysis

Using the time wisely during the Bank Holiday weekend. As my dad would say, "resting while making bricks"... Currently reviewing/editing/correcting Chapter 3 of "Advanced Data Science and Analytics with Python". Yes, that is volume 2 of "Data Science and Analytics with Python".


Read me...

Social Network Analysis and Star Wars

On my way back to London and making the most of the time in the train to work on my Data Science and Analytics Vol 2 book. Working with #StarWars data to explain Social Network Analysis #datascience #geek

Read me...

Python - Pendulum

Working with dates and times in programming can be a painful test at times. In Python, there are some excellent libraries that help with all the pain, and recently I became aware of Pendulum. It is effectively a replacement for the standard datetime class and it has a number of improvements. Check out the documentation for further information.

Installation of the package is straightforward with pip:

$ pip install pendulum

For example, some simple manipulations involving time zones:

import pendulum

now = pendulum.now('Europe/Paris')

# Changing timezone
now.in_timezone('America/Toronto')

# Default support for common datetime formats
now.to_iso8601_string()

# Shifting
now.add(days=2)

Duration can be used as a replacement for the standard timedelta class:

dur = pendulum.duration(days=15)

# More properties
dur.weeks
dur.hours

# Handy methods
dur.in_words()
'2 weeks 1 day'
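For comparison, here is a minimal sketch of the same idea using only the standard library. The helper in_words below is a hypothetical function of mine that mimics a small part of what a Pendulum duration offers out of the box (weeks and days only); it is not part of the datetime module.

```python
from datetime import timedelta


def in_words(delta: timedelta) -> str:
    """Describe a timedelta in whole weeks and days."""
    weeks, days = divmod(delta.days, 7)
    parts = []
    if weeks:
        parts.append(f"{weeks} week{'s' if weeks != 1 else ''}")
    if days:
        parts.append(f"{days} day{'s' if days != 1 else ''}")
    return " ".join(parts) or "0 days"


print(in_words(timedelta(days=15)))  # 2 weeks 1 day
```

With plain timedelta objects you have to write this kind of helper yourself, which is exactly the sort of pain Pendulum takes away.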

It also supports the definition of a period, i.e. a duration that is aware of the DateTime instances that created it. For example:

dt1 = pendulum.now()
dt2 = dt1.add(days=3)

# A period is the difference between 2 instances
period = dt2 - dt1


# A period is iterable
for dt in period:
    print(dt)

Give it a go, and let me know what you think of it. 

Read me...

File Encoding with the Command Line - Determining and Converting

With the changes that Python 3 has brought to bear in terms of dealing with character encodings, I have written before about some tips that I use in my day-to-day work. It is sometimes useful to determine the character encoding of a file at a much earlier stage. The command line is a perfect tool to help us with these issues.

The basic syntax you need is the following one:

$ file -I filename

Furthermore, you can even use the command line to convert the encoding of a file into another one. The syntax is as follows:

$ iconv -f encoding_source -t encoding_target filename

For instance, if you needed to convert an ISO-8859-1 file called input.txt into UTF-8 you can use the following line:

$ iconv -f iso-8859-1 -t utf-8 < input.txt > output.txt
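If you prefer to stay in Python, the same conversion can be sketched as follows. This is a rough equivalent under my own assumptions rather than a replacement for iconv: the function names are hypothetical and the whole file is read into memory, so it is only suitable for files of moderate size.

```python
from pathlib import Path


def reencode(data: bytes, source: str = "iso-8859-1", target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and encode them in the target one."""
    return data.decode(source).encode(target)


def convert_file(input_path: str, output_path: str,
                 source: str = "iso-8859-1", target: str = "utf-8") -> None:
    """Convert a whole file, leaving the original untouched."""
    Path(output_path).write_bytes(reencode(Path(input_path).read_bytes(), source, target))


print(reencode(b"caf\xe9"))  # b'caf\xc3\xa9'
```

Working with bytes rather than text mode keeps the line endings exactly as they were in the input file.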

If you want to check the list of known character encodings that you can handle with this command, simply type:

$ iconv --list

Et voilà!


Read me...

Backslashes v Forward Slashes - Windows, Linux and Mac

"Why do I have to use backslashes (\) in Windows, but forward slashes (/) in everything else?" This is a question that I have been asked by a number of people over the years and I have been meaning to write something about it for a long time now. 

It seems that Windows is really the odd one out, as Linux, OS X and even Android use forward slashes. The cause of this (at times) annoying difference seems to be largely accidental.

In the 1970s, Unix first introduced the forward slash to separate entries in a directory path. So far so good. In the meantime, the initial version of MS-DOS did not even support the use of directories... and we are talking early 80s here! At the time, IBM was the main contributor to Microsoft utilities and they used the forward slash as a flag or switch character (in Unix we use a hyphen for this). You can still see a vestigial tail in some commands... Think dir /w for example.

The next version of MS-DOS added support for directories. To keep compatibility, IBM expected / to carry on being used as a flag, so MS-DOS (and later Windows) adopted \ as the alternative for separating directory paths. Once you start using this in your own environment, who cares what other people use in their operating systems!! Right? In that way, in Windows the kind of slash tells you whether you are looking at an option (/) or a directory path (\).

And the rest, as they say, is history!
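If you write Python, you rarely need to worry about which slash to type. The standard pathlib module knows the convention of each platform, as this small sketch shows (the paths are made up for illustration):

```python
from pathlib import PurePosixPath, PureWindowsPath

# The "pure" path classes only manipulate strings, so they work on any platform.
windows_path = PureWindowsPath("C:/Users", "me", "notes.txt")
posix_path = PurePosixPath("/home", "me", "notes.txt")

print(windows_path)             # C:\Users\me\notes.txt
print(posix_path)               # /home/me/notes.txt
print(windows_path.as_posix())  # C:/Users/me/notes.txt
```

In everyday code you would normally use Path, which picks the right flavour for the operating system you are running on.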

Read me...

Magic Mouse - Secondary Click Not Working

I have recently taken Mojave for a spin and I am really happy with the changes in the new OS. I know it is merely eye-candy, but I really like the dark theme. Things have been working well, but I came across a nagging issue with my MagicMouse:

For some reason the secondary click would simply not work. I had made sure that the "Secondary Click" option was ticked in the settings (see screenshot below). I tried ticking it on and off, restarting the machine, deleting the mouse and reconnecting it... nothing worked...

Finally I decided to take a look at some of the plist files and here is my solution to this problem:

    1. Go to the ~/Library/Preferences/ directory
    2. Delete the following files:

    3. Restart the machine

Et voilà!

Read me...

IEEE Language Rankings 2018

Python retains its top spot in the fifth annual IEEE Spectrum top programming language rankings, and also gains a designation as an "embedded language". Data science language R remains the only domain-specific slot in the top 10 (where it is listed as an "enterprise language") and drops one place compared to its 2017 ranking to take the #7 spot.

Looking at other data-oriented languages, Matlab is at #11 (up 3 places), SQL is at #24 (down 1), Julia at #32 (down 1) and SAS at #40 (down 3). Click the screenshot below for an interactive version of the chart where you can also explore the top 50 rankings.

Language Rank

The IEEE Spectrum rankings are based on search, social media, and job listing trends, GitHub repositories, and mentions in journal articles. You can find details on the ranking methodology here, and discussion of the trends behind the 2018 rankings at the link below.

IEEE Spectrum: The 2018 Top Programming Languages

Read me...

Building things with Python

Very pleased to see some of the things we are building with the Intro to Python for Data Science class this evening.

Read me...

Persistent "Previous Recipients" in Mac Mail

Hello everyone! I am very pleased to take a question from John who got in touch with Quantum Tunnel using the form here. John's favourite scientist is Einstein and his question is as follows:

In Mac mail I cannot delete unwanted email addresses. I have done the routine of deleting all addresses from the previous receiptant list, but when starting a new email unwanted addresses appear.. Any help is appreciated. Thanks, John

John is referring to the solution I provided in this earlier post. Sadly, the list of the lucky friends/colleagues/family (delete as appropriate) he has emailed recently persists even after clearing the "Previous Recipients" as explained in that post.

There may be a way to force the clearing of these persistent email addresses:

  • Quit Mail and Address Book (in case the latter is open)
  • Open a terminal and type the following command:
    • `rm ~/Library/Application\ Support/AddressBook/MailRecents-v4.abcdmr`
  • Log out and back in again
  • Start Mail
  • You may have to clear the "Previous Recipients" list as per the post mentioned above

You should now be able to clear the list. And... In case you were wondering, the file we deleted should be created afresh to start accumulating new "recent recipients" (yay!)

Et voilà!

Read me...

Finding iBooks Files in My Mac

I was looking for the location of iBooks files (including ePub, PDFs and others) so that I can curate the list of manually exported files. Finding iBooks files on my Mac should not be a difficult task, although it took a few minutes. I thought of sharing that here in the blog for future reference and in the hope that some of you may find it useful.

We will use the Terminal, as doing things from Finder tends to redirect us. A first place to look into is the following one:


Now, that may not be the entire list of your books. In case you have enabled iCloud, then things may be stored in your Mobile Documents folder:

cd ~/Library/Mobile\ Documents/iCloud~com~apple~iBooks/Documents/

For things that you have bought in the iBooks store, take a look here:

cd ~/Library/Containers/com.apple.BKAgentService/Data/Documents/iBooks

Et voilà!


Read me...

CoreML - Boston Model: The Complete App

Look how far we have come... We started this series by looking at what CoreML is and made sure that our environment was suitable. We decided to use linear regression as our model, and chose to use the Boston Price dataset in our exploration for this implementation. We built our model using Python and created our .mlmodel object and had a quick exploration of the model's properties. We then started to build our app using Xcode (see Part 1, Part 2 and Part 3). In this final part we are going to take the .mlmodel and include it in our Xcode project. We will then use the inputs selected from our picker and calculate a prediction (based on our model) to be displayed to the user. Are you ready? Nu kör vi!

Let us start by adding the .mlmodel we created earlier on so that it is an available resource in our project. Open your Xcode project and locate your PriceBoston.mlmodel file. From the menu on the left-hand side select the "BostonPricer" folder. At the bottom of the window you will see a + sign; click on it and select "New Group". This will create a sub-folder within "BostonPricer". Select the new folder and hit the return key; this will let you rename the folder to something more useful. In this case I am going to call this folder "Resources".

Open Finder and navigate to the location of your PriceBoston.mlmodel. Click and drag the file inside the "Resources" folder we just created. This will open a dialogue box asking for some options for adding this file to your project. I selected "Create Folder References" and left the rest as it was shown by default. After hitting "Finish" you will see your model now being part of your project. Let's now go to the code in ViewController and make some needed changes. The first one is to tell our project that we are going to need the powers of the CoreML framework. At the top of the file, locate the line of code that imports UIKit and right below it type the following:

import CoreML

Inside the definition of the ViewController class, let us define a constant to reference the model. Look for the definitions of the crimeData and roomData constants and nearby them type the following:

let model = PriceBoston()

You will see that when you start typing the name of the model, Xcode will suggest the right name as it knows about the existence of the model as part of its resources, neat!

We need to make some changes to the getPrediction() function we created in the last post. Go to the function, look for the place where we pick the values of crime and rooms, and right after that write the following:

guard let priceBostonOutput = try? model.prediction(
            crime: crime,
            rooms: Double(rooms)
            ) else {
                fatalError("Unexpected runtime error.")
}

You may get a warning telling you that the constant priceBostonOutput was defined but not used. Don't worry, we will indeed use it in a little while. Just a couple of words about this piece of code: you will see that we are using the prediction method defined in the model and that we are passing the two input parameters that the model expects, namely crime and rooms. We are wrapping the call to the prediction method in a try? expression, so that if the call throws an error we end up in the else branch of the guard statement. This is where we are implementing our CoreML model!!! Isn't that cool‽

We are not done yet though; remember that we have that warning from Xcode about using the model. Looking at the properties of the model, we can see that we also have an output attribute called price. This is the prediction we are looking for and the one we would like to display. Out of the box it may have a lot of decimal figures, and it is never a good practice to display those to the user (although they are important in precision terms...). Also, with Swift's strong typing we would have to typecast the double returned by the model into a string that can be printed. So, let us prepare some code to format the predicted price. At the top of the ViewController class, find the place where we defined the constants crimeData and roomData. Below them type the following code:

let priceFormat: NumberFormatter = {
        let formatting = NumberFormatter()
        formatting.numberStyle = .currency
        formatting.maximumFractionDigits = 2
        formatting.locale = Locale(identifier: "en_US")
        return formatting
}()

We are defining a format that will show a number as currency in US dollars with two decimal figures. We can now pass our predicted price to this formatter and assign it to a new constant for future reference. Below the code where the getPrediction function was defined, write the following:

let priceText = priceFormat.string(from: NSNumber(value: priceBostonOutput.price))

Now we have a nicely formatted string that can be used in the display. Let us change the message that we are asking our app to show when pressing the button:

let message = "The predicted price (in $1,000s) is " + priceText!

We are done! Launch your app simulator, select a couple of values from the picker and hit the "Calculate Prediction" button... Et voilà, we have completed our first implementation of a CoreML model in a working app.

There are many more things that we can do to improve the app. For instance, we can impose some constraints on the position of the different elements shown on the screen so that we can deploy the application to the various screen sizes offered by Apple devices. We can also improve the design and usability of the app and design appropriate icons for it (in various sizes). For the time being, I will leave some of those tasks for later. In the meantime you can take a look at the final code in my github site here.

Enjoy and do keep in touch, I would love to hear if you have found this series useful.


Read me...

CoreML - iOS Implementation for the Boston Model (part 3) - Button

We are very close to getting a functioning app for our Boston Model. In the last post we were able to put together the code that fills in the values in the picker and were able to "pick" the values shown for crime rate and number of rooms respectively. These values are fed to the model we built in one of the earlier posts of this series and the idea is that we will action this via a button that triggers the calculation of the prediction. In turn the prediction will be shown in a floating dialogue box.

In this post we are going to activate the functionality of the button and show the user the values that have been picked. With this we will be ready to weave in the CoreML model in the final post of this series. So, what are we waiting for? Let us launch Xcode and get working. We have already done a bit of work for the button in the previous post where we connected the button to the ViewController generating a line of code that read as follows:

@IBOutlet weak var predictButton: UIButton!

If we launch the application and click on the button, sadly, nothing will happen. Let's change that: in the definition of the UIViewController class, after the didReceiveMemoryWarning function write the following piece of code:

@IBAction func getPrediction() {
        let selectedCrimeRow = inputPicker.selectedRow(inComponent: inputPredictor.crime.rawValue)
        let crime = crimeData[selectedCrimeRow]

        let selectedRoomRow = inputPicker.selectedRow(inComponent: inputPredictor.rooms.rawValue)
        let rooms = roomData[selectedRoomRow]

        let message = "The picked values are Crime: \(crime) and Rooms: \(rooms)"

        let alert = UIAlertController(title: "Values Picked",
                                      message: message,
                                      preferredStyle: .alert)

        let action = UIAlertAction(title: "OK", style: .default,
                                   handler: nil)
        alert.addAction(action)

        present(alert, animated: true, completion: nil)
}

The first four lines of the getPrediction function take the values from the picker and create constants for crime and rooms, which are then used in a message to be displayed in the application. We are telling Xcode to treat this message as an alert and asking it to present it to the user (last line in the code above). What we need to do now is tell Xcode that this function is to be triggered when we click on the button.

There are several ways we can connect the button with the code above. In this case we are going to go to the Main.storyboard, control+click on the button and drag. This will show an arrow; we need to connect that arrow with the View Controller icon (a yellow circle with a white square inside) at the top of the view controller window we are putting together. When you let go, you will see a drop-down menu. From there, under "Sent Events" select the function we created above, namely getPrediction. See the screenshots below:

You can now run the application. Select a number from each of the columns in the picker, and when ready, prepare to be amazed: Click on the "Calculate Prediction" button, et voilà - you will see a new window telling you the values you have just picked. Tap "OK" and start again!

In the next post we will add the CoreML model, and modify the event for the button to take the two values picked and calculate a prediction which in turn will be shown in the floating window. Stay tuned.

You can look at the code (in development) in my github site here.

Read me...

JupyterLab is Ready for Users

This is a reblog of this original post.

We are proud to announce the beta release series of JupyterLab, the next-generation web-based interface for Project Jupyter.

Project Jupyter Feb 20
tl;dr: JupyterLab is ready for daily use (documentation, try it with Binder)

JupyterLab is an interactive development environment for working with notebooks, code, and data.

The Evolution of the Jupyter Notebook

Project Jupyter exists to develop open-source software, open standards, and services for interactive and reproducible computing.

Since 2011, the Jupyter Notebook has been our flagship project for creating reproducible computational narratives. The Jupyter Notebook enables users to create and share documents that combine live code with narrative text, mathematical equations, visualizations, interactive controls, and other rich output. It also provides building blocks for interactive computing with data: a file browser, terminals, and a text editor.

The Jupyter Notebook has become ubiquitous with the rapid growth of data science and machine learning and the rising popularity of open-source software in industry and academia:

  • Today there are millions of users of the Jupyter Notebook in many domains, from data science and machine learning to music and education. Our international community comes from almost every country on earth.¹
  • The Jupyter Notebook now supports over 100 programming languages, most of which have been developed by the community.
  • There are over 1.7 million public Jupyter notebooks hosted on GitHub. Authors are publishing Jupyter notebooks in conjunction with scientific research, academic journals, data journalism, educational courses, and books.

At the same time, the community has faced challenges in using various software workflows with the notebook alone, such as running code from text files interactively. The classic Jupyter Notebook, built on web technologies from 2011, is also difficult to customize and extend.

JupyterLab: Ready for Users

JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

JupyterLab enables you to arrange your work area with notebooks, text files, terminals, and notebook outputs.

JupyterLab provides a high level of integration between notebooks, documents, and activities:

  • Drag-and-drop to reorder notebook cells and copy them between notebooks.
  • Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).
  • Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.
  • Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, VegaLite, and more.

JupyterLab has been over three years in the making, with over 11,000 commits and 2,000 releases of npm and Python packages. Over 100 contributors from the broader community have helped build JupyterLab in addition to our core JupyterLab developers.

To get started, see the JupyterLab documentation for installation instructions and a walk-through, or try JupyterLab with Binder. You can also set up JupyterHub to use JupyterLab.

Customize Your JupyterLab Experience

JupyterLab is built on top of an extension system that enables you to customize and enhance JupyterLab by installing additional extensions. In fact, the builtin functionality of JupyterLab itself (notebooks, terminals, file browser, menu system, etc.) is provided by a set of core extensions.

JupyterLab extensions enable you to work with diverse data formats such as GeoJSON, JSON and CSV.²

Among other things, extensions can:

  • Provide new themes, file editors and viewers, or renderers for rich outputs in notebooks;
  • Add menu items, keyboard shortcuts, or advanced settings options;
  • Provide an API for other extensions to use.

Community-developed extensions on GitHub are tagged with the jupyterlab-extension topic, and currently include file viewers (GeoJSON, FASTA, etc.), Google Drive integration, GitHub browsing, and ipywidgets support.

Develop JupyterLab Extensions

While many JupyterLab users will install additional JupyterLab extensions, some of you will want to develop your own. The extension development API is evolving during the beta release series and will stabilize in JupyterLab 1.0. To start developing a JupyterLab extension, see the JupyterLab Extension Developer Guide and the TypeScript or JavaScript extension templates.

JupyterLab itself is co-developed on top of PhosphorJS, a new Javascript library for building extensible, high-performance, desktop-style web applications. We use modern JavaScript technologies such as TypeScript, React, Lerna, Yarn, and webpack. Unit tests, documentation, consistent coding standards, and user experience research help us maintain a high-quality application.

JupyterLab 1.0 and Beyond

We plan to release JupyterLab 1.0 later in 2018. The beta releases leading up to 1.0 will focus on stabilizing the extension development API, user interface improvements, and additional core features. All releases in the beta series will be stable enough for daily usage.

JupyterLab 1.0 will eventually replace the classic Jupyter Notebook. Throughout this transition, the same notebook document format will be supported by both the classic Notebook and JupyterLab.

Get Involved

There are many ways you can participate in the JupyterLab effort. We welcome contributions from all members of the Jupyter community:

  • Use our extension development API to make your own JupyterLab extensions. Please add the jupyterlab-extension topic if your extension is hosted on GitHub. We appreciate feedback as we evolve toward a stable API for JupyterLab 1.0.
  • Contribute to the development, documentation, and design of JupyterLab on GitHub. To get started with development, please see our Contributing Guide and Code of Conduct. We label issues that are ideal for new contributors as “good first issue” or “help wanted”.
  • Connect with us on our GitHub Issues page or on our Gitter Channel. If you find a bug, have questions, or want to provide feedback, please join the conversation.

We are thrilled to see how you use and extend JupyterLab.


The JupyterLab Team and Project Jupyter

We thank Bloomberg and Anaconda for their support and collaboration in developing JupyterLab. We also thank the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, and the Helmsley Charitable Trust for their support.

[1] Based on the 249 country codes listed under ISO 3166–1, recent Google analytics data from 2018 indicates that jupyter.org has hosted visitors from 213 countries.

[2] Data visualized in this screenshot is licensed CC-BY-NC 3.0. See http://datacanvas.org/public-transportation/ for more details.

Read me...

nteract - a great Notebook experience

I am a supporter of using Jupyter Notebooks for data exploration and code prototyping. It is a great way to start writing code and immediately get interactive feedback. Not only can you document your code there using markdown, but also you can embed images, plots, links and bring your work to life.

Nonetheless, there are some little annoyances that I have, for instance the fact that I need to launch a Kernel to open a file and having to do that "the long way" - i.e. I cannot double-click on the file that I am interested in seeing. Some ways to overcome this include looking at GitHub versions of my code, as the notebooks are rendered automatically, or even saving HTML or PDF versions of the notebooks. I am sure some of you may have similar solutions for this.

Last week, while looking for entries on something completely different, I stumbled upon a post that suggested using nteract. It sounded promising and I took a look. It turned out to be related to the Hydrogen package available for Atom, something I have used in the past and loved. nteract was different though, as it offered a desktop version and other goodies such as in-app support for publishing, a terminal-free experience, sticky cells, input and output hiding... Bring it on!

I just started using it, and so far so good. You may want to give it a try, and maybe even contribute to the git repo.


Read me...

CoreML - iOS Implementation for the Boston Model (part 2) - Filling the Picker

Right! Where were we? Yes, last time we put together a skeleton for the CoreML Boston Model application that will take two inputs (crime rate and number of rooms) and provide a prediction of the price of a Boston property (yes, based on somewhat old prices...). We are making use of three labels, one picker and one button.

Let us start creating variables to hold the potential values for the input variables. We will do this in the ViewController by selecting this file from the left-hand side menu:








Inside the ViewController class definition enter the following variable assignments:

let crimeData = Array(stride(from: 0.1, through: 0.3, by: 0.01))
let roomData = Array(4...9)

These values are informed by the data exploration we carried out in an earlier post. We are going to use the arrays defined above to populate the values that will be shown in our picker. For this we need to define a data source for the picker and make sure that there are two components to choose values from.

Before we do any of that we need to connect the view from our storyboard to the code; in particular, we need to create outlets for the picker and for the button. Select the Main.storyboard from the menu on the left-hand side. With the Main.storyboard in view, in the top right-hand corner of Xcode you will see a button with an icon that has two intersecting circles; click on that icon. You will now see the storyboard side-by-side with the code. While pressing the Control key, select the picker by clicking on it; without letting go, drag into the code window (you will see an arrow appear as you drag):



You will see a dialogue window where you can now enter a name for the element in your Storyboard. In this case I am calling my picker inputPicker, as shown in the figure on the left. After pressing the "connect" button a new line of code appears and you will see a small circle on top of the code line number indicating that a connection with the Storyboard has been made. Do the same for the button and call it predictButton.



In order to make our life a little bit easier, we are going to bundle together the input values. At the bottom of the ViewController code write the following:

enum inputPredictor: Int {
    case crime = 0
    case rooms
}

We have defined an enumeration called inputPredictor that identifies the two picker components, crime and rooms. In turn we will use it to populate the picker as follows: in the same ViewController file, after the class definition that is provided in the project by default, we are going to write an extension for the data source. Write the following code:

extension ViewController: UIPickerViewDataSource {

    func numberOfComponents(in pickerView: UIPickerView) -> Int {
        return 2
    }

    func pickerView(_ pickerView: UIPickerView,
                    numberOfRowsInComponent component: Int) -> Int {
        guard let inputVals = inputPredictor(rawValue: component) else {
            fatalError("No predictor for component")
        }

        switch inputVals {
        case .crime:
            return crimeData.count
        case .rooms:
            return roomData.count
        }
    }
}

With the function numberOfComponents we are indicating that we want to have 2 components in this view. Notice that inside the pickerView function we are creating a constant inputVals defined by the values from inputPredictor. So far we have indicated where the values for the picker come from, but we have not delegated the actions that can be taken with those values, namely displaying them and picking them (after all, this element is a picker!) so that we can use the values elsewhere. If you were to execute this app, you would see an empty picker...

OK, so what we need to do is create the UIPickerViewDelegate, and we do this by entering the following code right under the previous snippet:

extension ViewController: UIPickerViewDelegate {
    func pickerView(_ pickerView: UIPickerView, titleForRow row: Int,
                    forComponent component: Int) -> String? {
        guard let inputVals = inputPredictor(rawValue: component) else {
            fatalError("No predictor for component")
        }

        switch inputVals {
        case .crime:
            return String(crimeData[row])
        case .rooms:
            return String(roomData[row])
        }
    }

    func pickerView(_ pickerView: UIPickerView, didSelectRow row: Int,
                    inComponent component: Int) {
        guard let inputVals = inputPredictor(rawValue: component) else {
            fatalError("No predictor for component")
        }

        switch inputVals {
        case .crime:
            // Placeholder: nothing to do yet when a crime value is selected
            break
        case .rooms:
            // Placeholder: nothing to do yet when a rooms value is selected
            break
        }
    }
}


In the first function we are defining what values are supposed to be shown for the titleForRow in the picker, and we do this for each of the two elements we have, i.e. crime and rooms. In the second function we are defining what happens when we didSelectRow, in other words select the value that is being shown by each of the two elements in the picker. Not too bad, right?

Well, if you were to run this application you would still see no change in the picker... Why is that? The answer is that we need to let the application know what needs to be shown when the elements load. Go back to the top of the code (around line 20 or so), below the code lines that defined the outlets for the picker and the button. There write the following code:

override func viewDidLoad() {
    super.viewDidLoad()
    // Picker data source and delegate
    inputPicker.dataSource = self
    inputPicker.delegate = self
}

OK, we can now run the application: On the top left-hand side of the Xcode window you will see a play button; clicking on it will launch the Simulator and you will be able to see your picker working. Go on, select a few values from each of the elements:

In the next post we will write code to activate the button to run a prediction using our CoreML model with the values selected from the picker and show the result to the user. Stay tuned!

You can look at the code (in development) in my github site here.

Read me...

Siri doesn’t like Rugby

Well, it seems that Siri does not like Rugby. It only offers information about baseball, basketball, American football, ice hockey or cricket (!). Apparently golf and tennis are to follow...

Oh well...

Read me...

CoreML - Model properties

If you have been following the posts in this open notebook, you may know that by now we have managed to create a linear regression model for the Boston Price dataset based on two predictors, namely crime rate and average number of rooms. It is by no means the best model out there, and our aim is to explore the creation of a model (in this case with Python) and convert it to a Core ML model that can be deployed in an iOS app.

Before we move on to the development of the app, I thought it would be good to take a look at the properties of the converted model. If we open the PriceBoston.mlmodel we saved in the previous post (in Xcode of course) we will see the following information:

We can see the name of the model (PriceBoston) and the fact that it is a "Pipeline Regressor". The model can be given various attributes such as Author, Description, License, etc. We can also see the listing of the Model Evaluation Parameters in the form of Inputs (crime rate and number of rooms) and Outputs (price). There is also an entry to describe the Model Class (PriceBoston); until we attach this model to a target, the class is not actually present. Once we make this model part of a target inside an app, Xcode will generate the appropriate code.

Just to give you a flavour of the code that will be generated when we attach this model to a target, please take a look at the screenshot below:

You can see that the code was generated automatically (see the comment at the beginning of the Swift file). The code defines the input variables and feature names, defines a way to extract values out of the input strings, sets up the model output and other bits and pieces such as defining the class for model loading and prediction (not shown). All this is taken care of by Xcode, making it very easy for us to use the model in our app. We will start building that app in the following posts (bear with me, I promise we will get there).


Read me...

CoreML - Building the model for Boston Prices

In the last post we took a look at the Boston Prices dataset loaded directly from Scikit-learn. In this post we are going to build a linear regression model and convert it to a .mlmodel to be used in an iOS app.

We are going to need some modules:

import coremltools
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn import metrics
import numpy as np

The coremltools module is the one that will enable the conversion so that we can use our model in iOS.

Let us start by defining a main function to load the dataset:

def main():
    print('Starting up - Loading Boston dataset.')
    boston = datasets.load_boston()
    boston_df = pd.DataFrame(boston.data)
    boston_df.columns = boston.feature_names

In the code above we have loaded the dataset and created a pandas dataframe to hold the data and the names of the columns. As we mentioned in the previous post, we are going to use only the crime rate and the number of rooms to create our model:

    print("We now choose the features to be included in our model.")
    X = boston_df[['CRIM', 'RM']]
    y = boston.target

Please note that we are separating the target variable from the predictor variables. Although this dataset is not too large, we are going to follow best practice and split the data into training and testing sets:

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=7)
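To make the idea of the split a bit more concrete, here is a minimal sketch in plain Python of what holding out 20% of the rows means. The function split_train_test and the toy data are made up for illustration; this is not how Scikit-learn implements train_test_split, and the same seed will not reproduce the same partition.

```python
import random


def split_train_test(rows, targets, test_size=0.2, seed=7):
    """Shuffle the row indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(targets, train_idx), pick(targets, test_idx))


# Toy data (hypothetical): ten (crime, rooms) pairs and ten prices
rows = [(0.1 * i, 4 + i % 5) for i in range(10)]
targets = [20 + i for i in range(10)]

X_train, X_test, y_train, y_test = split_train_test(rows, targets)
print(len(X_train), len(X_test))
```

Every row ends up in exactly one of the two sets, which is the property that matters when we evaluate the model later on.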

We will only use the training set in the creation of the model and will test with the remaining data points.

    my_model = glm_boston(X_train, y_train)

The line of code above assumes that we have defined the function glm_boston as follows:

def glm_boston(X, y):
    print("Implementing a simple linear regression.")
    lm = linear_model.LinearRegression()
    gml = lm.fit(X, y)
    return gml
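If you are curious about what the fit step is doing, the following is a small sketch of ordinary least squares for two predictors, solved through the normal equations in plain Python. The function fit_ols and the toy data are assumptions made for illustration only; Scikit-learn uses its own numerical routines, so treat this as a conceptual picture and not as the library's implementation.

```python
def fit_ols(rows, targets):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the normal equations."""
    # Design matrix with a leading column of ones for the intercept
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Augmented system [X^T X | X^T y]
    system = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        pivot_value = system[col][col]
        system[col] = [v / pivot_value for v in system[col]]
        for row in range(k):
            if row != col:
                factor = system[row][col]
                system[row] = [a - factor * b
                               for a, b in zip(system[row], system[col])]
    coefficients = [system[i][k] for i in range(k)]
    return coefficients[0], coefficients[1:]


# Toy data (hypothetical) generated from y = 1 - 2*x1 + 3*x2
rows = [(0.1, 4), (0.2, 6), (0.3, 5), (0.15, 8), (0.25, 7)]
targets = [1.0 - 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]

intercept, slopes = fit_ols(rows, targets)
print(intercept, slopes)
```

Because the toy targets lie exactly on a plane, the fit recovers the intercept and the two slopes up to rounding error.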

Notice that we are using the LinearRegression implementation in Scikit-learn. Let us go back to the main function we are building and extract the coefficients for our linear model. Refer to the CoreML - Linear Regression post to remember that the type of model that we are building is of the form  y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \epsilon:

    coefs = [my_model.intercept_, my_model.coef_]
    print("The intercept is {0}.".format(coefs[0]))
    print("The coefficients are {0}.".format(coefs[1]))
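Once we have the intercept and the coefficients, a prediction is just the formula above evaluated at the chosen inputs (leaving out the error term). Here is a minimal sketch; the numbers used for the intercept and coefficients are made up for illustration and are not the values of the fitted model.

```python
import math


def predict_price(crime, rooms, intercept, coefficients):
    """Evaluate y = alpha + beta_1 * crime + beta_2 * rooms."""
    beta_crime, beta_rooms = coefficients
    return intercept + beta_crime * crime + beta_rooms * rooms


# Hypothetical values, NOT the ones obtained from the Boston data
intercept = -30.0
coefficients = (-0.25, 8.5)

price = predict_price(0.2, 6, intercept, coefficients)
print(price)
assert math.isclose(price, 20.95)
```

This is the calculation that the converted model will carry out for us on the device.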

We can also take a look at some metrics that let us evaluate our model against the test data:

    # predict on the held-out test set
    y_pred = my_model.predict(X_test)

    # calculate MAE, MSE, RMSE
    print("The mean absolute error is {0}.".format(
        metrics.mean_absolute_error(y_test, y_pred)))
    print("The mean squared error is {0}.".format(
        metrics.mean_squared_error(y_test, y_pred)))
    print("The root mean squared error is {0}.".format(
        np.sqrt(metrics.mean_squared_error(y_test, y_pred))))
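In case the three metrics are not familiar, here is a short sketch that computes them by hand in plain Python. The function regression_errors and the four data points are made up for illustration; in the post we rely on the Scikit-learn implementations.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equally long sequences."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return mae, mse, math.sqrt(mse)


# Toy values (hypothetical), not taken from the Boston data
mae, mse, rmse = regression_errors([3.0, -0.5, 2.0, 7.0],
                                   [2.5, 0.0, 2.0, 8.0])
print(mae, mse, rmse)
```

The RMSE is in the same units as the target, which makes it the easiest of the three to relate back to prices.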

CoreML conversion

And now for the big moment: We are going to convert our model to an .mlmodel object!! Ready?

    print("Let us now convert this model into a Core ML object:")
    # Convert model to Core ML
    coreml_model = coremltools.converters.sklearn.convert(my_model,
                                        input_features=["crime", "rooms"],
                                        output_feature_names="price")
    # Save Core ML Model
    coreml_model.save("PriceBoston.mlmodel")

We are using the sklearn.convert method of coremltools.converters to convert my_model into a Core ML model with the necessary inputs (i.e. crime and rooms) and output (price). Finally we save the model in a file with the name PriceBoston.mlmodel.

Et voilà! In the next post we will start creating an iOS app to use the model we have just built.

You can look at the code (in development) in my github site here.

Read me...

Core ML - Preparing the environment

Hello again! In preparation for training a model to be converted by Core ML and used in an application, I would like to make sure we have a suitable environment to work in. One of the first things that came to my attention when looking at the coremltools module was the fact that it only supported Python 2! Yes, you read correctly, you had to make sure you used Python 2.7 if you wanted to make this work. As you probably know, Python 2 will be retired in 2020, so I hoped that Apple was considering this in their development cycles. Update: Python 3 is now finally supported! In the meantime you can see the countdown to Python 2's retirement here, and thanks Python 2 for the many years of service...

Anyway, if you are a Python 3 user, then you are good to go, although you may still need to make appropriate installations. I am using Anaconda (you may use your favourite distro) and I will be creating a conda environment (I'm calling it coreml) with Python 3 and some of the libraries I will be using:

> conda create --name coreml python=3 ipython jupyter scikit-learn

> conda activate coreml

> pip install coremltools

I am sure there may be other modules that will be needed, and I will make appropriate installations (and additions to this post) as that becomes clearer.

You can get a look at Apple's coremltools github repo here.

ADDITIONS: As I mentioned, there were other modules that needed installing in the new environment. Here is a list:

  • pandas
  • matplotlib
  • pillow
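
A quick way to confirm that the new environment has everything in place is to ask Python which of the modules can be found. The sketch below uses only the standard library; the helper check_environment is a name I made up, and the list uses import names (sklearn for scikit-learn, PIL for pillow).

```python
import importlib.util
import sys


def check_environment(modules):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None
            for name in modules}


# Import names for the packages installed above
wanted = ["coremltools", "sklearn", "pandas", "matplotlib", "PIL"]

print("Python version: {0}.{1}".format(*sys.version_info[:2]))
for name, found in check_environment(wanted).items():
    print("{0}: {1}".format(name, "ok" if found else "missing"))
```

Run it inside the activated coreml environment; anything reported as missing can then be installed with conda or pip.
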
Read me...

Core ML - What is it?

In a previous post I mentioned that I will be sharing some notes about my journey doing data science and machine learning with Apple technology. This is the first of those posts and here I will go over what Core ML is...

Core ML is a computer framework. So what is a framework? Well, in computer terms it is a software abstraction that enables generic functionality to be modified as required by the user, transforming it into software for specific purposes to enable the development of a system or even a humble project.

So Core ML is an Apple-provided framework to speed up apps that use trained machine learning models. Notice that the word in bold - trained - is part of the description of the framework. This means that the model has to be developed externally with appropriate training data for the specific project in mind. For instance, if you are interested in building a classifier that distinguishes cats from cars, then you need to train the model with lots of cat and car images.

As it stands, Core ML supports a variety of machine learning models, from generalised linear models (GLMs for short) to neural nets. Furthermore, it helps with the task of adding the trained machine learning model to your application by automatically creating a custom programmatic interface that supplies an API to your model. All this within the comfort of Xcode!

There is an important point to remember. The model has to be developed externally from Core ML; in other words, you may want to use your favourite machine learning framework (that word again), computer language and environment to cover the different aspects of the data science workflow. You can read more about that in Chapter 3 of my "Data Science and Analytics with Python" book. So whether you use Scikit-learn, Keras or Caffe, the model you develop has to be trained (tested and evaluated) beforehand. Once you are ready, Core ML will support you in bringing it to the masses via your app.

As mentioned in the Core ML documentation:

Core ML is optimized for on-device performance, which minimizes memory footprint and power consumption. Running strictly on the device ensures the privacy of user data and guarantees that your app remains functional and responsive when a network connection is unavailable.

OK, so in the next few posts we will be using Python and coremltools to generate a so-called .mlmodel file that Xcode can use and deploy. Stay tuned!

Read me...

Apple ML

Machine Learning with Apple - An Open Notebook

We all know how cool machine learning, predictive analytics and data science concepts and problems are. There are a number of really interesting technologies and frameworks to use and choose from. I have been a Python and R user for some time now and they seem to be pretty good for a lot of the things I have to do on a day-to-day basis.

As many of you know, I am also a Mac user and have been for quite a long time. I remember using early versions of Mathematica on PowerMacs back at Uni... I digress...


Apple has also been moving into the machine learning arena and has made available a few interesting goodies that help people like me make the most of the models we develop.

I am starting a series of posts that I hope can be seen as an "open notebook" of my experimentation and learning with Apple technology. One that comes to mind is CoreML, a new framework that makes running various machine learning and statistical models on macOS and iOS natively supported. The idea is that the framework helps data scientists and developers bridge the gap between them by integrating trained models into our apps. Sounds cool, don't you think? Ready... Let's go!

Read me...

Now... presenting at ODSC Europe

Data science is definitely on everyone’s lips and this time I had the opportunity of showcasing some of my thoughts, practices and interests at the Open Data Science Conference in London.

The event was very well attended by data scientists, engineers and developers at all levels of seniority, as well as business stakeholders. I had the great opportunity to present the landscape that newcomers and seasoned practitioners must be familiar with to be able to make a successful transition into this exciting field.

It was also a great opportunity to showcase “Data Science and Analytics with Python” and to get to meet new people including some that know other members of my family too.


Read me...

10-10 Celebrate Ada Lovelace day!

It's Ada Lovelace day, celebrating the work of women in mathematics, science, technology and engineering. To join the celebration +Plus Magazine revisits a collection of interviews with female mathematicians produced earlier this year. The interviews accompany the Women of Mathematics photo exhibition, which celebrates female mathematicians from institutions throughout Europe. It was launched in Berlin in the summer of 2016 and is now touring European institutions.

To watch the interviews with the women or read the transcripts, and to see the portraits that featured in the exhibition, click on the links below. For more content by or about female mathematicians click here.

Read me...

The incredible growth of Python

A reblog of the post by Dave Robinson | 09/12/2017

We recently explored how wealthy countries (those defined as high-income by the World Bank) tend to visit a different set of technologies than the rest of the world. Among the largest differences we saw was in the programming language Python. When we focus on high-income countries, the growth of Python is even larger than it might appear from tools like Stack Overflow Trends, or in other rankings that consider global software development.

In this post, we’ll explore the extraordinary growth of the Python programming language in the last five years, as seen by Stack Overflow traffic within high-income countries. The term “fastest-growing” can be hard to define precisely, but we make the case that Python has a solid claim to being the fastest-growing major programming language.

All the numbers discussed in this post are for high-income countries; they’re generally representative of trends in the United States, United Kingdom, Germany, Canada, and other such countries, which in combination make up about 64% of Stack Overflow’s traffic. Many other countries such as India, Brazil, Russia, and China also make enormous contributions to the global software development ecosystem, and this post is less descriptive of those economies, though we’ll see that Python has shown growth there as well.

It’s worth emphasizing up front that the number of users of a language isn’t a measure of the language’s quality: we’re describing the languages developers use, but not prescribing anything. (Full disclosure: I used to program primarily in Python, though I have since switched entirely to R).

Python’s growth in high-income countries

You can see on Stack Overflow Trends that Python has been growing rapidly in the last few years. But for this post we’ll focus on high-income countries, and consider visits to questions rather than questions asked (this tends to give similar results, but has less month-by-month noise, especially for smaller tags).

We have data on Stack Overflow question views going back to late 2011, and in this time period we can consider the growth of Python relative to five other major programming languages. (Note that this is therefore a shorter time scale than the Trends tool, which goes back to 2008). These are currently six of the ten most-visited Stack Overflow tags in high-income countries; the four we didn’t include are CSS, HTML, Android, and JQuery.

June 2017 was the first month that Python was the most visited tag on Stack Overflow within high-income nations. This included being the most visited tag within the US and the UK, and in the top 2 in almost all other high income nations (next to either Java or JavaScript). This is especially impressive because in 2012, it was less visited than any of the other 5 languages, and has grown by 2.5-fold in that time.

Part of this is because of the seasonal nature of traffic to Java. Since it’s heavily taught in undergraduate courses, Java traffic tends to rise during the fall and spring and drop during the summer. Will it catch up with Python again by the end of the year? We can try forecasting the next two years of growth with an STL model (http://otexts.org/fpp2/sec-6-stl.html), which combines growth with seasonal trends to make a prediction about future values.

According to this model, Python could either stay in the lead or be overtaken by Java in the fall (it’s roughly within the variation of the model’s predictions), but it’s clearly on track to become the most visited tag in 2018. STL also suggests that JavaScript and Java will remain at similar levels of traffic among high income countries, just as they have for the last two years.
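To give a feel for what "combining growth with seasonal trends" means, here is a minimal sketch in Python. It is not STL itself (which relies on LOESS smoothing) but a much simpler stand-in: fit a straight-line trend, average what is left over by calendar month, and add the two back together to extrapolate. The function names and the traffic figures are made up for illustration.

```python
# Simplified trend-plus-seasonality forecast (NOT real STL: no LOESS smoothing).
# All numbers below are synthetic and purely illustrative.

def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def seasonal_forecast(values, horizon, period=12):
    """Forecast `horizon` steps ahead as linear trend + average seasonal offset."""
    slope, intercept = fit_line(values)
    # Residuals after removing the trend, grouped by position in the yearly cycle.
    buckets = [[] for _ in range(period)]
    for x, y in enumerate(values):
        buckets[x % period].append(y - (slope * x + intercept))
    seasonal = [sum(b) / len(b) for b in buckets]
    n = len(values)
    return [slope * x + intercept + seasonal[x % period] for x in range(n, n + horizon)]


# Three years of synthetic monthly traffic: steady growth plus a summer dip.
dip = [0, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0, 0]
history = [10 + 0.1 * m + dip[m % 12] for m in range(36)]
forecast = seasonal_forecast(history, horizon=12)
```

The forecast keeps climbing along the fitted trend while repeating the summer dip, which is the qualitative behaviour described for Java traffic above.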

What tags are growing the fastest overall?

The above was looking only at the six most-visited programming languages. Among other notable technologies, which are currently growing the fastest in high-income countries?

We defined the growth rate in terms of the ratio between 2017 and 2016 share of traffic. We decided to consider only programming languages (like Java and Python) and platforms (such as iOS, Android, Windows and Linux) in this analysis, as opposed to frameworks like Angular or libraries like TensorFlow (although many of those showed notable growth that may be examined in a future post).
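As a rough illustration of that definition, the sketch below computes the growth ratio from two yearly shares of traffic. The tag names and percentages are hypothetical placeholders, not Stack Overflow figures.

```python
# Year-over-year growth as the ratio of traffic shares.
# The shares below are hypothetical placeholders, not Stack Overflow data.

def growth_ratio(share_now, share_before):
    """Ratio of this year's share of traffic to last year's (1.0 means no change)."""
    if share_before <= 0:
        raise ValueError("previous share must be positive")
    return share_now / share_before


def percent_growth(share_now, share_before):
    """The same growth expressed as a percentage change."""
    return (growth_ratio(share_now, share_before) - 1) * 100


shares = {  # tag -> (2016 share, 2017 share), in percent of all question views
    "tag_a": (8.0, 10.0),
    "tag_b": (5.0, 5.0),
    "tag_c": (2.0, 1.5),
}
growth = {tag: growth_ratio(now, before) for tag, (before, now) in shares.items()}
```

A ratio of 1.25 corresponds to 25% growth, and a ratio below 1.0 means the tag's share of traffic shrank.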

Because of the challenges in defining “fastest-growing” described in this comic, we compare the growth to the overall average in a mean-difference plot.
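For readers unfamiliar with mean-difference plots, here is a small sketch of how one point on such a plot could be computed. The use of a log2 scale on both axes is an assumption on my part; the post does not state the exact axes used.

```python
import math

def mean_difference_point(share_before, share_now):
    """Return (x, y) for one tag.

    x is the mean of the two log2 shares (how big the tag is overall);
    y is the difference of the log2 shares (how much it grew or shrank).
    The log2 scale is an illustrative assumption.
    """
    log_before = math.log2(share_before)
    log_now = math.log2(share_now)
    return (log_before + log_now) / 2, log_now - log_before


# Hypothetical tag whose share went from 2% to 8% of traffic.
x, y = mean_difference_point(2.0, 8.0)
```

Large, fast-growing tags land in the upper right of such a plot, while small tags with noisy growth spread out on the left.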

With a 27% year-over-year growth rate, Python stands alone as a tag that is both large and growing rapidly; the next-largest tag that shows similar growth is R. We see that traffic to most other large tags has stayed pretty steady within high-income countries, with visits to Android, iOS, and PHP decreasing slightly. (We previously examined some of the shrinking tags like Objective-C, Perl and Ruby in our post on the death of Flash.) We can also notice that among functional programming languages, Scala is the largest and growing, while F# and Clojure are smaller and shrinking, with Haskell in between and remaining steady.

There’s an important omission from the above chart: traffic to TypeScript questions grew by an impressive 142% in the last year, enough that we left it off to avoid overwhelming the rest of the scale. You can also see that some other smaller languages are growing similarly or faster than Python (like R, Go and Rust), and there are a number of tags like Swift and Scala that are also showing impressive growth. How does their traffic over time compare to Python’s?

The growth of languages like R and Swift is indeed impressive, and TypeScript has shown especially rapid expansion in an even shorter time. Many of these smaller languages grew from getting almost no question traffic to become notable presences in the software ecosystem. But as this graph shows, it’s easier to show rapid growth when a tag started relatively small.

Note that we’re not saying these languages are in any way “competing” with Python. Rather, we’re explaining why we’d treat their growth in a separate category; these were lower-traffic tags to start with. Python is an unusual case for being both one of the most visited tags on Stack Overflow and one of the fastest-growing ones. (Incidentally, it is also accelerating! Its year-over-year growth has become faster each year since 2013).

Rest of the world

So far in this post we’ve been analyzing the trends in high-income countries. Does Python show a similar growth in the rest of the world, in countries like India, Brazil, Russia and China?

Indeed it does.

Outside of high-income countries Python is still the fastest growing major programming language; it simply started at a lower level and the growth began two years later (in 2014 rather than 2012). In fact, the year-over-year growth rate of Python in non-high-income countries is slightly higher than it is in high-income countries. We don’t examine it here, but R, the other language whose usage is positively correlated with GDP, is growing in these countries as well.

Many of the conclusions in this post about the growth and decline of tags (as opposed to the absolute rankings) in high-income countries hold true for the rest of the world; there’s a 0.979 Spearman correlation between the growth rates in the two segments. In some cases, you can see a “lagging” phenomenon similar to what happened with Python, where a technology was widely adopted within high-income countries a year or two before it expanded in the rest of the world. (This is an interesting phenomenon and may be the subject of a future blog post!)
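For reference, a Spearman correlation is simply a Pearson correlation computed on ranks rather than raw values. The sketch below shows one way to compute it in plain Python; the growth rates in the example are invented and are not the data behind the 0.979 figure.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented growth ratios for five tags in the two groups of countries.
high_income = [1.27, 1.05, 0.95, 1.40, 1.00]
rest_of_world = [1.30, 1.02, 0.97, 1.35, 1.10]
rho = spearman(high_income, rest_of_world)
```

A value close to 1 means the tags are ordered almost identically by growth in both groups, even if the growth rates themselves differ.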

Next time

We’re not looking to contribute to any “language war.” The number of users of a language doesn’t imply anything about its quality, and certainly can’t tell you which language is more appropriate for a particular situation. With that perspective in mind, however, we believe it’s worth understanding what languages make up the developer ecosystem, and how that ecosystem might be changing.

This post demonstrated that Python has shown a surprising growth in the last five years, especially within high-income countries. In our next post, we’ll start to explore the “why”. We’ll segment the growth by country and by industry, and examine what other technologies tend to be used alongside Python (to estimate, for example, how much of the growth has been due to increased usage of Python for web development versus for data science).

Original Source.

[tags Programming, Python, Data Science]

Read me...

Data Science and Analytics with Python - New York Team

Earlier this week I received this picture of the team in New York. As you can see they have recently all received a copy of my "Data Science and Analytics with Python" book.

Thanks guys!


Read me...

Languages for Data Science

Very often the question arises of which programming language is best for data science work. The answer may depend on who you ask: there are many options out there and they all have their advantages and disadvantages. Here are some thoughts from Peter Gleeson on this matter:

While there is no correct answer, there are several things to take into consideration. Your success as a data scientist will depend on many points, including:


When it comes to advanced data science, you will only get so far reinventing the wheel each time. Learn to master the various packages and modules offered in your chosen language. The extent to which this is possible depends on what domain-specific packages are available to you in the first place!


A top data scientist will have good all-round programming skills as well as the ability to crunch numbers. Much of the day-to-day work in data science revolves around sourcing and processing raw data or ‘data cleaning’. For this, no amount of fancy machine learning packages are going to help.


In the often fast-paced world of commercial data science, there is much to be said for getting the job done quickly. However, this is what enables technical debt to creep in — and only with sensible practices can this be minimized.


In some cases it is vital to optimize the performance of your code, especially when dealing with large volumes of mission-critical data. Compiled languages are typically much faster than interpreted ones; likewise statically typed languages are considerably more fail-proof than dynamically typed. The obvious trade-off is against productivity.

To some extent, these can be seen as a pair of axes (Generality-Specificity, Performance-Productivity). Each of the languages below falls somewhere on these spectra.

With these core principles in mind, let’s take a look at some of the more popular languages used in data science. What follows is a combination of research and personal experience of myself, friends and colleagues, but it is by no means definitive! In approximate order of popularity, here goes:


What you need to know

Released in 1995 as a direct descendant of the older S programming language, R has since gone from strength to strength. Written in C, Fortran and itself, the project is currently supported by the R Foundation for Statistical Computing.




  • Excellent range of high-quality, domain specific and open source packages. R has a package for almost every quantitative and statistical application imaginable. This includes neural networks, non-linear regression, phylogenetics, advanced plotting and many, many others.
  • The base installation comes with very comprehensive, in-built statistical functions and methods. R also handles matrix algebra particularly well.
  • Data visualization is a key strength with the use of libraries such as ggplot2.


  • Performance. There’s no two ways about it, R is not a quick language.
  • Domain specificity. R is fantastic for statistics and data science purposes. But less so for general purpose programming.
  • Quirks. R has a few unusual features that might catch out programmers experienced with other languages. For instance: indexing from 1, using multiple assignment operators, unconventional data structures.

Verdict — “brilliant at what it’s designed for”

R is a powerful language that excels at a huge variety of statistical and data visualization applications, and being open source allows for a very active community of contributors. Its recent growth in popularity is a testament to how effective it is at what it does.


What you need to know

Guido van Rossum introduced Python back in 1991. It has since become an extremely popular general purpose language, and is widely used within the data science community. The major versions are currently 3.6 and 2.7.




  • Python is a very popular, mainstream general purpose programming language. It has an extensive range of purpose-built modules and community support. Many online services provide a Python API.
  • Python is an easy language to learn. The low barrier to entry makes it an ideal first language for those new to programming.
  • Packages such as pandas, scikit-learn and Tensorflow make Python a solid option for advanced machine learning applications.


  • Type safety: Python is a dynamically typed language, which means you must show due care. Type errors (such as passing a String as an argument to a method which expects an Integer) are to be expected from time-to-time.
  • For specific statistical and data analysis purposes, R’s vast range of packages gives it a slight edge over Python. For general purpose languages, there are faster and safer alternatives to Python.
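To make the type-safety point concrete, here is a small sketch of how a string passed in place of an integer behaves in Python. The function names are hypothetical and chosen only for this example.

```python
def total_price(quantity: int, unit_price: float) -> float:
    """Type hints document intent, but Python does not enforce them at run time."""
    return quantity * unit_price


def safe_total_price(quantity, unit_price):
    """Validate the argument type explicitly and fail early with a clear message."""
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise TypeError(f"quantity must be an int, got {type(quantity).__name__}")
    return quantity * float(unit_price)


# A string slips through the hints and only fails when the multiplication runs.
try:
    total_price("3", 9.99)
except TypeError as err:
    message = str(err)

# Worse, some type mix-ups do not fail at all: "3" * 2 repeats the string.
silent_bug = total_price("3", 2)
```

The second case is the more dangerous one, since the program carries on with the wrong value. Explicit checks like the one in `safe_total_price`, or a static type checker run before deployment, are common ways to catch this earlier.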

Verdict — “excellent all-rounder”

Python is a very good choice of language for data science, and not just at entry-level. Much of the data science process revolves around the ETL process (extraction-transformation-loading). This makes Python’s generality ideally suited. Libraries such as Google’s Tensorflow make Python a very exciting language to work in for machine learning.


What you need to know

SQL (‘Structured Query Language’) defines, manages and queries relational databases. The language appeared by 1974 and has since undergone many implementations, but the core principles remain the same.


Varies — some implementations are free, others proprietary


  • Very efficient at querying, updating and manipulating relational databases.
  • Declarative syntax makes SQL an often very readable language. There’s no ambiguity about what SELECT name FROM users WHERE age > 18 is supposed to do!

  • SQL is widely used across a range of applications, making it a very useful language to be familiar with. Modules such as SQLAlchemy make integrating SQL with other languages straightforward.
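As a quick illustration of running a declarative query from another language, the sketch below uses Python's built-in sqlite3 module (rather than SQLAlchemy, to keep it dependency-free). The users table and its rows are invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical `users` table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ada", 36), ("Grace", 17), ("Linus", 21)],
)

# A declarative SELECT on name and age; the threshold is passed as a parameter.
rows = conn.execute(
    "SELECT name FROM users WHERE age > ? ORDER BY name", (18,)
).fetchall()
adult_names = [name for (name,) in rows]
conn.close()
```

The query states what is wanted (names of users older than 18) and leaves the how to the database engine.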


  • SQL’s analytical capabilities are rather limited: beyond aggregating, summing, counting and averaging data, there is not much on offer.
  • For programmers coming from an imperative background, SQL’s declarative syntax can present a learning curve.
  • There are many different implementations of SQL such as PostgreSQL, SQLite and MariaDB. They are all different enough to make inter-operability something of a headache.

Verdict — “timeless and efficient”

SQL is more useful as a data processing language than as an advanced analytical tool. Yet so much of the data science process hinges upon ETL, and SQL’s longevity and efficiency are proof that it is a very useful language for the modern data scientist to know.


What you need to know

Java is an extremely popular, general purpose language which runs on the Java Virtual Machine (JVM), an abstract computing system that enables seamless portability between platforms. It is currently supported by Oracle Corporation.


Version 8 — Free! Legacy versions, proprietary.


  • Ubiquity. Many modern systems and applications are built upon a Java back-end. The ability to integrate data science methods directly into the existing codebase is a powerful one to have.
  • Strongly typed. Java is no-nonsense when it comes to ensuring type safety. For mission-critical big data applications, this is invaluable.
  • Java is a high-performance, general purpose, compiled language. This makes it suitable for writing efficient ETL production code and computationally intensive machine learning algorithms.


  • For ad-hoc analyses and more dedicated statistical applications, Java’s verbosity makes it an unlikely first choice. Dynamically typed scripting languages such as R and Python lend themselves to much greater productivity.
  • Compared to domain-specific languages like R, there aren’t a great number of libraries available for advanced statistical methods in Java.

Verdict — “a serious contender for data science”

There is a lot to be said for learning Java as a first choice data science language. Many companies will appreciate the ability to seamlessly integrate data science production code directly into their existing codebase, and you will find Java’s performance and type safety are real advantages. However, you’ll be without the range of stats-specific packages available to other languages. That said, it is definitely one to consider, especially if you already know R and/or Python.


What you need to know

Developed by Martin Odersky and released in 2004, Scala is a language which runs on the JVM. It is a multi-paradigm language, enabling both object-oriented and functional approaches. Cluster computing framework Apache Spark is written in Scala.




  • Scala + Spark = High performance cluster computing. Scala is an ideal choice of language for those working with high-volume data sets.
  • Multi-paradigmatic: Scala programmers can have the best of both worlds. Both object-oriented and functional programming paradigms available to them.
  • Scala is compiled to Java bytecode and runs on a JVM. This allows inter-operability with the Java language itself, making Scala a very powerful general purpose language, while also being well-suited for data science.


  • Scala is not a straightforward language to get up and running with if you’re just starting out. Your best bet is to download sbt and set up an IDE such as Eclipse or IntelliJ with a specific Scala plug-in.
  • The syntax and type system are often described as complex. This makes for a steep learning curve for those coming from dynamic languages such as Python.

Verdict — “perfect, for suitably big data”

When it comes to using cluster computing to work with Big Data, then Scala + Spark are fantastic solutions. If you have experience with Java and other statically typed languages, you’ll appreciate these features of Scala too. Yet if your application doesn’t deal with the volumes of data that justify the added complexity of Scala, you will likely find your productivity being much higher using other languages such as R or Python.


What you need to know

Released just over 5 years ago, Julia has made an impression in the world of numerical computing. Its profile was raised thanks to early adoption by several major organizations, including many in the finance industry.




  • Julia is a JIT (‘just-in-time’) compiled language, which lets it offer good performance. It also offers the simplicity, dynamic-typing and scripting capabilities of an interpreted language like Python.
  • Julia was purpose-designed for numerical analysis. It is capable of general purpose programming as well.
  • Readability. Many users of the language cite this as a key advantage.


  • Maturity. As a new language, some Julia users have experienced instability when using packages. But the core language itself is reportedly stable enough for production use.
  • Limited packages are another consequence of the language’s youthfulness and small development community. Unlike long-established R and Python, Julia doesn’t have the choice of packages (yet).

Verdict — “one for the future”

The main issue with Julia is one that it cannot be blamed for. As a recently developed language, it isn’t as mature or production-ready as its main alternatives Python and R. But, if you are willing to be patient, there’s every reason to pay close attention as the language evolves in the coming years.


What you need to know

MATLAB is an established numerical computing language used throughout academia and industry. It is developed and licensed by MathWorks, a company established in 1984 to commercialize the software.


Proprietary — pricing varies depending on your use case


  • Designed for numerical computing. MATLAB is well-suited for quantitative applications with sophisticated mathematical requirements such as signal processing, Fourier transforms, matrix algebra and image processing.
  • Data Visualization. MATLAB has some great inbuilt plotting capabilities.
  • MATLAB is often taught as part of many undergraduate courses in quantitative subjects such as Physics, Engineering and Applied Mathematics. As a consequence, it is widely used within these fields.


  • Proprietary licence. Depending on your use-case (academic, personal or enterprise) you may have to fork out for a pricey licence. There are free alternatives available such as Octave. This is something you should give real consideration to.
  • MATLAB isn’t an obvious choice for general-purpose programming.

Verdict — “best for mathematically intensive applications”

MATLAB’s widespread use in a range of quantitative and numerical fields throughout industry and academia makes it a serious option for data science. The clear use-case would be when your application or day-to-day role requires intensive, advanced mathematical functionality; indeed, MATLAB was specifically designed for this.

Other Languages

There are other mainstream languages that may or may not be of interest to data scientists. This section provides a quick overview… with plenty of room for debate of course!


C++ is not a common choice for data science, although it has lightning fast performance and widespread mainstream popularity. The simple reason may be a question of productivity versus performance.

As one Quora user puts it:

“If you’re writing code to do some ad-hoc analysis that will probably only be run one time, would you rather spend 30 minutes writing a program that will run in 10 seconds, or 10 minutes writing a program that will run in 1 minute?”

The dude’s got a point. Yet for serious production-level performance, C++ would be an excellent choice for implementing machine learning algorithms optimized at a low-level.

Verdict — “not for day-to-day work, but if performance is critical…”

Read me...

Python 3, Pandas and Encoding Issues

It is not unusual to come across encoding problems when opening files in Python 3. The subject matter is a large topic of discussion, and here I am providing some quick ways to deal with a typical encoding issue you are likely to encounter.

Say you are interested in opening a CSV file to be loaded into a pandas dataframe. If the stars align and the generator of your CSV is magnanimous, they may have saved the file using UTF-8. If so you may get away with reading the file (here called myfile.csv) as follows:

import pandas as pd

df = pd.read_csv('myfile.csv')

You should in principle pass a parameter to pandas telling it what encoding the file has been saved with, so a more complete version of the snippet above would be:

import pandas as pd

df = pd.read_csv('myfile.csv', encoding='utf-8')

Encoding conundrum

What happens when you don't know what encoding was used to save the file? Well, you can ask, but it is very unlikely that the file generator knows... What to do? Well, there are some libraries that can be helpful.

Install the chardet module as follows from the terminal

pip install chardet

And use the following snippet as a guide:

import chardet
import pandas as pd

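def find_encoding(fname):
    # Read the raw bytes; the context manager closes the file afterwards
    with open(fname, 'rb') as f:
        raw_data = f.read()
    # chardet returns a dict with its best guess under the 'encoding' key
    result = chardet.detect(raw_data)
    return result['encoding']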

my_encoding = find_encoding('myfile.csv')
df = pd.read_csv('myfile.csv', encoding=my_encoding)
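If installing chardet is not an option, another rough approach is to try a short list of candidate encodings in turn using only the standard library. This is a sketch rather than a robust detector: the function name and the default list of encodings are my own choices, and latin-1 never fails to decode, so it acts as a last resort that may produce the wrong characters.

```python
import os
import tempfile


def read_text_with_fallback(fname, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in turn; return (text, encoding_used)."""
    with open(fname, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {fname}")


# Demo: a file saved as Windows-1252 is not valid UTF-8, so the first attempt fails.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.csv")
    with open(path, "wb") as f:
        f.write("name,city\nZoë,Málaga\n".encode("cp1252"))
    text, used = read_text_with_fallback(path)
```

The encoding that worked can then be passed on to pandas, for example pd.read_csv('myfile.csv', encoding=used).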

Et voilà!

Read me...

iPad keyboard - Caps Lock key changes language

I have been experiencing this issue for some time now... I have an external keyboard for my iPad and every time that I hit the Caps Lock key, instead of locking the capital letters in the keyboard, the iPad changes language... This is particularly annoying as I use several languages, from Spanish to Japanese. I decided that enough is enough and I have now managed to find a way to avoid this:

  1. Go to Settings, General
  2. Open Keyboard
  3. Select Hardware Keyboard 
  4. Switch off the Caps Lock switch to/from Latin

Et voilà!



Read me...

Jupyter not Launching after Updating to MacOS 10.12.5

If you have been itching to update the operating system on your shiny Mac, beware that there is a broken link when launching Jupyter.

Do not panic, simply follow these simple steps:

  • Open a terminal and type the following command:
jupyter notebook --generate-config
  • In the terminal navigate to the place where the configuration file is. In other words, type the following:
cd ~/.jupyter
  • Open the jupyter_notebook_config.py file and look for a line that says:
# c.NotebookApp.browser = ''

Change the line as follows: Delete the hash and write the name of your browser in between the quotes.

For SAFARI it would look as follows

c.NotebookApp.browser = 'safari'

If you are a CHROME supporter, the command looks a tad bit more complicated:

c.NotebookApp.browser = 'open -a /Applications/Google\ Chrome.app %s'
  • Save the script and close it.
  • Restart your terminal and launch Jupyter...
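As far as I know, Jupyter hands this setting to Python's standard webbrowser module, so a quick way to check whether a browser name is recognised is to ask that module directly. The helper below is a hypothetical sketch for that check and is not part of Jupyter itself.

```python
import webbrowser


def resolve_browser(name=None):
    """Return a browser controller for `name`, or None if Python cannot find one.

    With name=None the default browser is looked up.
    """
    try:
        return webbrowser.get(name)
    except webbrowser.Error:
        return None


# A made-up name that should not resolve to any installed browser.
missing = resolve_browser("no-such-browser-xyz")
```

If resolve_browser('safari') returns None on your machine, the simple name will probably not work in the configuration file either, and the longer command form containing %s may be the safer choice.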

Et voilà!

Read me...

Jetpack Issue Solved - HTTP status code was not 200 (500)

I have been having some trouble managing a WordPress site via the WordPress App. The stats and other things worked fine, but I was not able to see my existing posts or pages.

Every time I tried to synchronise them I would get an error that read something along the lines of:

"transport error – HTTP status code was not 200 (500)"

I did not pay too much attention to it as I thought it was an error with the App and that in new updates it would get sorted... but that did not seem to be the case.

This morning I rolled up my sleeves and decided to take a look at things. I ended up updating the PHP version on the site from 5.3 to 5.4 and it seems to have done the trick!!

So, I thought of letting you know. Now, I cannot guarantee that that is the end of the story, but at least my posts are showing in the mobile app.


Read me...

Data Science and Analytics with Python - Cover

Well, I am very pleased to show you the cover that will be used for my "Data Science and Analytics with Python" book. Not long to publication day!

Read me...

GA - Intro to Data Science and Analytics

Very pleased to have given an intro talk on Data Science and Analytics at General Assembly yesterday.


Read me...

"Essential Matlab and Octave" in the CERN Document Server

A friend pinged me this screenshot after seeing "Essential MATLAB and Octave" included in the CERN Document Server!



Read me...

Data Science and Analytics with Python - Proofread Manuscript

I have now received comments and corrections for the proofreading of my “Data Science and Analytics with Python” book.

Two weeks and counting to return corrections and comments back to the editor and project manager.


Read me...

Anaconda - Guarenteed Python packages via Conda and Conda-Forge

During the weekend a member of the team got in touch because he was unable to get a Python package working. He had just installed Python on his machine, but things were not quite right... For example, pip was not working and he had a bit of bother setting some environment variables... I recommended that he have a look at installing Python via the Anaconda distribution. Today he was up and running with his app.

Given that outcome, I thought it was a great coincidence that the latest episode of Talk Python To Me that started playing on my way back home happened to be about Conda and Conda-Forge. I highly recommend listening to it. Take a look:

Talk Python To Me - Python conversations for passionate developers - #94 Guarenteed packages via Conda and Conda-Forge

Have you ever had trouble installing a package you wanted to use in your Python app? Likely it contained some odd dependency, required a compilation step, maybe even using an uncommon compiler like Fortran. Did you try it on Windows? How many times have you seen "Cannot find vcvarsall.bat" before you had to take a walk?

If this sounds familiar, you might want to check conda the package manager, Anaconda, the distribution, conda forge, and conda build. They dramatically lower the bar for installing packages on all the platforms.

This week you'll meet Phil Elson, Kale Franz, and Michael Sarahan who all work on various parts of this ecosystem.

Links from the show:

conda: conda.pydata.org
conda-build: conda.pydata.org/docs/commands/build/conda-build.html
Anaconda distribution: continuum.io/anaconda-overview
conda-forge: conda-forge.github.io

Phil Elson on Twitter: @pypelson
Kale Franz: @kalefranz
Michael Sarahan: github.com/msarahan

Read me...

The Winton Gallery opens at the Science Museum

During the recent Christmas and New Year break I had the opportunity to visit the Science Museum (yes, again...). This time it was to see the newly opened Winton Gallery that houses the Mathematics exhibit in the museum. Not only is the exhibit about a subject matter close to my heart, but the gallery was also designed by Zaha Hadid Architects. I must admit that the first I heard of this was during a recent visit to the IMAX at the Science Museum to see Rogue One... Anyway, I took some pictures that you can see in the photo gallery here, and I am also re-posting an entry that appeared in the London Mathematical Society newsletter Number 465 for January 2017.

Mathematics: The Winton Gallery opens at the Science Museum, London

On 8 December 2016 the Science Museum opened a pioneering new gallery that explores how mathematicians, their tools and ideas have helped shape the modern world over the last 400 years. Mathematics: The Winton Gallery places mathematics at the heart of all our lives, bringing the subject to life through remarkable stories, artefacts and design.

More than 100 treasures from the Science Museum’s world-class science, technology, engineering and mathematics collections help tell powerful stories about how mathematical practice has shaped and been shaped by some of our most fundamental human concerns – including money, trade, travel, war, life and death.

From a beautiful 17th-century Islamic astrolabe that used ancient mathematical techniques to map the night sky to an early example of the famous Enigma machine, designed to resist even the most advanced mathematical techniques for codebreaking, each historical object has an important story to tell about how mathematics has shaped our world. Archive photography and film helps capture these stories and digital exhibits alongside key objects introduce the wide range of people who made, used or were affected by each mathematical device.

Dramatically positioned at the centre of the gallery is the Handley Page ‘Gugnunc’ aircraft, built in 1929 for a competition to construct a safe aircraft. Ground-breaking aerodynamic research influenced the wing design of this experimental aircraft, helping transform public opinion about the safety of flying and securing the future of the aviation industry. This aeroplane highlights perfectly the central theme of the gallery about how mathematical practice is driven by, and influences, real-world concerns and activities.

Mathematics also defines Zaha Hadid Architects’ design for the gallery. Inspired by the Handley Page aircraft, the gallery is laid out using principles of mathematics and physics. These principles also inform the three-dimensional curved surfaces representing the patterns of airflow that would have streamed around this aircraft.

Patrik Schumacher, Partner at Zaha Hadid Architects, recently noted that mathematics was part of Zaha Hadid’s life from a young age and was always the foundation of her architecture, describing the new mathematics gallery as ‘an important part of Zaha’s legacy in London’. Gallery curator David Rooney, who was responsible for the Science Museum’s recent award-winning Codebreaker: Alan Turing’s Life and Legacy exhibition, explained that the gallery tells ‘a rich cultural story of human endeavor that has helped transform the world’.

The mathematics gallery was made possible through an unprecedented donation from long-standing supporters of science, David and Claudia Harding. Additional support was also provided by Principal Sponsor Samsung, Major Sponsor MathWorks and a number of individual donors.

A lavishly illustrated new book, Mathematics: How It Shaped Our World, written by David Rooney and published by Scala Arts & Heritage Publishers, accompanies the new display. It expands the stories covered in the gallery and contains an absorbing series of newly commissioned essays by prominent historians and mathematicians including June Barrow-Green, Jim Bennett, Patricia Fara, Dame Celia Hoyles and Helen Wilson, with an afterword from Dame Zaha Hadid with Patrik Schumacher.

Read me...

World of Watson - Talk on Data Science

Last week I had the opportunity to attend the annual IBM conference in Las Vegas. The World of Watson conference, formerly known as Insight, provided me with an opportunity to meet new interesting people, talk to colleagues and customers, learn new things and share some ideas with like-minded people. As you can imagine, with Watson being at the centre stage of the event, there were a large number of presentations, stands and marketing featuring Watson-related things: from cognitive chocolate and brews through to cognitive computing and beyond.

World of Watson Presentation

My session took place on Monday October 24th and I was very pleased to see a full room, and later even standing-room only just minutes before the start. We covered some of the fundamentals of data science and machine learning and took the pulse of their use in the insurance industry in particular. I then had the opportunity of sharing some of the results of the work we have been doing over the past 12 months at the Data Science Studio in London. The case studies showcased included examples in insurance, banking, wealth management and retail.

All in all, it was a very successful and enjoyable trip, in spite of the constant flashing lights of the slot machines around the different Las Vegas venues.

Read me...

Extract tables from messy spreadsheets with jailbreakr (reblog)

The original blog can be seen here.

R has some good tools for importing data from spreadsheets, among them the readxl package for Excel and the googlesheets package for Google Sheets. But these only work well when the data in the spreadsheet are arranged as a rectangular table, and not overly encumbered with formatting or generated with formulas. As Jenny Bryan pointed out in her recent talk at the useR!2016 conference (and embedded below, or download PDF slides here), in practice few spreadsheets have "a clean little rectangle of data in the upper-left corner", because most people use spreadsheets not just as a file format for data retrieval, but also as a reporting/visualization/analysis tool.

Nonetheless, for a practicing data scientist, there's a lot of useful data locked up in these messy spreadsheets that needs to be imported into R before we can begin analysis. As just one example given by Jenny in her talk, this spreadsheet was included as one of 15,000 spreadsheet attachments (one with 175 tabs!) in the Enron Corpus.


To make it easier to import data into R from messy spreadsheets like this, Jenny and co-author Richard G. FitzJohn created the jailbreakr package. The package is in its early stages, but it can already import Excel (xlsx format) and Google Sheets into R as new "linen" objects from which small sub-tables can easily be extracted as data frames. It can also print spreadsheets in a condensed text-based format with one character per cell, which is useful if you're trying to figure out why an apparently simple spreadsheet isn't importing as you expect. (Check out the "weekend getaway winner" story near the end of Jenny's talk for a great example.)

The jailbreakr package isn't yet on CRAN, but if you want to try it out you can download it from the Github repository (or even contribute!) at the link below.

Github (rsheets): jailbreakr

Read me...

Raspberry Pi

I am very pleased to have finally received the Raspberry Pi 3 that I ordered the other day. I also got a Sense HAT, an add-on board for Raspberry Pi made especially for the Astro Pi mission.

The Sense HAT has an 8×8 RGB LED matrix, a five-button joystick and includes the following sensors:

  • Gyroscope
  • Accelerometer
  • Magnetometer
  • Temperature
  • Barometric pressure
  • Humidity

There is even a Python library providing easy access to everything on the board. I can't wait to start using it with some of the APIs available at Bluemix, for example. Any ideas are more than welcome.


Read me...

Bluemix - a set of tools/tutorials for app development

IBM's Bluemix provides access to a large set of APIs, including Watson services such as AlchemyAPI, Natural Language Classifier, Visual Recognition, Personality Insights and more. I have recently started playing with it a bit more. You can set up a free account (free for 30 days) and see what you think.

Check it out:

 Here is what IBM has to say about it:

Bluemix is the latest cloud offering from IBM. It enables organizations and developers to quickly and easily create, deploy, and manage applications on the cloud. Bluemix is an implementation of IBM's Open Cloud Architecture based on Cloud Foundry, an open source Platform as a Service (PaaS). Bluemix delivers enterprise-level services that can easily integrate with your cloud applications without you needing to know how to install or configure them.

I will be happy to hear what you build and how you use Bluemix. Keep in touch.

Read me...

Now Reading: Algorithms to Live By


A pretty good read about how computer algorithms can be applied to our everyday lives, helping to solve common decision-making problems and more.



Read me...

Getting ready for WWDC16 in San Francisco


Read me...

Installing Spark 1.6.1 on a Mac with Scala 2.11

I have recently gone through the process of installing Spark on my Mac for testing and development purposes. I also wanted to make sure I could use the installation not only with Scala, but also with PySpark through a Jupyter notebook.

If you are interested in doing the same, here are the steps I followed. First of all, here are the packages you will need:

  • Python 2.7 or higher
  • Java SE Development Kit
  • Scala and Scala Build Tool
  • Spark 1.6.1 (at the time of writing)
  • Jupyter Notebook
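
Before installing anything, it can help to see which of these tools are already available on your PATH. The following is a minimal sketch in Python; the executable names (`java`, `scala`, `sbt`, `jupyter`) are assumptions about how each tool is exposed on your system, and `find_tools` is a hypothetical helper name, not part of any of these packages.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
TOOLS = ["python", "java", "scala", "sbt", "jupyter"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print("{:<8} {}".format(name, path or "not found"))
```

Anything reported as "not found" is covered by one of the installation steps that follow.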


Python

You can choose the Python distribution that best suits your needs. I find Anaconda to be fine for my purposes. You can obtain a graphical installer from https://www.continuum.io/downloads. I am using Python 2.7 at the time of writing.

Java SE Development Kit

You will need to download the Oracle Java SE Development Kit 7 or 8 from the Oracle JDK downloads page. In my case, at the time of writing I am using 1.7.0_80. You can check the version you have by opening a terminal and typing

java -version

You also have to make sure that the appropriate environment variable is set up. In your shell startup file, add the following line:

export JAVA_HOME=$(/usr/libexec/java_home)

Scala and Scala Build Tool

In this case, I found it much easier to use Homebrew to install and manage the Scala language. If you have never used Homebrew, I recommend that you take a look. To install it you have to type the following in your terminal:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Once you have Homebrew you can install Scala and the Scala Build Tool (sbt) as follows:

> brew install scala
> brew install sbt

You may want to set the appropriate environment variable in your shell startup file:

export SCALA_HOME=/usr/local/bin/scala

Spark 1.6.1

Obtain Spark from https://spark.apache.org/downloads.html

Note that for building Spark with Scala 2.11 you will need to download the Spark source code and build it appropriately.


Once you have downloaded the tgz file, unzip it into an appropriate location (your home directory for example) and navigate to the unzipped folder.

To build Spark with Scala 2.11 you need to type the following commands:

> ./dev/change-version-to-2.11.sh
> build/sbt clean assembly

This may take a while, so sit tight! When finished, you can check that everything is working by launching either the Scala shell:

> ./bin/spark-shell

or the Python shell:

> ./bin/pyspark

Once again there are some environment variables that are recommended:

export SPARK_PATH=~/spark-1.6.1
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
alias sparknb='$SPARK_PATH/bin/pyspark --master local[2]'

The last line is an alias that will enable us to launch a Jupyter notebook with PySpark. Totally optional!

Jupyter Notebook

If all is working well you are ready to go. Source your shell startup file and launch a Jupyter notebook:

> sparknb

Et voilà!

Read me...

Makeover Monday: Will a sugar tax have an impact on childhood obesity?

Following up on the Data+Visual Meetup hosted at IBM last Wednesday, I wanted to take part in the Makeover Monday project that Andy Kriebel highlighted during his talk.

This week the data came from the BBC, in particular the visualisation that shows how people in the UK get their added sugar:

This is a story that follows up the recent announcement by Chancellor George Osborne about a tax on sugary drinks in the UK. Here is my Makeover Monday for the visualisation above:

Sugar UK




Read me...

Astronaut Bowman

How much should we fear the rise of artificial intelligence?

  1. When the arena is something as pure as a board game, where the rules are entirely known and always exactly the same, the results are remarkable. When the arena is something as messy, unrepeatable and ill-defined as actuality, the business of adaptation and translation is a great deal more difficult.

Tom Chatfield

From the opinion article by Tom Chatfield in The Guardian.


Read me...

Google Drive

Google Drive not synching on your Mac? Here is what to do

I am not a big user of Google Drive. It is a good service and it mostly does what one may need from a suite of productivity apps... but for some reason I only use it in very limited cases.

So, no surprise that I had not noticed that the synching between the cloud version of my documents and those on my Mac had gone pear-shaped. I tried logging out of Drive but that did not help. I attempted forcing the synch by making changes in both the cloud version and the Mac, but got the same result.

I managed to sort it out in the end and here is what I did:

  1. Exit the Drive application
  2. Navigate to the Application Support folder and look for the Google folder. You may need to find the hidden Library folder:
    • In Finder look for the Go menu and press Option + Cmd to reveal the hidden folder
    • Once there look for "Application Support"
    • Alternatively you can press Cmd + Shift + G and go to "~/Library/Application Support/Google"
  3. Delete the Drive folder
  4. Start the Drive application

Et voilà
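
If you prefer to script step 3, the following is a minimal Python 3 sketch. The function name `remove_drive_folder` is hypothetical, and the code assumes the folder is literally called "Drive" inside the Google support directory, as in the steps above. Make sure the Drive application is closed before running it.

```python
import shutil
from pathlib import Path


def remove_drive_folder(google_dir):
    """Delete the Drive folder inside the given Google support directory.

    Returns True if a folder was removed and False if there was nothing to delete.
    """
    drive = Path(google_dir).expanduser() / "Drive"
    if not drive.is_dir():
        return False
    shutil.rmtree(str(drive))
    return True
```

On a Mac this would be called as `remove_drive_folder("~/Library/Application Support/Google")`, after which you can start the Drive application again.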

Read me...

Disable Notifications

A quick way to tame Mac notifications while giving a presentation

Surely you have suffered this same situation: You are giving a really good presentation, with a fantastic slide deck on your shiny MacBook, you are dominating the stage and people are nodding at your witty insights... and then an email notification appears in the top right-hand corner of the screen, followed by a FaceTime call from your other half... Noooooo!

A good way to disable these notifications is to ⌥-click (option-click) the notification bar:



In that way, any notifications handled by the notification bar are not shown. Once you are ready to receive notifications, simply ⌥-click (option-click) again. Et voilà!

Read me...


How Watson answers a question

If you wanted to know more about how Watson works, here is a good video that may help.


Read me...


Watson and some well-known people

I recently came across a few interesting videos of IBM's Watson appearing alongside some well-known people like Ridley Scott and Carrie Fisher. Here are some:




Read me...

Quantum algorithms for topological and geometric analysis of data

Story Source:

The above post is reprinted from materials provided by Massachusetts Institute of Technology. The original item was written by David L. Chandler. Note: Materials may be edited for content and length.

Quantum Data Algos

From gene mapping to space exploration, humanity continues to generate ever-larger sets of data -- far more information than people can actually process, manage, or understand.

Machine learning systems can help researchers deal with this ever-growing flood of information. Some of the most powerful of these analytical tools are based on a strange branch of geometry called topology, which deals with properties that stay the same even when something is bent and stretched every which way.

Such topological systems are especially useful for analyzing the connections in complex networks, such as the internal wiring of the brain, the U.S. power grid, or the global interconnections of the Internet. But even with the most powerful modern supercomputers, such problems remain daunting and impractical to solve. Now, a new approach that would use quantum computers to streamline these problems has been developed by researchers at MIT, the University of Waterloo, and the University of Southern California.

The team describes their theoretical proposal this week in the journal Nature Communications. Seth Lloyd, the paper's lead author and the Nam P. Suh Professor of Mechanical Engineering, explains that algebraic topology is key to the new method. This approach, he says, helps to reduce the impact of the inevitable distortions that arise every time someone collects data about the real world.

In a topological description, basic features of the data (How many holes does it have? How are the different parts connected?) are considered the same no matter how much they are stretched, compressed, or distorted. Lloyd explains that it is often these fundamental topological attributes "that are important in trying to reconstruct the underlying patterns in the real world that the data are supposed to represent."

It doesn't matter what kind of dataset is being analyzed, he says. The topological approach to looking for connections and holes "works whether it's an actual physical hole, or the data represents a logical argument and there's a hole in the argument. This will find both kinds of holes."

Using conventional computers, that approach is too demanding for all but the simplest situations. Topological analysis "represents a crucial way of getting at the significant features of the data, but it's computationally very expensive," Lloyd says. "This is where quantum mechanics kicks in." The new quantum-based approach, he says, could exponentially speed up such calculations.

Lloyd offers an example to illustrate that potential speedup: If you have a dataset with 300 points, a conventional approach to analyzing all the topological features in that system would require "a computer the size of the universe," he says. That is, it would take 2^300 (two to the 300th power) processing units -- approximately the number of all the particles in the universe. In other words, the problem is simply not solvable in that way.
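
To get a feel for the size of that number, Python's arbitrary-precision integers can evaluate it exactly. This is only a quick arithmetic check added for illustration; it is not part of the original article.

```python
# Number of classical processing units quoted for a 300-point dataset.
units = 2 ** 300

# Count the decimal digits to see the order of magnitude.
digits = len(str(units))
print(digits)  # 91 digits, i.e. roughly 2 x 10^90
```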

"That's where our algorithm kicks in," he says. Solving the same problem with the new system, using a quantum computer, would require just 300 quantum bits -- and a device this size may be achieved in the next few years, according to Lloyd.

"Our algorithm shows that you don't need a big quantum computer to kick some serious topological butt," he says.

There are many important kinds of huge datasets where the quantum-topological approach could be useful, Lloyd says, for example understanding interconnections in the brain. "By applying topological analysis to datasets gleaned by electroencephalography or functional MRI, you can reveal the complex connectivity and topology of the sequences of firing neurons that underlie our thought processes," he says.

The same approach could be used for analyzing many other kinds of information. "You could apply it to the world's economy, or to social networks, or almost any system that involves long-range transport of goods or information," Lloyd says. But the limits of classical computation have prevented such approaches from being applied before.

While this work is theoretical, "experimentalists have already contacted us about trying prototypes," he says. "You could find the topology of simple structures on a very simple quantum computer. People are trying proof-of-concept experiments."

The team also included Silvano Garnerone of the University of Waterloo in Ontario, Canada, and Paolo Zanardi of the Center for Quantum Information Science and Technology at the University of Southern California.

Read me...

Data Science Bootcamp - Done

Today I had the opportunity of running a #DataScience bootcamp in London. It was an all-day affair and although the attendees were engaged, I’m sure that by the end of the 6th hour they were quite tired.
The discussions ranged from what data science is to the skills required to become a data scientist and how to manage data scientists. Finally we implemented some data analyses based on linear regression, all using R. I was very pleased to see some of the results.


Read me...

Opening old Keynote/Pages files in new versions

Greetings readers! I hope you are all enjoying the break and getting ready for 2016.

This time I wanted to bring to your attention some information that you may find very useful, particularly if, like me, you happen to need some old slides, presentations or talks you have in Keynote but forgot (or rather did not need) to update to a newer version of the software. You may have thought that there would be some backward compatibility for this sort of thing, and you may be surprised that there is not an obvious click-and-update type solution. Nonetheless, not all is lost and you will not have to trash your presentations, unless of course they were not the slides you were looking for... This trick also works with Pages by the way.

You may find that when opening your old slide decks, Keynote complains with:

This document can't be opened because it's too old. To open it, save it with Keynote '09 first.

Keynote Compatibility Issue

and Pages with:

This document can't be opened because it's too old. To open it, save it with Pages '09 first.

Of course, if you have both versions installed this should not be a problem, but why would you do that? So, if you cannot open the old file in the first place, here is what you need to do (please make sure that you have a backup copy of your file... you never know...):

  1. Open the Terminal and navigate to the directory where the old file is saved. For example, if it is saved on your Desktop just type

    > cd Desktop
  2. Rename the file with a .zip extension:
    > mv my_presentation.keynote my_presentation.zip
  3. Unzip the file: 
    > unzip my_presentation.zip -d my_presentation
  4. Move into the unzipped folder and type the following command:
    gunzip --stdout index.apxl.gz | sed 's-:version="72007061400"-:version="92008102400"-g' > index.apxl

    and hit return. If you do not get any errors you are good to go.

  5. Remove the 
  6. Re-compress the folder and change the extension to the original one.

Try opening your file. It may still complain, but at least you will be able to open it. Et voilà!
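
If you would rather not rely on gunzip and sed, step 4 can be sketched in Python as well. The two version strings are the ones from the command above; the function name `upgrade_index` is hypothetical, and the sketch assumes `index.apxl.gz` sits at the top level of the unzipped folder.

```python
import gzip
import os

# Version strings taken from the sed command in step 4.
OLD_VERSION = b':version="72007061400"'
NEW_VERSION = b':version="92008102400"'


def upgrade_index(folder):
    """Decompress index.apxl.gz, swap the version string and write index.apxl.

    Returns the number of replacements made, so 0 means nothing was changed.
    """
    source = os.path.join(folder, "index.apxl.gz")
    target = os.path.join(folder, "index.apxl")
    with gzip.open(source, "rb") as handle:
        data = handle.read()
    count = data.count(OLD_VERSION)
    with open(target, "wb") as handle:
        handle.write(data.replace(OLD_VERSION, NEW_VERSION))
    return count
```

Calling `upgrade_index("my_presentation")` from the directory that contains the unzipped folder covers step 4 only; the remaining steps are still done by hand.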

Read me...

MacTex updates for El Capitan

El Capitan! Great! The new version of the OS X operating system. New features, new fonts, new problems... I knew that updating was going to bring some unexpected problems with my applications, but I wanted to update... And ditto, as soon as I tried to take a look under the hood for a couple of things I realised that a fresh installation of Homebrew was going to be needed.

More importantly, with my new book on data science (aka "Data Science and Analytics with Python"), LaTeX is probably one of the most used things on my computer. So, I wanted to check that things were fine. Although I could compile (currently trying to finish Chapter 3 in case you are wondering), there were some issues here and there. For example, TeX Live Utility thought I was using version 0 (yes, zero!) and it could not find some files.

It turns out that El Capitan does not let us write to /usr. The 2015 TeX distribution creates a symbolic link at /usr/texbin, which is removed (if it was there from a previous OS version) and cannot be installed again. If a GUI looks by default at that location it will sadly no longer find it. That is why the terminal was not affected! (Phew!)

The solution is to tell the broken applications to look at /Library/TeX/texbin instead. This lives in /Library/TeX, which is “owned” by MacTeX and is therefore allowed by El Capitan. So to fix TeX Live Utility do the following:

  • Open the TeX Live Utility Preferences and click on Choose...
  • That opens a file chooser. Type Shift-Cmd-G, enter /Library/TeX into the dialog box and then press Return.
  • Finally, double-click on texbin
  • Et voilà


For more info see this link.

El Capitan

Read me...

Google's and Gerty's logos are quite similar

I have recently updated my applications and got confused when trying to launch my book reader Gerty: instead of opening the book(s) I'm currently reading, I found myself staring at Google's search bar...

I am sure that is something neither of them would like, but hey... Just pointing it out. The similarity is superficial, but enough to get confused when looking at small icons on a screen. Check it out:

Read me...

iPython Notebook is now Jupyter... I knew it!

It is not really news... Jupyter is the new name of the much-loved iPython project, and it has been for a while. As the Jupyter project itself puts it:

The language-agnostic parts of IPython are getting a new home in Project Jupyter

As announced in the python.org page, as of version 4.0, the Big Split from the old iPython starts. I knew this and I even tweeted about it:


All great, right? Well, I still got surprised when, after updating my Python installation, I tried to start my ipython notebook and got an error that ended with:

File "importstring.py", line 31, in import_item
module = __import__(package, fromlist=[obj])
ImportError: No module named notebook.notebookapp

Then I remembered, and to fix my problem I simply installed Jupyter (I am using Anaconda) with the following command:

conda install jupyter

Et voilà!
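
A quick way to confirm that the fix worked is to ask Python whether the `notebook` package named in the error can now be found. This is a small sketch assuming Python 3.4 or later; `has_module` is a hypothetical helper name.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


print("notebook installed:", has_module("notebook"))
```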


Read me...

Cloudera Breakfast Briefing and Tofu Scientists

Last Thursday I attended a Cloudera Breakfast Briefing where Sean Owen was speaking about Spark and the examples were related to building decision trees and random forests. It was a good session in general.

Sean started his talk with an example using the Iris dataset using R, in particular the "party" library. He then moved on to talk about Spark and MLlib.


For the rest of the talk he used the "Covertype" data set that contains 581,012 data points describing trees using 54 features (elevation, slope, soil type, etc.) predicting forest cover type (spruce, aspen, etc.). A very apt dataset for the construction of random forests, right? I was very pleased to see a new (for me) dataset being used!

Sean went over some bits and pieces about using Spark, highlighting the compactness of the code. He also turned his attention to the tuning of hyper-parameters and its importance.

There are different ways to approach this, but it is always about finding a balance, a trade-off. For a tree we can play with the depth of the tree, the maximum number of bins (i.e. the number of different decision rules to be tried) and the impurity measure (Gini or entropy).

If we don't know the right values for the hyperparameters, we can try several, particularly if we have enough room on the cluster.
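
Trying several values usually means walking over a grid of combinations. The sketch below is plain Python rather than Spark code: the candidate values are illustrative assumptions (not the ones used in the talk), and `evaluate` stands for a hypothetical function that trains a tree with the given settings and returns a validation score.

```python
from itertools import product

# Candidate values are illustrative assumptions, not the ones used in the talk.
grid = {
    "impurity": ["gini", "entropy"],
    "max_depth": [5, 10, 20],
    "max_bins": [32, 100, 300],
}


def settings(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def best_setting(grid, evaluate):
    """Return the combination with the highest score according to `evaluate`."""
    return max(settings(grid), key=evaluate)
```

With three hyperparameters and 2 x 3 x 3 candidate values the grid already has 18 models to train, which is why spare capacity on the cluster matters.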


  • Building a random forest: let various trees see only a subset of the data, then combine. Another approach is to let the trees see a subset of the features. The latter is a nice idea as this may be a more reasonable approach for large clusters, where communication among nodes is kept to a minimum -> good for Spark or Hadoop.
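
The two ideas in the bullet above can be sketched in a few lines of plain Python. This is only an illustration of the per-tree variant described here: each tree gets a bootstrap sample of the rows and a random subset of the features, and the trees' answers are combined by majority vote. The function names are hypothetical, and many implementations choose the feature subset at every split rather than once per tree.

```python
import random


def tree_views(n_rows, n_features, n_trees, features_per_tree, seed=0):
    """For each tree, pick a bootstrap sample of rows and a subset of features."""
    rng = random.Random(seed)
    views = []
    for _ in range(n_trees):
        # Rows are drawn with replacement, so some appear more than once.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Features are drawn without replacement.
        features = sorted(rng.sample(range(n_features), features_per_tree))
        views.append((rows, features))
    return views


def majority_vote(predictions):
    """Combine the per-tree predictions for a single data point."""
    return max(set(predictions), key=predictions.count)
```

For the Covertype data one could call `tree_views(581012, 54, n_trees=20, features_per_tree=7)`, where the number of trees and features per tree are arbitrary choices for the example.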


Sean finished with some suggestions of things one can try:

  • Try SVM and LogisticRegression in MLlib
  • Real-time scoring with Spark Streaming
  • Use random decision forests for regression

Nonetheless, the best bit of all this was that after asking a couple of questions I managed to get my hands on a "Tofu Scientist" T-Shirt! Result!

Tofu Scientist 1

Read me...

No shuffle in new iOS 8.4 Music App

I was not too sure about the new Apple Music offering, but so far it seems quite alright! The music choices are generally good, and I hope that as I use the music app in iOS 8.4 more the choices get better.

Unfortunately I ended up using the app while not having mobile coverage and no WiFi either... so I reverted to "My Music" and since I was in the middle of a run, I wanted just to hit the shuffle button and hope for the best... However, I was surprised that there was no shuffle button to be seen... I ended up hitting the first song in the list and took it from there. It turns out that the shuffle option is set by default; you just have to seed it by starting to play any song. That seems good, except for the fact that it is not obvious at all.

You can select if you want the shuffle mode or not after starting playing any song and expanding the "Now Playing" bar:

iTunes Shuffle 1


And there you will be able to see the usual Shuffle icon:

iTunes Shuffle


Read me...

You have big data? MIT researchers can help shrink it!

In a lot of machine learning and data science applications, it is not unusual to use matrices to represent data. It is indeed a very convenient way to keep the information but also to do manipulations, calculations and other useful tricks. As the size of the data increases, of course the size of the matrices grows too and that can be a bit problematic. Finding a way to reduce the size of these matrices while keeping the information is a challenge that a lot of us have faced. Using techniques that exploit the sparsity of the matrices, or even reducing the dimensionality via principal components is common practice.

Reading the latest World Economic Forum Newsletter I found out about a new algorithm that MIT researchers will present at the ACM Symposium on Theory of Computing in June. The algorithm is said to find the smallest possible approximation of an original matrix, guaranteeing reliable computations. Indeed, to determine how well the "reduced" matrix approximates the original one you need to measure the "distance" between them, and a common distance to use is the typical Euclidean measure that we are all familiar with... What? You say you aren't?... Remember Pythagoras?

The square of the hypotenuse (the side opposite the right angle) is equal to the sum of the squares of the other two sides.


There you go... all you have to do is extend it to n dimensions et voilà... That is not the only way to measure distance. Think for example of the way in which you move in a grid city such as Manhattan, New York City... You cannot move diagonally (except on Broadway I suppose...) so you need to go north-south or east-west. That distance is actually called the "Manhattan distance".

Mathematicians refer to "norms" when talking about distance measurement and indeed both the Euclidean and Manhattan distances mentioned above are norms:

  • Manhattan distance is a 1-norm measure: the absolute differences are raised to the power of 1 and summed.
  • Euclidean distance is a 2-norm measure: the differences are raised to the power of 2 and summed, and the square root of the total is taken.
  • etc...
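
Both cases are instances of the same formula, usually called the Minkowski distance. Here is a minimal Python sketch of it; the function name and the example points are my own choices for illustration.

```python
def minkowski_distance(a, b, p):
    """Distance between points a and b under the p-norm (p >= 1)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


origin, corner = (0, 0), (3, 4)
print(minkowski_distance(origin, corner, 1))  # 7.0: walk 3 blocks, then 4
print(minkowski_distance(origin, corner, 2))  # 5.0: the hypotenuse
```

The same function works for points with any number of dimensions, which is the extension of Pythagoras mentioned above.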

So what about the MIT algorithm proposed by Richard Peng and Michael Cohen? Well they show that their algorithm is optimal for "reducing" matrices under any norm! The first step is to assign each row of the original matrix a “weight”. A row’s weight is related to the number of other rows that it is similar to. It also determines the likelihood that the row will be included in the reduced matrix.

Let us imagine that the row is indeed included in the reduced matrix. Then its values will be multiplied according to its weight. So, for instance, if 10 rows are similar to one another, but not to any other rows of the matrix, each will have a 10 percent chance of getting into the condensed matrix. If one of them does, its entries will all be multiplied by 10, so that it will reflect the contribution of the other nine rows.
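
The ten-row example can be sketched in Python as follows. This is a simplified illustration of the sample-and-rescale idea only, with the same keep probability for every row; it is not Peng and Cohen's algorithm, which derives each row's weight from its similarity to other rows. The function name `sample_rows` is hypothetical.

```python
import random


def sample_rows(rows, keep_probability, seed=None):
    """Keep each row with the given probability and rescale the survivors.

    A kept row is multiplied by 1 / keep_probability so that, on average,
    it stands in for the similar rows that were dropped.
    """
    rng = random.Random(seed)
    scale = 1.0 / keep_probability
    return [
        [scale * value for value in row]
        for row in rows
        if rng.random() < keep_probability
    ]


# Ten identical rows: each has a 10 percent chance of surviving.
rows = [[1.0, 2.0]] * 10
reduced = sample_rows(rows, keep_probability=0.1, seed=1)
```

Any row that survives comes out as `[10.0, 20.0]`, which is exactly the column-wise sum of the ten original rows. On average one row survives, so the reduced matrix carries the same total with a tenth of the rows.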

You would think that using the Manhattan distance would be simpler than the Euclidean one when calculating the weights... Well you would be wrong! The previous best effort to reduce a matrix under the 1-norm would return a matrix whose number of rows was proportional to the number of columns of the original matrix raised to the power of 2.5. In the case of the Euclidean distance it would return a matrix whose number of rows is proportional to the number of columns of the original matrix times its own logarithm.
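
To see how different those two growth rates are, here is a small Python comparison. It is only a rough illustration: the constants of proportionality are left out and the natural logarithm is an assumption, so the numbers show the trend rather than actual row counts.

```python
import math


def rows_previous_1_norm(columns):
    """Growth of the previous 1-norm result: columns to the power of 2.5."""
    return columns ** 2.5


def rows_2_norm(columns):
    """Growth of the 2-norm result: columns times their logarithm."""
    return columns * math.log(columns)


for columns in (10, 100, 1000):
    print(columns, round(rows_previous_1_norm(columns)), round(rows_2_norm(columns)))
```

For a matrix with 1,000 columns the first expression is in the tens of millions while the second stays below 7,000.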

The MIT algorithm of Peng and Cohen is able to reduce matrices under the 1-norm as well as it does under the 2-norm. One important thing is that for the Euclidean norm, the reduction is as good as that of other algorithms... and that is because they use the same best algorithm out there... However, for the 1-norm it uses it recursively!

Interested in reading the paper? Well go to the arXiv and take a look!

Read me...

Markup for Fast Data Science Publication - Reblog

I am an avid user of Markdown via Mou and R Markdown (with RStudio). The facility that the iPython Notebook offers in combining code and text to be rendered in an interactive webpage is the choice for a number of things, including the 11-week Data Science course I teach at General Assembly.

As for LaTeX, well, I could not have survived my PhD without it and I still use it heavily. I have even created some videos about how to use LaTeX; you can take a look at them.

My book "Essential Matlab and Octave" was written and formatted in its entirety using LaTeX. My new book "Data Science and Analytics with Python" is having the same treatment.


I was very pleased to see the following blog post by Benjamin Bengfort. This is a reblog of that post and the original can be found here.

Markup for Fast Data Science Publication
Benjamin Bengfort

A central lesson of science is that to understand complex issues (or even simple ones), we must try to free our minds of dogma and to guarantee the freedom to publish, to contradict, and to experiment. — Carl Sagan in Billions & Billions: Thoughts on Life and Death at the Brink of the Millennium

As data scientists, it's easy to get bogged down in the details. We're busy implementing Python and R code to extract valuable insights from data, train effective machine learning models, or put a distributed computation system together. Many of these tasks, especially those relating to data ingestion or wrangling, are time-consuming but are the bread and butter of the data scientist's daily grind. What we often forget, however, is that we must not only be data engineers, but also contributors to the data science corpus of knowledge.

If a data product derives its value from data and generates more data in return, then a data scientist derives their value from previously published works and should generate more publications in return. Indeed, one of the reasons that Machine Learning has grown ubiquitous (see the many Python-tagged questions related to ML on Stack Overflow) is thanks to meticulous blog posts and tools from scientific research (e.g. Scikit-Learn) that enable the rapid implementation of a variety of algorithms. Google in particular has driven the growth of data products by publishing systems papers about their methodologies, enabling the creation of open source tools like Hadoop and Word2Vec.

By building on a firm base for both software and for modeling, we are able to achieve greater results, faster. Exploration, discussion, criticism, and experimentation all enable us to have new ideas, write better code, and implement better systems by tapping into the collective genius of a data community. Publishing is vitally important to keeping this data science gravy train on the tracks for the foreseeable future.

In academia, the phrase "publish or perish" describes the pressure to establish legitimacy through publications. Clearly, we don't want to take our role as authors that far, but the question remains, "How can we effectively build publishing into our workflow?" The answer is through markup languages - simple, streamlined markup that we can add to plain text documents that build into a publishing layout or format. For example, the following markup languages/platforms build into the accompanying publishable formats:

  • Markdown → HTML
  • iPython Notebook (JSON + Markdown) → Interactive Code
  • reStructuredText + Sphinx → Python Documentation, ReadTheDocs.org
  • AsciiDoc → ePub, Mobi, DocBook, PDF
  • LaTeX → PDF

The great thing about markup languages is that they can be managed inline with your code workflow in the same software versioning repository. Github goes even further as to automatically render Markdown files! In this post, we'll get you started with several markup and publication styles so that you can find what best fits into your workflow and deployment methodology.


Markdown is the most ubiquitous of the markup languages we'll describe in this post, and its simplicity means that it is often chosen for a variety of domains and applications, not just publishing. Markdown, originally created by John Gruber, is a text-to-HTML processor, where lightweight syntactic elements are used instead of the more heavyweight HTML tags. Markdown is intended for folks writing for the web, not designing for the web, and in some CMS systems, it is simply the way that you write, no fancy text editor required.

Markdown has seen special growth thanks to Github, which has an extended version of Markdown, usually referred to as "Github-Flavored Markdown." This style of Markdown extends the basics of the original Markdown to include tables, syntax highlighting, and other inline formatting elements. If you create a Markdown file in Github, it is automatically rendered when viewing files on the web, and if you include a README.md in a directory, that file is rendered below the directory contents when browsing code. Github Issues are also expected to be in Markdown, further extended with tools like checkbox lists.

Markdown is used for so many applications it is difficult to name them all. Below are a select few that might prove useful to your publishing tasks.

  • Jekyll allows you to create static websites that are built from posts and pages written in Markdown.
  • Github Pages allows you to quickly publish Jekyll-generated static sites from a Github repository for free.
  • Silvrback is a lightweight blogging platform that allows you to write in Markdown (this blog is hosted on Silvrback).
  • Day One is a simple journaling app that allows you to write journal entries in Markdown.
  • iPython Notebook expects Markdown to describe blocks of code.
  • Stack Overflow expects questions, answers, and comments to be written in Markdown.
  • MkDocs is a software documentation tool written in Markdown that can be hosted on ReadTheDocs.org.
  • GitBook is a toolchain for publishing books written in Markdown to the web or as an eBook.

There are also a wide variety of editors, browser plugins, viewers, and tools available for Markdown. Both Sublime Text and Atom support Markdown and automatic preview, as well as most text editors you'll use for coding. Mou is a desktop Markdown editor for Mac OSX and iA Writer is a distraction-free writing tool for Markdown for iOS. (Please comment your favorite tools for Windows and Android). For Chrome, extensions like Markdown Here make it easy to compose emails in Gmail via Markdown or Markdown Preview to view Markdown documents directly in the browser.

Clearly, Markdown enjoys a broad ecosystem and diverse usage. If you're still writing HTML for anything other than templates, you're definitely doing it wrong at this point! It's also worth including Markdown rendering for your own projects if you have user submitted text (also great for text-processing).

Rendering Markdown can be accomplished with the Python Markdown library, usually combined with the Bleach library for sanitizing bad HTML and linkifying raw text. A simple demo of this is as follows:

First install markdown and bleach using pip:

$ pip install markdown bleach

Then create a markdown parsing function as follows:

import bleach
from markdown import markdown

def htmlize(text):
    """
    This helper method renders Markdown then uses Bleach to sanitize it as
    well as converting all links in text to actual anchor tags.
    """
    text = bleach.clean(text, strip=True)  # Clean the text by stripping bad HTML tags
    text = markdown(text)                  # Convert the markdown to HTML
    text = bleach.linkify(text)            # Add links from the text and add nofollow to existing links

    return text

Given a markdown file test.md whose contents are as follows:

# My Markdown Document

For more information, search on [Google](http://www.google.com).

_Grocery List:_

1. Apples
2. Bananas
3. Oranges

The following code:

>>> with open('test.md', 'r') as f:
...     print(htmlize(f.read()))

Will produce the following HTML output:

<h1>My Markdown Document</h1>
<p>For more information, search on <a href="http://www.google.com" rel="nofollow">Google</a>.</p>
<p><em>Grocery List:</em></p>
<ol>
<li>Apples</li>
<li>Bananas</li>
<li>Oranges</li>
</ol>

Hopefully this brief example has also served as a demonstration of how Markdown and other markup languages work to render much simpler text with lightweight markup constructs into a larger publishing framework. Markdown itself is most often used for web publishing, so if you need to write HTML, then this is the choice for you!

To learn more about Markdown syntax, please see Markdown Basics.

iPython Notebook

iPython Notebook is a web-based, interactive environment that combines Python code execution, text (marked up with Markdown), mathematics, graphs, and media into a single document. The motivation for iPython Notebook was purely scientific: How do you demonstrate or present your results in a repeatable fashion where others can understand the work you've done? By creating an interactive environment where code, graphics, mathematical formulas, and rich text are unified and executable, iPython Notebook gives a presentation layer to otherwise unreadable or inscrutable code. Although Markdown is a big part of iPython Notebook, it deserves a special mention because of how critical it is to the data science community.

iPython Notebook is interesting because it combines both the presentation layer as well as the markup layer. When run as a server, usually locally, the notebook is editable, explorable (a tree view will present multiple notebook files), and executable - any code written in Python in the notebook can be evaluated and run using an interactive kernel in the background. Mathematical formulas written in LaTeX are rendered using MathJax. To enhance the delivery and shareability of these notebooks, the NBViewer allows you to share static notebooks from a Github repository.

iPython Notebook comes with most scientific distributions of Python like Anaconda or Canopy, but it is also easy to install iPython with pip:

$ pip install ipython

iPython itself is an enhanced interactive Python shell or REPL that extends the basic Python REPL with many advanced features, primarily allowing for a decoupled two-process model that enables the notebook. This process model essentially runs Python as a background kernel that receives execution instructions from clients and returns responses back to them.

To start an iPython notebook execute the following command:

$ ipython notebook

This will start a local server and automatically open your default browser to it. You'll start in the "dashboard view", which shows all of the notebooks available in the current working directory. Here you can create new notebooks and start to edit them. Notebooks are saved as .ipynb files in the local directory, a format called "Jupyter" that is simple JSON with a specific structure for representing each cell in the notebook. The Jupyter notebook files are easily versioned via Git and Github since they are also plain text.
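Because a notebook file is plain JSON, you can inspect it with nothing more than the standard library. The sketch below is only an illustration: the function name is made up for this example, and it assumes the version 4 notebook layout, in which the cells live in a top-level "cells" list (older notebook formats are laid out differently).

```python
import json


def summarize_notebook(path):
    """Return a list of (cell_type, first_line) pairs for a notebook file."""
    with open(path, "r", encoding="utf-8") as f:
        notebook = json.load(f)

    summary = []
    # Assumption: nbformat 4 layout, where cells sit in a top-level "cells" list
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # "source" may be stored as a single string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        first_line = text.splitlines()[0] if text.strip() else ""
        summary.append((cell["cell_type"], first_line))
    return summary
```

Calling summarize_notebook on one of your own .ipynb files gives a quick outline of the notebook (which cells are Markdown, which are code) without starting the notebook server.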

To learn more about iPython Notebook, please see the iPython Notebook documentation.


reStructuredText is an easy-to-read plaintext markup syntax specifically designed for use in Python docstrings or to generate Python documentation. In fact, the reStructuredText parser is a component of Docutils, an open-source text processing system that is used by Sphinx to generate intelligent and beautiful software documentation, in particular the native Python documentation.

Python software has a long history of good documentation, particularly because of the idea that batteries should come included. And documentation is a very strong battery! PyPi, the Python Package Index, ensures that third party packages provide documentation, and that the documentation can be easily hosted online through Python Hosted. Because of the ease of use and ubiquity of the tools, Python programmers are known for having very consistently documented code; sometimes it's hard to tell the standard library from third party modules!

In How to Develop Quality Python Code, I mentioned that you should use Sphinx to generate documentation for your apps and libraries in a docs directory at the top-level. Generating reStructuredText documentation in a docs directory is fairly easy:

$ mkdir docs
$ cd docs
$ sphinx-quickstart

The quickstart utility will ask you many questions to configure your documentation. Aside from the project name, author, and version (which you have to type in yourself), the defaults are fine. However, I do like to change a few things:

> todo: write "todo" entries that can be shown or hidden on build (y/n) [n]: y
> coverage: checks for documentation coverage (y/n) [n]: y
> mathjax: include math, rendered in the browser by MathJax (y/n) [n]: y

Similar to iPython Notebook, reStructuredText can render mathematical formulas written in LaTeX syntax. The quickstart utility will also create a Makefile for you; to generate HTML documentation, simply run the following command in the docs directory:

$ make html

The output will be built in the folder _build/html where you can open the index.html in your browser.
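The three quickstart answers shown earlier correspond to Sphinx extensions that are listed in the generated docs/conf.py. As a rough sketch (the exact file that sphinx-quickstart writes varies by version, so treat this fragment as an assumption rather than its literal output), the relevant settings look something like this:

```python
# docs/conf.py (fragment) - a sketch, not the literal sphinx-quickstart output
extensions = [
    "sphinx.ext.todo",      # enables "todo" entries in the documentation
    "sphinx.ext.coverage",  # checks for documentation coverage
    "sphinx.ext.mathjax",   # math rendered in the browser by MathJax
]

# Show todo entries in the built output; set to False to hide them
todo_include_todos = True
```

If you skipped one of these questions during the quickstart, adding the matching entry to the extensions list by hand should have the same effect.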

While hosting documentation on Python Hosted is a good choice, a better choice might be Read the Docs, a website that allows you to create, host, and browse documentation. One great part of Read the Docs is the stylesheet that they use; it's more readable than older ones. Additionally, Read the Docs allows you to connect a Github repository so that whenever you push new code (and new documentation), it is automatically built and updated on the website. Read the Docs can even maintain different versions of documentation for different releases.

Note that even if you aren't interested in the overhead of learning reStructuredText, you should use your newly found Markdown skills to ensure that you have good documentation hosted on Read the Docs. See MkDocs for document generation in Markdown that Read the Docs will render.

To learn more about reStructuredText syntax, please see the reStructuredText Primer.


When writing longer publications, you'll need a more expressive tool that is just as lightweight as Markdown but able to handle constructs that go beyond simple HTML, for example cross-references, chapter compilation, or multi-document build chains. Longer publications should also move beyond the web and be renderable as an eBook (ePub or Mobi formats) or for print layout, e.g. PDF. These requirements add more overhead, but simplify workflows for larger media publication.

Writing for O'Reilly, I discovered that I really enjoyed working in AsciiDoc - a lightweight markup syntax, very similar to Markdown, which renders to HTML or DocBook. DocBook is very important, because it can be post-processed into other presentation formats such as HTML, PDF, EPUB, DVI, MOBI, and more, making AsciiDoc an effective tool not only for web publishing but also print and book publishing. Most text editors have an AsciiDoc grammar for syntax highlighting, in particular sublime-asciidoc and Atom AsciiDoc Preview, which make writing AsciiDoc as easy as Markdown.

AsciiDoctor is an AsciiDoc-specific toolchain for building books and websites from AsciiDoc. The project connects the various AsciiDoc tools and allows a simple command-line interface as well as preview tools. AsciiDoctor is primarily used for HTML and eBook formats, but at the time of this writing there is a PDF renderer, which is in beta. Another interesting project of O'Reilly's is Atlas, a system for push-button publishing that manages AsciiDoc using a Git repository and wraps editorial build processes, comments, and automatic editing in a web platform. I'd be remiss not to mention GitBook which provides a similar toolchain for publishing larger books, though with Markdown.

Editor's Note: GitBook does support AsciiDoc.

To learn more about AsciiDoc markup see AsciiDoc 101.


If you've done any graduate work in the STEM degrees then you are probably already familiar with LaTeX to write and publish articles, reports, conference and journal papers, and books. LaTeX is not a simple markup language, to say the least, but it is effective. It is able to handle almost any publishing scenario you can throw at it, including (and in particular) rendering complex mathematical formulas correctly from a text markup language. Most data scientists still use LaTeX, using MathJax or the Daum Equation Editor, if only for the math.

If you're going to be writing PDFs or reports, I can provide two primary tips for working with LaTeX. First consider cloud-based editing with Overleaf or ShareLaTeX, which allows you to collaborate and edit LaTeX documents similarly to Google Docs. Both of these systems have many of the classes and stylesheets already so that you don't have to worry too much about the formatting, and instead just get down to writing. Additionally, they aggregate other tools like LaTeX templates and provide templates of their own for most document types.

My personal favorite workflow, however, is to use the Atom editor with the LaTeX package and the LaTeX grammar. When using Atom, you get very nice Git and Github integration - perfect for collaboration on larger documents. If you have a TeX distribution installed (and you will need to do that on your local system, no matter what), then you can automatically build your documents within Atom and view them in PDF preview.

A complete tutorial for learning LaTeX can be found at Text Formatting with LaTeX.


Software developers agree that testing and documentation are vital to the successful creation and deployment of applications. However, although Agile workflows are designed to ensure that documentation and testing are included in the software development lifecycle, too often testing and documentation are left to last, or forgotten. When managing a development project, team leads need to ensure that documentation and testing are part of the "definition of done."

In the same way, writing is vital to the successful creation and deployment of data products, and is similarly left to last or forgotten. Through publication of our work and ideas, we open ourselves up to criticism, an effective methodology for testing ideas and discovering new ones. Similarly, by explicitly sharing our methods, we make it easier for others to build systems rapidly, and in return, write tutorials that help us better build our systems. And if we translate scientific papers into practical guides, we help to push science along as well.

Don't get bogged down in the details of writing, however. Use simple, lightweight markup languages to include documentation alongside your projects. Collaborate with other authors and your team using version control systems, and use free tools to make your work widely available. All of this is possible because of lightweight markup languages, and the more proficient you are at including writing in your workflow, the easier it will be to share your ideas.

Helpful Links

This post is particularly link-heavy with many references to tools and languages. For reference, here are my preferred guides for each of the Markup languages discussed:

Books to Read

Special thanks to Rebecca Bilbro for editing and contributing to this post. Without her, this would certainly have been much less readable!

As always, please follow @DistrictDataLab on Twitter and subscribe to this blog by clicking the Subscribe button on the blog home page.

Benjamin Bengfort



Using curl to download a shortened URL - Dropbox, bit.ly


I was in the middle of an introductory workshop for Data Science at General Assembly and I was talking about using command line instructions to facilitate the manipulation of files and folders. We covered some of the usual ones such as ls, mv, mkdir, cat, more, less, etc. I was then going to demonstrate how easy it was to download a file from the command line using curl, and I had prepared a small file, uploaded it to Dropbox and shortened its URL with bit.ly.

"So far so good" - I thought - and then proceeded with the demonstration... Only to find out that the command I was using was indeed downloading a file, but it was only downloading the wrapper HTML created by bit.ly for the redirection... I should have known better than that! Of course, all this was happening while various pairs of gazing eyes were upon me... I tried again using a different flag and... nothing! And again... nothing... Pressure mounting, I decided to cut the embarrassment short and apologised. I got them to download the file in the less glamorous way, by using the browser...

So, if you are ever in that predicament, here is the solution: use the -L flag with curl:

$ curl -L -o newname.ext http://your.shortened.url

The -L flag tells curl to follow the redirection of the shortened URL; make sure that you also use the -o flag to assign a name to the downloaded file.
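If you prefer to do the same thing from Python, the standard library's urllib follows HTTP redirects by default, much like curl does with -L. The helper below is a minimal sketch; the function name and its arguments are made up for this illustration.

```python
import shutil
import urllib.request


def download(url, filename):
    """Fetch url, following any HTTP redirects, and save the body to filename."""
    # urlopen follows 301/302 redirects by default, similar to curl's -L flag
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
        # geturl() reports the final URL reached after any redirection
        return response.geturl()
```

For example, download("http://your.shortened.url", "newname.ext") would save the redirected file and return the address it was finally fetched from, which is handy for checking that you did not just get the wrapper page.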

E voilà!


The physical book! Essential MATLAB and Octave

It has been a long wait, but finally today I got my hands on the physical version of my book. So pleased.

It is available from the publishers

Also in Amazon:
http://www.amazon.co.uk/gp/aw/d/1482234637/ref=redir_mdp_mobile/280-5584446-9196231



How to choose between learning Python or R first - Reblog

This post is a reblog of a post by Cheng Han Lee; the original can be seen at Udacity.

How to Choose Between Learning Python or R First
by Cheng Han Lee

January 12, 2015

If you’re interested in a career in data, and you’re familiar with the set of skills you’ll need to master, you know that Python and R are two of the most popular languages for data analysis. If you’re not exactly sure which to start learning first, you’re reading the right article.

When it comes to data analysis, both Python and R are simple (and free) to install and relatively easy to get started with. If you’re a newcomer to the world of data science and don’t have experience in either language, or with programming in general, it makes sense to be unsure whether to learn R or Python first.

Luckily, you can’t really go wrong with either.

The Case for R

R has a long and trusted history and a robust supporting community in the data industry. Together, those facts mean that you can rely on online support from others in the field if you need assistance or have questions about using the language. Plus, there are plenty of publicly released packages, more than 5,000 in fact, that you can download to use in tandem with R to extend its capabilities to new heights. That makes R great for conducting complex exploratory data analysis. R also integrates well with other computer languages like C++, Java, and C.

When you need to do heavy statistical analysis or graphing, R’s your go-to. Common mathematical operations like matrix multiplication work straight out of the box, and the language’s array-oriented syntax makes it easier to translate from math to code, especially for someone with no or minimal programming background.

The Case for Python

Python is a general-purpose programming language that can pretty much do anything you need it to: data munging, data engineering, data wrangling, website scraping, web app building, and more. It’s simpler to master than R if you have previously learned an object-oriented programming language like Java or C++.

In addition, because Python is an object-oriented programming language, it’s easier to write large-scale, maintainable, and robust code with it than with R. Using Python, the prototype code that you write on your own computer can be used as production code if needed.

Although Python doesn’t have as comprehensive a set of packages and libraries available to data professionals as R, the combination of Python with tools like Pandas, Numpy, Scipy, Scikit-Learn, and Seaborn will get you pretty darn close. The language is also slowly becoming more useful for tasks like machine learning, and basic to intermediate statistical work (formerly just R’s domain).

Choosing Between Python and R

Here are a few guidelines for determining whether to begin your data language studies with Python or with R.

Personal preference

Choose the language to begin with based on your personal preference, on which comes more naturally to you, which is easier to grasp from the get-go. To give you a sense of what to expect, mathematicians and statisticians tend to prefer R, whereas computer scientists and software engineers tend to favor Python. The best news is that once you learn to program well in one language, it’s pretty easy to pick up others.

Project selection

You can also make the Python vs. R call based on a project you know you’ll be working on in your data studies. If you’re working with data that’s been gathered and cleaned for you, and your main focus is the analysis of that data, go with R. If you have to work with dirty or jumbled data, or to scrape data from websites, files, or other data sources, you should start learning, or advancing your studies in, Python.


Once you have the basics of data analysis under your belt, another criterion for evaluating which language to further your skills in is what language your teammates are using. If you’re all literally speaking the same language, it’ll make collaboration—as well as learning from each other—much easier.

Job market

Jobs calling for skill in Python compared to R have increased similarly over the last few years.

R v Python



That said, as you can see, Python has started to overtake R in data jobs. Thanks to the expansion of the Python ecosystem, tools for nearly every aspect of computing are readily available in the language. In addition, since Python can be used to develop web applications, it enables companies to employ crossover between Python developers and data science teams. That’s a major boon given the shortage of data experts in the current marketplace.

The Bottom Line

In general, you can’t err whether you choose to learn Python first or R first for data analysis. Each language has its pros and cons for different scenarios and tasks. In addition, there are actually libraries to use Python with R, and vice versa—so learning one won’t preclude you from being able to learn and use the other. Perhaps the best solution is to use the above guidelines to decide which of the two languages to begin with, then fortify your skill set by learning the other one.

Is your brain warmed up enough yet? Get to it!


Failed Battery

I have had this 17-inch MacBook Pro for a few years… perhaps about 8 years? Probably a bit more? In any case, I have it more as a memento than anything else as I have a more modern one these days. I still keep it updated and all the rest of it, so I was rather surprised to get it out and see that the battery has effectively burst!!! I hope the rest of the machine still works though :(


Programming Languages Ranking 2014

Well, it seems that it is that time of the month when the TIOBE index releases the rankings of programming languages. Happy to see R improving its position, going from 15 to 12. Matlab is at 24 though...

The index is based on the number of skilled engineers worldwide, courses and third-party vendors that use each of the languages; popular search engines are used to calculate the ratings. Just remember that the TIOBE index is not about the best programming language or the language in which most lines of code have been written.

The definition of the TIOBE index can be found here. In any case here are the rankings:

Nov 2014   Nov 2013   Programming Language    Ratings    Change
1          1          C                       17.469%    -0.69%
2          2          Java                    14.391%    -2.13%
3          3          Objective-C              9.063%    -0.34%
4          4          C++                      6.098%    -2.27%
5          5          C#                       4.985%    -1.04%
6          6          PHP                      3.043%    -2.34%
7          8          Python                   2.589%    -0.52%
8          10         JavaScript               2.088%    +0.04%
9          12         Perl                     2.073%    +0.55%
10         11         Visual Basic .NET        2.061%    +0.09%
11         -          Visual Basic             1.657%    +1.66%
12         31         R                        1.548%    +1.14%
13         9          Transact-SQL             1.408%    -1.11%
14         13         Ruby                     1.211%    -0.09%
15         17         Delphi/Object Pascal     0.957%    +0.31%
16         23         F#                       0.892%    +0.39%
17         18         PL/SQL                   0.870%    +0.27%
18         -          Swift                    0.834%    +0.83%
19         14         Pascal                   0.831%    +0.12%
20         81         Dart                     0.816%    +0.73%

Apple Notes and Gmail Notes


I accidentally ended up creating some notes in the Gmail Notes inside my iDevice, only to be completely confounded by the fact I could not see them on my desktop. I tried to find some resolution by looking at the instructions for Apple Notes, but got frustrated with the lack of information.

So, here is how I solved my issue:

It seems that as an Apple Notes user, one can select to have the Notes saved "On my iPhone/iPad/Mac" or synced to any email account of one's choice. If you chose the first option, then no issues there, but the "fun" part comes with the latter. In that case the application will send notes from the device via Gmail to the Gmail servers, or for that matter to the email account you designated under IMAP. This means that your notes are therefore treated as normal email and labelled as "Notes". Not only that, they are automatically archived on arrival. The initial transfer is one-way only and this implies that the notes can't be restored from Gmail to the device. In order to find your Notes in Gmail you have to search for the "Notes" label!

If you call up a note on your device, the application accesses it from Gmail and displays it. But if you delete it, as many of us do, then the app gets confused as it no longer knows where the note is... Deleting a note from the device removes the label in Gmail, so the note can no longer be accessed by the device and it gets zombiefied in Gmail! It will still be present in All Mail, but without a label.

How to fix this... well, it depends. If the Notes have been deleted from the Gmail account using the web interface, they will still be there in the Trash for 30 days. You can "restore" them during that time and they will show up again in the Notes App on the device.

If the Notes folder was deleted using the Mail App on the device, the notes will (probably) still be there under "All Mail" but without a label. You can search for them and re-apply the label!

My advice would be not to use the syncing at all... it has caused more pain than it should. Let me know if this helps.


Enable NTFS read and write in your Mac

CES 2013 - OWC Mac mini external storage - miniStack Max (Photo credit: the JoshMeister)

I was confronted with an old issue that had not been an issue for a while: writing to an external hard drive that was formatted with Windows (NTFS) from my Mac. I used to have NTFS-3G (together with MacFUSE) installed and that used to be fine. However, I guess something went a bit awry with Mavericks as I was not able to get my old solution to work.

So, here is what I did (you will need superuser powers, so be prepared to type your password):

Open a Terminal (Terminal.app) and create a file called fstab in the /etc folder. For instance you can type:

$ sudo nano /etc/fstab

You can now enter some information in your newly created file telling MacOS information about your device. If your external drive is called mydevice enter the following:

LABEL=mydevice none ntfs rw,auto,nobrowse

Use tabs between the fields listed above. Save your file and you are now ready to plug your device.
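Since the tab separators are easy to lose when copying the line above, here is a small sketch that builds the entry for a given volume label. The function name is made up for this illustration, and it assumes the label contains no whitespace (labels with spaces need escaping in fstab, which this sketch does not attempt).

```python
def fstab_ntfs_entry(label):
    """Build the tab-separated /etc/fstab line for an NTFS volume label."""
    # Assumption: the label has no whitespace; escaping is not handled here
    if any(ch.isspace() for ch in label):
        raise ValueError("volume labels containing whitespace are not supported")
    fields = ["LABEL=" + label, "none", "ntfs", "rw,auto,nobrowse"]
    return "\t".join(fields)


# Prints the entry for the example drive used in this post
print(fstab_ntfs_entry("mydevice"))
```

You can paste the printed line into /etc/fstab while editing it with sudo, as described above.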

There is a small caveat: once you do this, your hard drive is not going to appear on your Desktop. But do not despair, you can still access the mounted drives through the /Volumes folder, for instance by creating a symlink to it on your Desktop from the terminal:

$ sudo ln -s /Volumes ~/Desktop/Volumes

et voilà!


WWDC programme

Yay, it looks like the programme for WWDC has been released.





Getting the latest version of Jekyll to work

I have been playing on and off with Jekyll and I find it a very interesting, useful and, once installed, easy tool for creating posts. However, the installation may or may not be that easy. Last time I installed it using Ruby directly and did not bother updating it for a while.

Recently I decided to update it and this time I decided to do that with the help of the fantastic Homebrew.

Everything seemed to work fine except that discount kept on complaining. So I started afresh and this time round the terminal complained saying:

-bash: jekyll: command not found

The problem was easy to solve once I remembered that brew places the code in the brew Cellar and thus Ruby could not find the gem directory. So I simply added the correct path and exported it:

export PATH=/usr/local/Cellar/ruby/2.1.1/bin:$PATH

et voilà!


Changing date/time in Ubuntu virtualbox

I was a bit puzzled by the fact I could not easily change the date/time in an instance of a VirtualBox machine as used by the High Performance Scientific Computing course run by Dr. Randall J. LeVeque via Coursera.

I tried using the simple date command but I kept on being told that

date: cannot set date: Operation not permitted

I tried updating the Ubuntu distro, but no luck. Eventually I found a solution using a symlink to localtime:

$ cd /etc
$ mv localtime localtime_original
$ ln -s /usr/share/zoneinfo/Europe/London ./localtime

You will have to use the correct zone for your location. Et voilà!


Navigating the terminal

"Leopard" Icons in Black

The work computer of one of my colleagues has recently closed the circle and he now has a shiny new Apple computer. He is very well-versed in a bunch of computer-related tasks; nonetheless he asked me the other day about shortcuts to navigate a shell terminal. I showed him a few tricks, and I thought of posting some here just in case they are helpful to my readers too:

  • To go to the beginning of the command line - Ctrl+A
  • To go to the end of the command line - Ctrl+E
  • To delete from the current position to the beginning of the line - Ctrl+U
  • To undelete (paste back what was last deleted) - Ctrl+Y
  • To delete from the current position to the end of the line - Ctrl+K
  • To delete the word before the current position - Ctrl+W

He also was wondering about an easy way to create a file and open it immediately. The way I do that is with a bash function placed in my .bashrc file

mytouch() {
   touch "$1"
   open "$1"
}


An alternative way to reduce the size of PDF in a mac

I am sure you, like me, have had the need to reduce the file size of a PDF. Take for example the occasional need of sending a PDF by email just to find out that the size is such that the message is rejected. I have used Adobe Acrobat Pro to help, but recently I came across an alternative way of achieving this: use ColorSync Utility on a Mac. Here is how:

  1. Right click the PDF that needs reducing and select “Open with…”
  2. Select ColorSync Utility and wait for the application to open the file
  3. In the status bar at the bottom of the application window, you can now select one of the Quartz filters available
  4. Press “Apply”
  5. and voilà
ColorSync (Photo credit: Wikipedia)

Getting Gephi 0.8.2 to work with a Mac

Facebook Network Visualized with Gephi (Photo credit: yaph)

Ever since the previous Java update for the Mac, my Gephi installation was not happy. I resorted to uninstalling version 0.8.2-beta and going back to 0.8.1. Not a bad version, but definitely not one with the latest updates. Well, at least it worked, and did not freeze or panic when trying to click on the menus. :D

I am very pleased to say that I have managed to get my installation of Gephi 0.8.2-beta working and here is how:

Edit the contents of the package located in


To do so you can right-click on the Gephi application and open "Show Package Contents". You can then navigate to the location mentioned above. I used Aquamacs to edit the file, but you can use your favourite plain-text editor.

Towards the beginning of the file add the following line:


Save the file and start Gephi as usual. This did the trick for me. I would like to credit the GitHub page for Gephi, where I ended up connecting the dots.


Aquaterm plotting issue with Octave and Gnuplot (Mac)


I recently updated my version of Octave using Homebrew and something went a bit awry... Nothing major except that instead of plotting to the Aquaterm terminal, Octave and Gnuplot were only happy with X11. Not the greatest of issues, but I really prefer the look of graphs in Aquaterm and here are some steps I followed to get things sorted:

First I uninstalled gnuplot from Homebrew using:

brew uninstall gnuplot

Just in case the problem was with AquaTerm, I re-downloaded it and installed it again. You can obtain AquaTerm here. I then reinstalled gnuplot just to realise that some symlinks were not created. You can check them by typing:

ls /usr/local/lib/libaquaterm*
ls /usr/local/include/aquaterm/*

If they do not exist, you can set them up by typing the following commands in your shell:

sudo ln -s /Library/Frameworks/AquaTerm.framework/Versions/A/AquaTerm /usr/local/lib/libaquaterm.dylib

sudo ln -s /Library/Frameworks/AquaTerm.framework/Versions/A/AquaTerm /usr/local/lib/libaquaterm.1.0.0.dylib

sudo ln -s /Library/Frameworks/AquaTerm.framework/Versions/A/Headers/* /usr/local/include/aquaterm/
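The check above can also be scripted. What follows is only a minimal Python sketch, assuming the link locations used in the commands in this post; the helper name missing_paths is made up for illustration and is not part of Homebrew, gnuplot or AquaTerm.

```python
import os

# Locations taken from the commands in this post; adjust them if your layout differs.
EXPECTED_LINKS = [
    "/usr/local/lib/libaquaterm.dylib",
    "/usr/local/lib/libaquaterm.1.0.0.dylib",
    "/usr/local/include/aquaterm",
]


def missing_paths(paths):
    """Return the paths that do not resolve to an existing file or directory."""
    # os.path.exists follows symlinks, so a dangling link is reported as missing.
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    for path in missing_paths(EXPECTED_LINKS):
        print("missing:", path)
```

If nothing is printed, the links are in place and there is no need to run the ln commands again.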

That did the trick for me. I hope you find this helpful.

Read me...

Essential MATLAB and Octave

As some of you probably know, I am currently writing a book about MATLAB and Octave aimed at newcomers to both programming and the MATLAB/Octave environments. The book is tentatively entitled "Essential MATLAB and Octave" and I am getting closer and closer to finishing the text. The next step is preparing exercises and finalising things. My publisher, CRC Press, has been great and I hope the book does well.

I'm aiming to finish things by May and in principle the book will be available from November or so. The whole process does take a while, but I am really looking forward to seeing the finished thing out there.

So, what triggered this post? Well, I have seen the appearance of a site with the book announced. I am not sure if these are usual practices but in any case it is a good thing, don't you think?




Read me...

Grace Hopper Doodle



Once again Google puts out a doodle worth mentioning. This time they celebrate the 107th anniversary of the birth of computer scientist Grace Hopper.
In case you do not know who Hopper is, well, let me just say that she is the amazing woman behind COBOL (Common Business-Oriented Language), which is still very much used today.

Grace Hopper was born in New York in 1906 and studied Mathematics and Physics (of course) at Vassar College, where she graduated in 1928. She then obtained a master's degree at Yale in 1930 and a PhD in 1934.

Hopper joined the US Navy Reserve during World War II and was assigned to the Bureau of Ordnance Computation Project at Harvard University, where she was only the third person to program the Harvard Mark I computer. She continued to work at Harvard until 1949, when she joined the Eckert-Mauchly Computer Corporation as a senior programmer.

She helped to develop the UNIVAC I, which was the second commercial computer produced in the US. In the 1950s Hopper created the first ever compiler, known as the A compiler; its first version was called A-0.

Hopper continued to serve in the Navy until 1986, by which time she was the oldest commissioned officer on active duty in the United States Navy.

She died in Arlington, Virginia in 1992 at the age of 85.

Grace Hopper behind my keyboard (Photo credit: Alexandre Dulaunoy)


Read me...

LondonR - Shiny

I had the chance to attend the latest LondonR meeting last week. It was a good, interesting gathering and I was pleasantly surprised to see that it was well attended by a variety of like-minded people.

The meeting had talks by

  • Andy South - Making beautiful world maps with country-referenced data using rworldmap and other R packages
  • Malcolm Sherrington - Algorithmic Trading with R
  • Chris Beeley - Shiny happy web interfaces - Shiny, HTML, CSS, JavaScript, and Shiny Server working together

I am also very pleased that I managed to be quick enough to answer the question that Chris Beeley asked on the day and win a digital copy of his book Web Application Development with R Using Shiny. The book is available from Packt Publishing. Thanks to Chris Beeley and Packt for the book.



Read me...

Furigana (ふりがな) in LaTeX

Furigana example (Photo credit: Wikipedia)

Some time ago I wrote a post about adding furigana using MS Word for Mac. It seems that the post has been quite useful to a few readers; nonetheless, some of you have contacted me about the remark I made about doing this in LaTeX.

So far I have helped people individually when they have asked, but as I promised in that post, I have finally got round to writing a post about adding furigana using LaTeX. Here is how:

You will need the following packages installed in your LaTeX distribution:

With these packages installed and working in your distribution, you can now use a document similar to the following:


\usepackage[10pt]{type1ec} % use only 10pt fonts
\usepackage[german, russian, vietnam, USenglish]{babel}
\usepackage[overlap, CJK]{ruby}
\noindent これは日本語の文章
\noindent Hello
furigana: \ruby{私}{わたし}

The outcome of the script above can be seen below:

Furigana Latex

Read me...

Disabled bundles for Mail in Mavericks

I have just updated a previous post with some of the UUIDs for using some plugins with Mail.app. The correct strings for Mail 7.0 in Mavericks are:


For instructions on what to do with the strings above, please refer to this post.

Mail icon (Photo credit: Wikipedia)
Read me...

Command Line: a few tips

Terminal icon in OS X

In various posts in the past, I have given some tips on using the Terminal, and some comments have arrived about how complicated they may seem. Nonetheless, I still think that the flexibility offered by the tools provided is what makes the UNIX/Mac environment so good. So in this post I would like to share some useful tips for using the Terminal. Let me know what you think!

1. Download a File from the Web & Watch Progress

If you know the URL of a file that you need to download from the web, you can use curl with the -O option to start downloading it:

$ curl -O url

Be sure to use the full URL. Also, remember to use the upper case ‘O’ and not the lowercase ‘o’ to keep the same file name on your local machine.

2. List Directory Contents by Modification Date

You can indeed take a look at the graphical interface, but if all you want is a quick list of the files in a directory showing permissions, users, file size, and modification date, with the most recently modified files and folders appearing from the bottom up then simply type the following:

$ ls -thor
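For those who would rather script this than remember the flags, here is a rough Python sketch of the same idea: entries sorted so that the most recently modified ones come last. The function name list_by_mtime is made up for illustration, and unlike plain ls this sketch also lists hidden files and does not show permissions or owners.

```python
import os
import time


def list_by_mtime(directory="."):
    """Return (name, size_in_bytes, mtime) tuples, oldest first and newest last."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            info = entry.stat()
            entries.append((entry.name, info.st_size, info.st_mtime))
    # Sorting on the modification time puts the latest changes at the bottom.
    return sorted(entries, key=lambda e: e[2])


if __name__ == "__main__":
    for name, size, mtime in list_by_mtime():
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        print(f"{size:>10}  {stamp}  {name}")
```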

3. Search Spotlight with Live Results from the Command Line

To do that you can use the mdfind command:

$ mdfind -live findme

This can go awfully quickly depending on the specificity of the search terms, but if you see a match, hit Control+C to stop looking.

4. Kill Processes Using Wildcards

Simply use the pkill command. Note that pkill treats its argument as a regular expression rather than a shell wildcard, so if you want to get rid of all the processes whose names start with "Sam" just type:

$ pkill '^Sam'
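A word of caution on patterns: pkill matches process names against a regular expression, not a shell wildcard, and the match is not anchored to the start of the name. The small Python sketch below illustrates the difference with the re module; the process names are invented for the example and nothing is actually killed.

```python
import re

# Hypothetical process names, for illustration only.
PROCESS_NAMES = ["Samba", "SamplerApp", "Safari", "Messages"]


def matching(pattern, names):
    """Return the names in which the pattern is found anywhere (unanchored search)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# As a regular expression, "Sam*" means "Sa" followed by zero or more "m".
print(matching("Sam*", PROCESS_NAMES))   # ['Samba', 'SamplerApp', 'Safari']

# Anchoring with ^ restricts the match to names that start with "Sam".
print(matching("^Sam", PROCESS_NAMES))   # ['Samba', 'SamplerApp']
```

In other words, a pattern such as Sam* can catch more than intended, whereas '^Sam' (quoted, so that the shell leaves it alone) matches only names beginning with "Sam".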

5. Re-Run the Last Command as Root

The bang (!) is your friend. To re-run the last command typed, but as root, type the following:

$ sudo !!

6. Get the Last Occurrence of a Command Without Executing It

Once again, the bang is your friend. Use the following command, where "searchterm" must be substituted by the command you are looking for:

$ !searchterm:p

For example, to find the last full command that used the prefix “sudo” you would use:

$ !sudo:p

7. Instantly Create a Blank File or Multiple Files

All you have to do is "touch" the file...

$ touch filename

You can list out multiple names to create multiple files too.

Do you have any favourite commands or tips to use the command line? Let me know.

Read me...

Cool new features in iOS 7

I have been playing with iOS 7 and there are a lot of new things in it. For example:

Mail has new Smart Mailboxes that can be helpful in filtering email: you can access them in the Mail app, just tap on the Edit button when you’re in the mailbox view.

The Compass app has added a level feature. Just swipe to the left when you launch the Compass app to find it.

The Camera app, as before, lets you hold it in landscape mode and use the Volume Up button to take the picture. Now, you can also hold the Volume Up button to have your iPhone autofocus the scene, without having to tap on the screen. You can also apply filters directly in the camera!

Great stuff!

Read me...

You can now manage dictionaries in iOS 7, great!

I have been playing with iOS 7 for a while now and I just came across a very good addition to the experience: you can now actively manage the dictionaries to be downloaded and used in various applications. A very useful feature, at least for me.



Read me...

Fixing disabled bundles/plugins for Mac Mail

Mail icon (Photo credit: Wikipedia)

I confess (not that you might not know, but still...) that I am a Mac user. I have been for some time now and I quite like it. As such I have got used to using Mail.app for checking my email, although I am well aware that it is by no means the best email client ever. Nonetheless, I do use it, and I do so in conjunction with some useful plugins such as TruePreview and Universal Mailer that help improve the experience. TruePreview lets me scan an email without marking it as read, whereas Universal Mailer removes some annoying ATT00001.htm files that recipients get and correctly formats messages with attachments and inline images. They both work quite well... until Apple decides to upgrade Mail.app and disables the plugins.

One thing that you can do when this happens is to shake your fist in the air, curse a bit and wait until the plugins get updated. Or... you can try the steps below and hope that the plugin starts working once you add the correct Universally Unique Identifier, or UUID, to the plist file. Please note that the UUID values quoted are valid for Mail.app 6.6 (OS X 10.8.5):

  1. Quit Mail
  2. Open Terminal
  3. Type: cd ~/Library/Mail
  4. If you request a list of the files (using ls) you should see a folder like "Bundles (Disabled)"
  5. Type: cd "Bundles (Disabled)" - The quotes are needed because of the space and the brackets in the name. Note that the folder may be numbered, so check the contents...
  6. If you are trying to fix TruePreview then type: cd TruePreview.mailbundle/Contents
  7. If you are trying to fix Universal Mailer then type: cd UniversalMailer.mailbundle/Contents
  8. This may also work for other plugins... so check the correct folder.
  9. Once you are in the correct folder for the plugin, type: open -e Info.plist
  10. The command will launch a text editor and you will be able to edit it. Go towards the end of the file and add the following lines:


With the recent release of Mavericks, the Mail.app application is currently version 7.0. The correct strings to use are as follows:


Please note that although the above strings make Mail accept plugins such as TruePreview, the functionality does not work. 
In the case of Universal Mailer, there is an update for Mavericks here.
Once that is done, save the file and close it. Back in the Terminal:
  1. Type: cd ../..
  2. Now we need to move the "Disabled" plugin to the working folder. For TruePreview type: mv TruePreview.mailbundle ../Bundles
  3. A similar command can be used for other plugins
  4. Finally, launch Mail
  5. And Voilà

So, what happens next time that the Apple gods decide to update Mail? Well, you can obtain the correct UUIDs as follows:

defaults read /Applications/Mail.app/Contents/Info PluginCompatibilityUUID

defaults read /System/Library/Frameworks/Message.framework/Resources/Info PluginCompatibilityUUID

Each command returns a string; together these two strings are the new UUIDs for the version of Mail.app you are running. You then need to add those strings to the Info.plist file as explained above.

This may come in handy when Mavericks is finally out... Then again, things may have changed a lot behind the scenes...
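If you would rather not edit the plist by hand, the step can be sketched in Python with the standard plistlib module. This is only a sketch under stated assumptions: the key name SupportedPluginCompatibilityUUIDs is my assumption about where a plugin lists the UUIDs it accepts, so check it against your plugin's Info.plist, and the function names add_uuids and patch_plist are made up for illustration. Make a backup of the file first.

```python
import plistlib

# Assumption: the plugin lists the UUIDs it accepts under this key.
# Check your plugin's Info.plist before relying on it.
ASSUMED_KEY = "SupportedPluginCompatibilityUUIDs"


def add_uuids(info, new_uuids, key=ASSUMED_KEY):
    """Return a copy of the plist dictionary with new_uuids appended under key."""
    updated = dict(info)
    existing = list(updated.get(key, []))
    for uuid in new_uuids:
        if uuid not in existing:  # avoid duplicate entries
            existing.append(uuid)
    updated[key] = existing
    return updated


def patch_plist(path, new_uuids, key=ASSUMED_KEY):
    """Read the Info.plist at path, add the UUIDs and write the file back."""
    with open(path, "rb") as fh:
        info = plistlib.load(fh)
    with open(path, "wb") as fh:
        plistlib.dump(add_uuids(info, new_uuids, key), fh)
```

You would call patch_plist with the path to the plugin's Info.plist and the two strings returned by the defaults commands.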

Let me know if you have found this tip helpful!

Read me...

All Watched Over by Machines of Loving Grace

This is the title of the 1967 poem by Richard Brautigan and of course that is the name of the great three-part documentary by Adam Curtis. If you haven't watched it, please do yourself a favour and take a look.

All Watched Over by Machines of Loving Grace

by Richard Brautigan

I like to think (and
the sooner the better!)
of a cybernetic meadow
where mammals and computers
live together in mutually
programming harmony
like pure water
touching clear sky.

I like to think
(right now, please!)
of a cybernetic forest
filled with pines and electronics
where deer stroll peacefully
past computers
as if they were flowers
with spinning blossoms.

I like to think
(it has to be!)
of a cybernetic ecology
where we are free of our labors
and joined back to nature,
returned to our mammal brothers and sisters,
and all watched over
by machines of loving grace.

Read me...

Relating Airy and Bessel functions

Reblogged from The Endeavour by J. D. Cook.

The Airy functions Ai(x) and Bi(x) are independent solutions to the differential equation

y'' - xy = 0

For negative x they act something like sin(x) and cos(x). For positive x they act something like exp(x) and exp(-x). This isn’t surprising if you look at the differential equation. If you replace x with a negative constant, you get sines and cosines, and if you replace it with a positive constant, you get positive and negative exponentials.

The Airy functions can be related to Bessel functions as follows:

\mathrm{Ai}(x) = \left\{ \begin{array}{ll} \frac{1}{3}\sqrt{\phantom{-}x} \left(I_{-1/3}(\hat{x}) - I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\ \\ \frac{1}{3}\sqrt{-x} \left(J_{-1/3}(\hat{x}) + J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 \end{array} \right.

\mathrm{Bi}(x) = \left\{ \begin{array}{ll} \sqrt{\phantom{-}x/3} \left(I_{-1/3}(\hat{x}) + I_{1/3}(\hat{x})\right) & \mbox{if } x > 0 \\ \\ \sqrt{-x/3} \left(J_{-1/3}(\hat{x}) - J_{1/3}(\hat{x})\right) & \mbox{if } x < 0 \end{array} \right.

Here J is a “Bessel function of the first kind” and I is a “modified Bessel function of the first kind.” Also

\hat{x} = \frac{2}{3} \left(\sqrt{|x|}\right)^3

To verify the equations above, and to show how to compute these functions in Python, here’s some code.

The SciPy function airy computes both functions, and their first derivatives, at once. I assume that’s because it doesn’t take much longer to compute all four functions than to compute one. The code for Ai2 and Bi2 below uses np.where instead of if... else so that it can operate on NumPy vectors all at once. You can plot Ai and Ai2 and see that the two curves lie on top of each other. The same holds for Bi and Bi2.

from scipy.special import airy, jv, iv
from numpy import sqrt, where

def Ai(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return ai

def Bi(x):
    (ai, ai_prime, bi, bi_prime) = airy(x)
    return bi

def Ai2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        third*sqrt( x)*(iv(-third, hatx) - iv(third, hatx)),
        third*sqrt(-x)*(jv(-third, hatx) + jv(third, hatx)))

def Bi2(x):
    third = 1.0/3.0
    hatx = 2*third*(abs(x))**1.5
    return where(x > 0,
        sqrt( x/3.0)*(iv(-third, hatx) + iv(third, hatx)),
        sqrt(-x/3.0)*(jv(-third, hatx) - jv(third, hatx)))

There is a problem with Ai2 and Bi2: they return nan at 0. A more careful implementation would avoid this problem, but that’s not necessary since these functions are only for illustration. In practice, you’d simply use airy and it does the right thing at 0.
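As a SciPy-free sanity check of the oscillatory and exponential behaviour described at the start of the post, the differential equation y'' - xy = 0 can also be integrated directly. The sketch below is not part of the original code: it uses a plain fourth-order Runge-Kutta scheme, starts from the closed-form values of Ai(0) and Ai'(0), and the names integrate_airy and sign_changes are made up for illustration.

```python
import math

# Closed-form initial values: Ai(0) = 1/(3^(2/3) Gamma(2/3)), Ai'(0) = -1/(3^(1/3) Gamma(1/3)).
AI_0 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))
AI_PRIME_0 = -1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))


def integrate_airy(x_end, steps=10000):
    """Integrate y'' = x*y from x = 0 to x_end with classical RK4; return [(x, y), ...]."""
    h = x_end / steps  # negative when integrating towards negative x
    x, y, v = 0.0, AI_0, AI_PRIME_0
    path = [(x, y)]
    for _ in range(steps):
        # First-order system: y' = v, v' = x*y.
        k1y, k1v = v, x * y
        k2y, k2v = v + 0.5 * h * k1v, (x + 0.5 * h) * (y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, (x + 0.5 * h) * (y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, (x + h) * (y + h * k3y)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        x += h
        path.append((x, y))
    return path


def sign_changes(path):
    """Return the midpoints of the steps on which y changes sign."""
    return [0.5 * (x0 + x1)
            for (x0, y0), (x1, y1) in zip(path, path[1:])
            if y0 * y1 < 0]


if __name__ == "__main__":
    print("sign changes on [-10, 0]:", sign_changes(integrate_airy(-10.0)))
    print("sign changes on [0, 1]:", sign_changes(integrate_airy(1.0)))
```

The solution keeps changing sign on the negative axis, as a sine or cosine would, and has no sign changes on the short positive interval, where it simply decays. Integrating much further to the right with this naive approach is not advisable, since any numerical error picks up the growing Bi-like solution.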


Read me...

R ranked as #1 language for data mining and analytics

R ranked as #1 language for data mining and analytics (source KDnuggets)

r analytics

Read me...

Sayings 2.1


Reblogged from “Sayings 2.1” by Doghouse Diaries

Previously: “Sayings 2.0

Sayings 2_1

Read me...

Syncing Gmail contacts with Mac (OS)

Aha... now that I have posted about Gmail contacts and iOS, it seems that some of us are having questions about syncing with Mac OS... Well, here is a brief post about this.

OPTION 1 - Synchronize with Google 

Open the Contacts App and go to "Preferences". Go to "On My Mac" and you will see the following window:




Select "Synchronize with Google"; you will need to accept the terms and conditions and then authenticate your account. Note that this method will bring all Gmail contacts under All Contacts (which includes everybody you have emailed) and they will be pushed to your Mac OS X address book. If you want to keep your Gmail and iCloud (for instance) or any other account separate from each other, then the next option is for you.

OPTION 2 - Separate Accounts

If you want to keep your Gmail contacts separate, then go to the "Preferences" in the Contacts app and follow these steps:

  1. Open the Accounts category.
  2. Click + and select CardDAV
  3. Type your full Gmail address for the User name and authenticate your account
  4. Under server enter "google.com" and click "Create"

If you do not want to sync your Mail, Notes or Calendars, make sure they are not selected. After you do this you will have your Gmail contacts under a new account in the groups list.



Read me...

Syncing Gmail contacts to iPhone or iPad

Further to my previous post about Gmail, I noticed earlier in the week that I was missing a number of contacts in my phone. "Surely I did not delete them, did I?" I thought... I checked and checked again, and indeed I was missing contacts stored in my Gmail account. How could that be? I had set up my account so that my contacts were synced... Oh well, it turns out that back in September 2012 Gmail announced the possibility of syncing contacts using CardDAV. However, in December, in their wisdom, they decided to discontinue Google Sync from January 30th 2013. I had no idea about this, but here is a post from Macworld about it.

So, my contacts had not been deleted; Gmail was simply not working as I imagined... Here is how to sync your contacts using CardDAV.

  • Open the "Settings" application on your iPhone or iPad.
  • Select Mail, Contacts, Calendars.
  • Select "Add Account" and then "Other"
  • Select "Add CardDAV Account" and fill in the information. In the Server enter "google.com"
  • Select "Next" at the top of your screen, and make sure that the "Contacts" option is turned on.

Once you have done that open your Contacts app and the syncing will start.

A similar problem may occur for calendars. If you are looking for information about that, follow this link.



Read me...

Fixing Gmail incorrect password/user error (iPhone and iPad)


I recently got a very annoying error message in an iPhone regarding the password/user resolution for my Gmail account. I again and again kept entering the correct combination and nothing seemed to work. I thought I was mistyping, or perhaps my fingertips suddenly got fatter... I tried logging in via the website and lo and behold I could indeed login.

The message reads 'The user name or password for "imap.gmail.com" is incorrect' and it appears even if you haven't made any changes to your account. Well, here is a way to solve the issue:

  1. Quit all mail clients that are having the issue. For the iPhone or iPad you can open the multi-tasking menu and tap the mail icon for a few seconds. A red minus sign will appear at the upper left-hand corner. Tap the minus sign and that quits the application.
  2. Open a web browser (Safari will do) and go to the following link: http://www.google.com/accounts/DisplayUnlockCaptcha
  3. Login using your credentials for the troublesome account to verify the account.
  4. Open again your Mail app and voilà


Read me...

RFU Podcast feed - one that can be used!

Association crest (Photo credit: Wikipedia)

I have been waiting quite a bit for the RFU to make their podcasts available via iTunes or some other similar service. I used to listen to them but for one reason or another the feed changed to the extent that no submission was done to iTunes and the RSS of the RFU's website is basically dead.

So, inspired by a post by Rolando Garza, I decided to hack together an RSS feed that can actually be used to download the RFU podcast and get some information about Rugby. I used Feedity, which uses HTML scraping to generate an RSS feed of almost any page. With the help of Yahoo Pipes I then used the magic of regular expressions to add appropriate dates and enclosures to the feed, and the result is the RFU Podcast Feed.

So, as long as the RFU does not change the way they deal with their website and the posting of their mp3 content, then you can enjoy a bit of Rugby right in your mp3 devices.

Read me...

Remove Duplicates from "Open with" in OS X

I quite like some of the "user-friendly" features that come with OS X. But sometimes they get a bit clogged up. A case in point is the “Open With” menu that appears when any file in the Finder is right-clicked. The idea of the menu is to give the user the chance to choose an alternative programme or application to open a file with, instead of the default one.

The problem is that sometimes Finder creates multiple copies of the same entry in the menu and, although harmless, it is very annoying. So, how can you fix this problem? Well, do not despair, the command line is your friend:

  • Launch Terminal from the /Applications/Utilities/ directory and type the following command string on a single line:
/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/Support/lsregister -kill -r -domain local -domain user
  • Once the previous command has finished (and it may take a while), relaunch Finder so that the change takes effect by typing the following in the command line:
killall Finder

Once Finder has relaunched, all the repeated entries should be gone.

Read me...

Fixed Problem with Sparrow Not Quitting

Sparrow (e-mail client) (Photo credit: Wikipedia)

When I first heard of Sparrow, a new email client (in beta back in October 2010) that used the IMAP protocol and promised to be a good, minimalist e-mail interface, I was indeed interested in finding out more about it. At the beginning you could only use it to read Gmail, but eventually it started supporting other providers, including even Exchange.

I was not particularly horrified by its acquisition by Google, but I was a bit sad to see that the move meant that they would not update or change the application. There was even that petition asking Google to "keep Sparrow alive". Anyway, after the takeover took place, I found that suddenly some bits and pieces were not working as they had been. The annoying ones were that some accounts would take forever to download mail and, in particular, that when quitting the application it would just get zombified, i.e. Sparrow would not quit and I would have to Force Quit the application.

I have now managed to sort that problem. I am not sure if this was an issue with my Exchange server or indeed a bug in Sparrow, but now the application is happier than it has been for a long time (and so am I!). The issue was with a particular Exchange account, more specifically with the SMTP server. For some reason, Sparrow did not like the settings that seem to work fine in Mail.app or Thunderbird. So, here is what I did:

I deleted the account and set it up afresh. When asked for the SMTP server details, I made sure that the "Secure Connection" option was left unticked and that the Port was the one specified by my provider (in this case Port 25 - see figure below). That has made the application work and exit fine. If you are having a similar problem, I hope this helps.


Read me...

Programming Language Index - version 2013

A couple of years ago I had a look at the state of the TIOBE index that ranks the most popular programming languages.

So has C# finally dethroned C++ as THE language of the year? Or have LOLCODE and Brainfuck made it into the list? Well, not quite, but an interesting thing is the uptake of Objective-C, which takes third place! Of course, an explanation can be found in the explosion of iOS apps that are developed with that language.

The usual suspects, i.e. C and Java are still at the top, followed by Objective-C and C++. It is interesting to note that they all share a very similar structure.

Position Jan 2013    Position Jan 2012    Programming Language
1                    2                    C
2                    1                    Java
3                    5                    Objective-C
4                    4                    C++
5                    3                    C#
6                    6                    PHP
7                    7                    (Visual) Basic
8                    8                    Python
9                    9                    Perl
10                   10                   JavaScript
11                   12                   Ruby
12                   24                   Visual Basic .NET
13                   13                   Lisp
14                   14                   Pascal
15                   11                   Delphi/Object Pascal
16                   17                   Ada
17                   23                   MATLAB
18                   20                   Lua
19                   21                   Assembly
20                   72                   Bash

The other languages in the top ten are pretty good candidates, and it should not be too much of a surprise to see PHP, VB and Python there. Nice to see that languages like Pascal and Ada are still there in the top 20. But Bash? Really? How can we explain the move from 72nd to 20th?

And after that? Well, Fortran appears in 25th place... (I know!), COBOL and SQL are there, and for those that have taken the R programming language to their hearts, it makes an appearance in 26th place. An interesting addition is the appearance of the educational language Alice in 50th place.

Position Programming Language Ratings
21 PL/SQL 0.585%
22 Transact-SQL 0.578%
23 SAS 0.571%
24 COBOL 0.496%
25 Fortran 0.462%
26 R 0.444%
27 Scheme 0.433%
28 ABAP 0.430%
29 Logo 0.389%
30 Prolog 0.359%
31 Erlang 0.334%
32 Haskell 0.331%
33 Scala 0.319%
34 Q 0.318%
35 D 0.296%
36 RPG (OS/400) 0.291%
37 Smalltalk 0.254%
38 Forth 0.239%
39 APL 0.235%
40 NXT-G 0.233%
41 ML 0.227%
42 Common Lisp 0.206%
43 ActionScript 0.195%
44 Awk 0.192%
45 F# 0.187%
46 Scratch 0.187%
47 PL/I 0.167%
48 LabVIEW 0.165%
49 Tcl 0.159%
50 Alice 0.158%
Read me...

Peltier Effect - Sci-advent - Day 13


The Peltier effect is named after Jean Charles Athanase Peltier, who discovered it by accident while investigating electricity. In the eventful experiment, Peltier joined a copper wire and a bismuth wire together at both ends and then connected them to a battery. When he switched the battery on, one of the junctions of the two wires got hot, while the other junction got cold.

The Peltier effect is the heat exchange that results when electricity is passed across a junction of two conductors. It is a close relative of the Seebeck effect (effectively the same phenomenon in reverse, as used in thermocouples to measure temperature) and the Thomson effect (heating or cooling of a current-carrying conductor with a temperature gradient). Sparing ourselves the maths, conduction electrons have different energies in different materials, and so when they are forced to move from one conductor to another, they either gain or lose energy. This difference is either released as heat, or absorbed from the surroundings.

When two conductors are arranged in a circuit, they form a heat pump, able to move heat from one junction to the other. Unfortunately, though, it’s not always this simple, as the Peltier effect is always up against the Joule effect – the ‘frictional’ heating that results from electrons bouncing off the atoms. In most systems, this swamps the Peltier effect, and means that all that you get is a bit more heating at one junction, and a bit less heating at the other. Nonetheless, the Peltier effect has a lot of technological potential. It is very reliable, and since it has no moving parts, it rarely needs maintenance while being mobile.

Read me...

Ada Lovelace – Sci-advent – Day 9

Ada Lovelace

Ada Lovelace. Painting by Margaret Sarah Carpenter (1793–1872)

Ada Augusta Byron, Countess of Lovelace, was the daughter of the poet George Gordon, Lord Byron. She studied mathematics at the University of London with Charles Babbage, whose analytical engines were the precursors of the modern computer. Today, the 10th of December, would have been her 197th birthday. That is why Google created a doodle for her (see image below).

Ada Lovelace is today known as a mathematician and computer pioneer; she created the concept of an operating system. Supplementing her translation of an Italian article on Babbage's analytical engine with an encoded algorithm she published the first computer program, albeit for a machine that would not be built until more than 150 years later as a historical project.

The Ada computer language was named after her.

Lovelace Doodle

Read me...

The Babbage Difference Engine - Sci-Advent - Day 3



In 1849, British inventor Charles Babbage completed designs for a difference engine, a very early mechanical computer. Due to cost and complexity the machine was never built in his lifetime and for 150 years nobody knew if the machine would have worked. In 2002, a Babbage Difference Engine based on the original plans was completed—and it actually works. The hand-cranked device has 8,000 parts, weighs 5 tons, and is 11 feet long. Two such machines now exist, one at the Science Museum in London and another at the Computer History Museum in Mountain View, California. To get a sense of the incredible intricacy of the Babbage Difference Engine, take a look at these interactive high resolution images of the Computer History Museum machine. The images, created by xRez Studio, are each composites of up to 1,350 individual photos. The studio also shot this short video of the machine in operation.


Read me...

The Harwell Dekatron is alive... alive!

If you happen to have a chance to visit Bletchley Park, do not miss the opportunity to visit the National Museum of Computing, where you will be able to see a large collection of computers of all sizes and ages. A recent addition is the Harwell Dekatron / WITCH, which came back to life on November 20th, 2012.

The Harwell Dekatron, or WITCH, is the world's oldest original working digital computer, dating from 1951. WITCH is an acronym that stands for Wolverhampton Instrument for Teaching Computing from Harwell. The computer acquired this name when, in 1957, it was offered in a competition to an educational establishment. The competition was won by the Wolverhampton and Staffordshire College of Technology.

The machine uses "dekatrons" for its volatile memory (think of it as RAM) and it works on a decimal system, as opposed to binary. The dekatrons are visible and thus one can literally see the state of the memory when the machine is operating. This sounds great when trying to explain how a computer works!

More information can be obtained here

Read me...

Microsoft Office 2010 - issue with opening files as "Read-Only"

Office 2010 logo

This happened to me the other day when trying to open an older(-ish) Excel file created with Office 2003 in the new 2010 version of the software: I double-clicked on the file and a message appeared telling me that the file would be opened in read-only mode and that whenever it became free I would be able to edit it. The strange thing is that no one else had the file open. If you have the same issue with your files, read on.

There seems to be a new feature in the 2010 edition of MS Office called Protected View, created to "enhance protection against mail attachments, files originated from the internet and located in unsafe locations". This sounds great, but the problem is that Protected View drops support for legacy document formats and causes these documents to be opened in read-only mode. A solution posted by Microsoft is:

  1. Run the Office 2010 application with the problem. Notice that this procedure has to be done individually with each of the applications in MS Office suite (great!).
  2. Click on the Office button on the upper left-hand corner and select "Options"
  3. In the "Options" dialogue box, select "Trust Center" (on the left)
  4. Click on "Trust Center Settings" (on the right)
  5. Select "Protected View"
  6. Disable any or all of the Protected View options by unticking the check boxes.
  7. Click OK when done.

Another alternative is to re-save your legacy document. In order to do that do the following:

  1. Open the problematic legacy document
  2. Click File and select Save As
  3. In the dialogue box, on the lower left-hand corner there is a drop-down menu called "Tools", select "General Options"
  4. Make sure that the "Read-Only recommended" check box is unticked.
  5. Save the file and hope for the best...

I hope this is useful to you.

Read me...

iBooks Author supports LaTeX now

When Apple launched iBooks Author back in January 2012 I was quite curious to see the things that you were able to do with it. It all looked very nice and relatively easy to use. You can create documents using some templates provided and you then are able to export them as PDF or even publish them as iBooks.

Unfortunately, at the time, Apple failed to provide any easy way to include equations or mathematical symbols. That alone put me off using the application altogether (see post). However, in the recent update (released on October 23rd) Apple has finally included an equation editor that uses LaTeX or MathML. I have just tried it and it seems to do a good job. It is definitely not as powerful as the actual LaTeX engine (it does not let you number the equations automatically, for instance), but it is an improvement.

Here are some screenshots of the little first trial I did. As you can see the update clearly states that the new editor accepts native LaTeX or MathML:



Now, to insert a new equation:



This opens up the equation editor:



In the new window you can start typing your LaTeX commands. Notice that you don't need to start an equation environment as you would do in LaTeX, you simply type the commands that will create the maths:
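As a purely illustrative sketch (the formula is my own pick, not necessarily the one in the screenshots, and I am assuming the editor understands these standard commands), something like the quadratic formula would be typed as a single line, with no \begin{equation} around it:

```latex
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
```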



Once you have done that, simply tell iBooks Author to insert the equation, and voilà:



Have you used iBooks Author? What do you think of it? What is your opinion about the support for LaTeX?

I think I may give it a go, but will probably continue using LaTeX itself. If you want to learn more about using it, have a look at these past posts:

Read me...

MATLAB for mac in Mountain Lion without X11

Well, I have now made the move to Mountain Lion and for a bit it did look quite good, until I had the nerve to try to start MATLAB. Now, I must admit that the version of MATLAB that I have is by no means the latest, but it does the job (for those of you who asked, I am running 2008a). So, I realised that X11 had been dealt its final blow and that Mountain Lion did indeed get rid of it.

I had encountered this issue when upgrading GIMP, and at the time everything seemed to be working fine with XQuartz. So there I was, thinking to myself "It is just a matter of re-installing XQuartz and off we go". How wrong I was! I installed XQuartz, downloaded from here. The first hint that things were not quite right was when I had to tell GIMP the location of X11 manually. Then I tried to launch MATLAB and quite quickly the following message popped up:

"X11 does not appear to be installed. X11 version 1.1.3 or greater is required. For OS X 10.5 or later, X11 is available on the OS X installation DVD. Please find and run the Optional Installs.mpkg installer."

It is great that Mathworks has told me that, but Apple does not do X11 anymore, so no installing from the DVD, right?! Worse still, unlike GIMP, MATLAB gave me no prompt to tell it the location of X11. I tried creating some symbolic links, but this did not work either. Finally, after a lot of fiddling and searching, I found a way to run MATLAB successfully. The solution? Well, here it is:

  1. Install XQuartz
  2. Launch XQuartz and from the menu launch an xterm.
  3. In the xterm, type the following command:
  • $MATLAB/bin/matlab -maci

Where $MATLAB is the path to your installation. And voilà!

Incidentally, if you are having problems with the graphics in MATLAB, such as the application crashing when plotting and the like, you can type the following command before launching MATLAB as specified above:

export DYLD_LIBRARY_PATH=/System/Library/Frameworks/JavaVM.framework/Libraries

Let me know how you get on with this and should you find another alternative solution let me know!



Read me...

Python for iOS

There used to be a time when computers came with tools for someone to start programming, something like a version of BASIC would get you started. That has changed to the point that some users cannot even imagine how to interact with their machines without a nice, eye-candy, even cumbersome graphical interface.
iPhones and iPads are indeed powerful devices, but Apple, in their wisdom, would not let you program them easily. Fortunately, people do not give up that easily, and recently I came across Python for iOS, which is available in the AppStore. The application provides a simple Python interpreter that is easy to use on these devices. Keep in mind that the application does not create native apps, but it might be very handy in conjunction with a more advanced development tool. It also has the added bonus of letting newcomers start programming, in a popular language, on devices that are largely seen as purely consumer ones.

Will you give it a go? Let me know what you think.


Read me...

A hidden shortcut to switch to previous Desktop Space in Mac OS X

A picture of the Magic Trackpad (Photo credit: Wikipedia)

Imagine this: you are using Desktop 1 to write a long document, working from information that is displayed in Desktop 4. You could move the relevant window from Desktop 4 to Desktop 1, but that does not always help, so you end up moving back and forth between the two. Did you know that you can do this using a double-tap with four fingers on your trackpad? No? Well, this is because it is a hidden gesture. To activate it, all you have to do is open a Terminal (Finder - Applications - Utilities - Terminal) and type the following two commands (please note that the first line is a single command):

defaults write com.apple.dock double-tap-jump-back -bool TRUE
killall Dock

The changes take effect immediately after the second command is issued. Enjoy!

Read me...

Japanese writing in iOS and Mac OS X

I have written Japanese (not as well as I would like, though) for some time, and doing so on the computer is always a pleasure. I almost always use the same dictionary, the same writing method, the same shortcuts, etc. Here are some of the things I use.

In iOS:
- I have always used the "qwerty" keyboard for both Japanese and the rest (UK English, Spanish), and I have occasionally used the 10-key swiping method with the Kana keyboard (see picture below). In order to use the kana keyboard you really need to know your syllabary (hiragana) as the keys are arranged by sound, and all you have to do is select the correct consonant sound and swipe in the direction of the vowel sound:

  • A – in the middle,
  • I – swipe to the left,
  • U – swipe upwards,
  • E – swipe to the right,
  • O – swipe downwards.
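To make the logic of the swipes a bit more concrete, here is a minimal Python sketch of the idea. It is only an illustration of the layout described above, not how iOS actually implements it: the names GESTURES, KANA_ROWS, flick and type_word are my own, and only the regular five-kana rows are included.

```python
# Order of the vowels within every kana row: a, i, u, e, o.
# A plain tap gives "a"; each swipe direction gives one of the other vowels.
GESTURES = {"tap": 0, "left": 1, "up": 2, "right": 3, "down": 4}

# Hiragana rows keyed by consonant; "" stands for the plain vowel key.
# Only the regular five-kana rows are listed in this sketch.
KANA_ROWS = {
    "": "あいうえお",
    "k": "かきくけこ",
    "s": "さしすせそ",
    "t": "たちつてと",
    "n": "なにぬねの",
    "h": "はひふへほ",
    "m": "まみむめも",
    "r": "らりるれろ",
}


def flick(consonant, gesture):
    """Return the hiragana for a consonant key and a tap or swipe direction."""
    return KANA_ROWS[consonant][GESTURES[gesture]]


def type_word(keystrokes):
    """Join a sequence of (consonant, gesture) pairs into a word."""
    return "".join(flick(consonant, gesture) for consonant, gesture in keystrokes)


# "su" is the s key swiped upwards, "shi" is the s key swiped to the left.
print(type_word([("s", "up"), ("s", "left")]))  # すし
```

So two gestures on the same key are enough to write すし (sushi).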



It may take a while to get used to it, but I just love the simplicity of it all and it even has a key for emoticons!

The Japanese auto-completion seems much more advanced on mobiles than on computers, and I think it is better than the one in editors for English, Spanish or other languages because it automatically chooses for you. As for a dictionary, I use Kotoba. It is free and works without a network connection.

For Mac:
I use "kotoeri" by default. I have also enabled the option to write kanji by hand on the trackpad to search the dictionary for kanji. Instructions on how to activate and use it can be found here. In a nutshell:

  1. Choose System Preferences from the Apple menu.
  2. Choose Language & Text from the View menu.
  3. Select the Input Sources tab.
  4. Enable the checkboxes for Pinyin, Wubi Xing, or Wubi Hua under Chinese - Simplified, or Cangjie, Dayi Pro, Jianyi, Pinyin, or Zhuyin under Chinese - Traditional.

In OS X you can type accents and other characters with "option key" combinations without changing the keyboard layout. Also, you can hold down a letter for a few seconds and this will open up a menu box similar to the iOS version. Try it out!

As a native dictionary I use JEDict (free), plus a few that I consult on the web such as Denshi Jisho.

To use Katakana you can hit Ctrl+K, which converts things directly to the script. Finally, I recommend writing Japanese text in a Japanese font, because most Western fonts do not have the characters and this can always be an issue.

Read me...

It seems Apple took down the iOS version of Chrome rather quickly. Tantrum?

I heard via Cult of Mac that the iOS version of Chrome was available in the AppStore. My friend downloaded it successfully and kept going on about how quick it was. And indeed it was.
After dinner I tried to download it too, but whenever I tried I kept getting the message "The item you tried to buy is no longer available".
Did Apple just throw the toys out of the pram? Have you been able to download the app?

Here is the article from Cult of Mac:
Chrome for iOS

UPDATE: 29th June, 2012
It looks like this was a glitch with the app store... I managed to download it just now.

Read me...

Happy birthday Turing

A hundred years ago today, Alan Turing was born. To celebrate, Google has put a functioning Turing machine up as their latest doodle. A Turing machine is a device that uses a tape with symbols that are manipulated according to certain rules and, as you can imagine, it was proposed by Turing, in 1936.
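If you would like to play with the idea yourself, here is a minimal Python sketch of such a machine. It is my own toy example and has nothing to do with how the doodle is built: the function run_turing_machine, the rule table INVERT_RULES and the "_" blank symbol are all assumptions made for illustration. The rules simply flip every bit on the tape and halt at the first blank cell.

```python
def run_turing_machine(tape, rules, state="start", blank="_", max_steps=1000):
    """Run a one-tape Turing machine and return the final tape contents."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        # Each rule maps (state, symbol read) to
        # (symbol to write, head movement, next state).
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


# A toy rule table: flip 0s and 1s while moving right, halt on a blank cell.
INVERT_RULES = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine("1011", INVERT_RULES))  # 0100
```

Everything the machine "knows" lives in the rule table, so a different table gives you a different machine.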

Turing machine

Read me...

Setting up Posterous in Tweetbot for iPad

Posterous Logo (Photo credit: Wikipedia)

UPDATE: Sadly this post is now obsolete with the shutting down of Posterous...


I have recently started using Tweetbot as a Twitter client and I must say that I am quite pleased with the way it handles things like mentions, RTs and particularly the display of media such as photos and video. It seems to be quite easy to use and to set up multiple accounts. However, there was something that I didn't quite like... I tend to use Posterous to upload pictures and other media. I prefer this to services such as Twitpic or Moby, and as such I was expecting Tweetbot to handle Posterous as easily as these other services. Although Tweetbot mention on their site that they support Posterous, once in the application it was nowhere to be seen. If, like me, you want to use Posterous, do not despair: it is just a matter of configuring the "Custom" service. Here is what you need to do:

  1. In Tweetbot, open the Settings (at the bottom of the navigation bar on the left hand side).
  2. Under account settings, tap your username and then tap either "Image Upload" or "Video Upload" (changing one will make the service available in the other).
  3. Scroll to the bottom of the menu and select "Custom"
  4. You will be asked to enter an API endpoint; enter one of the following two options:
  • https://posterous.com/api2/upload.xml
  • https://posterous.com/api2/upload.json

And you are ready to go! Please note that this assumes that you already have a Posterous account and that it knows about your Twitter identity. If it doesn't, Posterous will create a new account for you. For more info about the API, visit this page.

Read me...

Saving screenshots with a useful name (Mac)

Are you a Mac user? Do you end up taking one, two, thousands of screenshots? Well, then you would agree with me when I say that having them automatically saved with a long and not very useful name is a bit of a pain.

It would be great if you could have a choice over the name used to save your screenshots. Well, there is! Here is how:

Open a terminal and type the following (using your own useful text string in between the quotes):

defaults write com.apple.screencapture name "A useful name"

Then, restart the user interface system with the following command:

killall SystemUIServer

And that is it! Enjoy!

Read me...

Working Collaboratively Online: Wunderkit and Hojoki

Working collaboratively is nothing new... The ever-increasing range of online tools could definitely make doing so much easier. However, if you are not careful, you can quickly end up with a large number of new accounts in services that only a few people use.

You can tackle collaborative work using things such as email, but I am sure you will agree that by the second iteration of doing and undoing tracked changes (once you have managed to convince others to track them, that is...) it becomes a bit tiring. In that respect, tools such as Google Docs have a distinct advantage.

More recently I have come across a couple of new takes on the subject, one is Wunderkit and the other one is Hojoki. I started having a look at both of them, so this post is more about first impressions rather than fixed recommendations. Should you have any views on this, please do let me know.


Wunderkit is brought to us by 6 Wunderkinder, a Berlin startup that also created Wunderlist (which is a good to-do application). Wunderkit lets you create projects that can then be shared with other people. The application lets you connect to Twitter and Facebook. Your contacts are treated in a similar way to followers in Twitter and you can invite contacts to your projects. You are supposed to be able to discover other people, but I must admit that the process was a bit cumbersome.

Once you have created a project and invited some people, your followers can post messages, comment on tasks, set up discussions and send status updates. A very interesting aspect of Wunderkit is that it includes some applications that can be very useful:

  • A progress tracker: You can easily see what the status of the project is, what people have been discussing and which activities your collaborators have been working on.
  • A to-do list: The to-do list lets you set up tasks and lists. You can assign these tasks to specific members and set up due dates. I wish they could synchronise these lists with Wunderlist... but never mind.
  • A notepad: This is a useful addition to the task lists as you can add ideas, notes, scripts, etc, to your project.

Another useful thing about this application is the fact that not only does it live in the web, but the 6 Wunderkinder have created mobile applications that let you take your projects and lists with you. They also have a desktop application, but it seems that currently these additions are only for Apple devices. The accounts are free and should you need more support you can get a pro account. So far so good.


The other tool I wanted to talk about is Hojoki. Hojoki is also the creation of a German team and the prospect is a very interesting one. The main premise of the application is the accessibility, in a single place, of a number of services you already use: Dropbox, Google Docs, Github, Highrise, Mendeley, etc... Once you connect your different services, Hojoki creates a single feed that gets updated as soon as team members perform actions such as saving or creating files, submitting updates, etc. It also integrates with Twitter, but should you be following a lot of people, this can be a bit too much! You can also set up workspaces and

It is a good idea and it exploits the cloud features of many applications. Currently it only works from a web browser although they say that a mobile app is a bit of a work in progress. Accounts are also free.

Well, all you have to do now is give them a go and let me know what you think. We might even be able to start a project using one of these tools.

Read me...

Uploading videos to Vimeo

Now that you have created your videos with either your PC or your Mac, you are ready to share them with the world. I find Vimeo very easy to use and quite flexible in terms of content, size of files and things of that sort. In this video I show you very quickly how to create an account and how to upload your masterpiece.

As usual, let me know what you think.


Read me...

Videocasting with a Mac

Continuing with the subject of capturing video, in this tutorial we will cover some tools to capture your screen using a Mac. The tools are Quicktime Player, MPEG Streamclip and iMovie. These tools either come with Mac (Quicktime and iMovie) or are available from the web.

Enjoy and keep in touch!



Read me...

Videocasting with a PC

Talking to some people about screen capturing and video tutorials, I came across the fact that, although there is some interest in the activity, there is the idea that you need sophisticated tools to create even the simplest video presentation.

In this video I show how some simple videos can be produced by capturing the screen of a PC with Windows installed. The tools that I use are CamStudio and Freemake Video Converter, which are readily available on the web.

As usual, any comments are more than welcome. Enjoy!


Read me...

Structured Documents in LaTeX

The LaTeX logo, typeset with LaTeX

Continuing with the brief introduction to LaTeX that I posted recently, in this video I discuss the use of LaTeX to produce a document that has a structure similar to that of a book, for example. The idea is to build a master file that controls the flow of the document and to keep each "Chapter" in a separate file. This provides the author with a lot of flexibility in terms of organising content and makes large documents far more manageable than when using a single LaTeX file.

Enjoy and any feedback, comments or suggestions are more than welcome.


Read me...

Using LaTeX to write mathematics

I have been meaning to do something like this for a long time and finally got the courage to do it. A lot of times I get completely horrified by the way in which some documents that contain mathematical notations are mangled (quite literally) by using MS Word. It helps sometimes that some people have access to MathType but still...

So, in this video I intend to provide some help to those that are interested in using LaTeX to include mathematics and produce their documents. LaTeX is freely available for various platforms. You can obtain MikTeX for Windows here, and MacTeX for Mac here. There is a great variety of editors to choose from; in this video I recommend TeXmaker, which I believe provides quite a lot of help to those of us that are still attached to the pointing and clicking of MS Word.

Let me know what you think! Any feedback is always welcome.


Read me...

iBook Author

So, I just found out about Apple announcing iBooks Author which, according to the information they provide, "is an amazing new app that allows anyone to create beautiful Multi-Touch textbooks" and is a free download from the Mac App Store.

Installation was not too slow, considering that perhaps lots of other users were doing exactly the same. I had a quick go at  selecting a template and it really seemed to be quite straightforward to use. It does look like a combination of Pages and Keynote. I will have to play more with it, but something that I did find disappointing was the lack of support to handle mathematics. I am not after LaTeX (I already use that quite a lot), but it would be nice to be able to handle equations natively. I do hope someone at Apple is reading!


Read me...

Access your Library folder in Mac OS Lion

I am not entirely sure I think of OS X Lion as the best operating system ever. It does have some nice features, but it also has some annoyances. One of them is the seemingly missing Library folder.

As a matter of fact, the folder is not missing: by default, Apple now hides it to prevent users from messing with it. But do not despair, there are ways to get to this folder, either temporarily or on a more permanent basis. Here is how:

Temporary access:

  1. In Finder, open the Go menu
  2. You can make the Library folder visible by holding down the Option key
  3. And that's it, you can now open the Library folder

Permanent access:

  1. Open up a Terminal
  2. Type the following command: chflags nohidden ~/Library
  3. If you want to undo this, simply type: chflags hidden ~/Library


Read me...

Apple Knowledge Navigator

In 1987 Apple released this video about a hypothetical device called Knowledge Navigator. This can be seen as the idea behind Siri, the personal assistant recently announced by Apple.

Read me...

Lion and Air Display don't like each other

vote symbol: information
Image via Wikipedia

I am generally quite happy with using a Mac and things seem to be going quite well with my machine. Nonetheless, I could not resist upgrading my operating system from Leopard to Lion... after all, Apple markets it as "the most advanced desktop operating system". The update itself happened without a glitch, but the machine seemed to have become more sluggish. I assumed it was down to the number of applications that I had installed and the fact that some of them, such as Maple 9.5 and my version of PhotoShop, relied on Rosetta to work. I got rid of the newly obsolete software, but this did not sort out the issues.

One of the more annoying issues, even more than the lack of malleability in Launchpad, was the very insufferable fact that the screensaver acquired a mind of its own: it would just spring into action on its own even when I was typing or using the mouse... After searching for a solution, the only thing that worked was to turn the screensaver off... Now, this is not ideal. But now I think I have found the answer: the problem was the limitation that Air Display has when installed in Lion.

Avatron, the makers of Air Display (a screen extension application), know about this and, although they mention that only certain models are affected, I found that as soon as I got rid of Air Display not only did my machine stop having trouble with the screensaver, it also woke up from the horrendous sluggishness it had been suffering.

How to uninstall Air Display:

  • Go to Applications -> Utilities
  • Run the "Uninstall Air Display"
  • The machine will automatically re-start
  • Et voilà
Read me...

Furigana (ふりがな) in Mac

Furigana example
Image via Wikipedia

First of all, I guess I must explain what ふりがな (furigana) are. Japanese uses characters of Chinese origin called Kanji. Because of the way they have been adopted into Japanese, a single character is more often than not used to write a variety of words and this means that the kanji acquires different ways to be read depending on the word.  Deciding which reading is meant depends on context, intended meaning, the use in conjunction to other kanji, etc. The readings are usually categorised as either onyomiー音読み (literally, sound reading) or kunyomiー訓読み (literally, meaning reading).

So, what about these furigana? Well, since the reading of kanji can get a bit tricky when you are learning to read them, sometimes small hiragana are used to indicate the phonetic reading intended (see the picture above).

Furigana are commonly used for children, who might not recognise kanji, but are able to read the word when written in hiragana. It is also common to see them used in textbooks for learners of Japanese as a second language. Japanese adults make use of them on words written in uncommon or difficult-to-read kanji.

About a month ago I had to prepare a speech to be given in my Japanese lesson and I had the idea that it would be great to add furigana to my script. But how do you place furigana along your sentences? Well, here are some instructions to  add furigana to kanji in Word 2011 for Mac (I also managed to do it in LaTeX, but I will create a separate entry to explain that - if you need this info, please contact me and I will be happy to help):

Under the Microsoft Office folder in your Applications directory, there should be a folder called "Additional Tools". Inside this folder there is a directory called "Microsoft Language Register". Open the "Microsoft Language Register" application that lives there, select "Japanese" from the dropdown menu and click OK. This enables some advanced features such as furigana writing, vertical text and character combination when using Japanese.

Open Word and start typing something in hiragana. You can convert the text into kanji by hitting the spacebar. Here is where the magic comes: highlight the kanji that needs a furigana entry, click "Format" in the menu bar (at the top) and select "Phonetic Guide" and there you go!


UPDATE: I finally created a post about using furigana in LaTeX. Find the post here.

Read me...

Mathematically inclined CAPTCHA

Early CAPTCHAs such as these, generated by the...
Image via Wikipedia

I'm sure you have encountered CAPTCHAS before. You might not know them with that name, but they have become a familiar feature of many websites. So, you want to book some tickets for a gig of your favourite band? Do you want to sign up to a new social network? Or simply interested in recovering your lost password? Well, you are more than likely to have used a CAPTCHA.

A CAPTCHA is a way to check that a request to the services mentioned above (and many others) is not generated by a computer. This usually means asking the user to complete a test that is simple for a human being but hard for a computer to replicate. One such task is character recognition. The text is supposed to be so distorted that a computer would have trouble identifying it, whereas a human being would be able to solve the problem in a very straightforward manner.

Recently this has been put to good use by reCAPTCHA, a service that helps digitise printed material. On many occasions the print quality of some words is poor and therefore OCR (Optical Character Recognition) software struggles. However, many CAPTCHAs are solved by humans every single day and this is a resource that reCAPTCHA is channelling. The idea is to send out words that the computer is having problems identifying. So, if the computer cannot do it, how does the system know that you have given the correct answer?

Well, you are provided with two words: one is known and the other is the word that needs resolving. If the answer for the known one is correct, the system assumes that the second one is also correct. The key is that you don't know which word is which. If many people provide the same answer for the unknown word, then it is highly likely that it has been identified.
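The two-word logic can be sketched in a few lines of Python. This is only my own illustration of the idea described above, not reCAPTCHA's actual code: the function name check_answer and the threshold of three matching answers are assumptions made for the example.

```python
from collections import Counter


def check_answer(control_word, control_answer, unknown_answer, votes, threshold=3):
    """Return (is_human, accepted_transcription_or_None).

    control_word is the word the system already knows; votes collects what
    people have typed for the unknown word. The threshold is hypothetical.
    """
    # Fail the test if the known (control) word was typed incorrectly.
    if control_answer.strip().lower() != control_word.lower():
        return False, None

    # The control word was right, so trust the answer for the unknown word
    # enough to count it as one vote.
    votes[unknown_answer.strip().lower()] += 1

    # Accept a transcription only once enough people agree on it.
    guess, count = votes.most_common(1)[0]
    accepted = guess if count >= threshold else None
    return True, accepted


votes = Counter()
for _ in range(3):
    result = check_answer("morning", "morning", "upon", votes)
print(result)  # (True, 'upon')
```

Note that a wrong answer for the control word adds no vote at all, which is what stops a bot from polluting the transcription.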

All of this is great, but what is the connection with the mathematically inclined CAPTCHA? Well, recently a friend of mine came across the following CAPTCHA. That is an excellent way to prove that you are not a bot, and that you are definitely a geek! Well done!

Maths Captcha


Read me...

Pretend you are Hercule Poirot...

Hercule Poirot explains how it all happened
Image by elena-lu via Flickr

I really enjoyed this error message in a LaTeX file that my office mate came across yesterday while preparing some slides. Usually the errors are quite obvious so there is no need to check the console. This time it was something a bit obscure... so much so that LaTeX suggested:

Pretend that you're Hercule Poirot: Examine all the clues, and deduce the truth by order and method.

Great! Where are my hat and fake moustache?




Read me...

Pac-Man animated with humans

The Original Human PAC-MAN Performance by Guillaume Reymond

Read me...

How to resize a window in Mac

It may sound a bit strange to have a post about how to resize a window. "You just select the lower right-hand side corner and drag it!" I hear you say. But what happens when the window is so tall that you can’t even get to the resize handle? Well, here is the answer:

It may not be obvious, but the solution is right there in front of you! Clicking the green zoom (+) button on the toolbar of any window will automatically resize it to best fit your current screen resolution!

Read me...

Downgrade iPhone 3GS from iOS 4 back to 3.1.3

I have been very happy with the performance of my iPhone, but I could not help noticing that after upgrading the 3GS to iOS 4, the phone not only slowed down, but effectively stood still. Not really what you want when you are in need of getting directions, finding the name of that actor in that film, or simply making a phonecall. So, if you are in that boat, here is a recipe to downgrade your device and recover some functionality! You will need the following ingredients:

  1. iPhone 3GS
  2. iTunes
  3. Cable to connect iPhone to iTunes
  4. A copy of iOS 3.1.3
  5. RecBoot
  6. Some patience

Preparation 1 - Get iOS 3.1.3 ready

This sounds like a tricky one, but do not panic, it might well be that you do indeed have a copy of the iOS available in your hard drive, check in:

~/Library/iTunes/iPhone Software Updates

On Windows, your iPhone OS updates should be stored in:

C:\Documents and Settings\[username]\Application Data\Apple Computer\iTunes\iPhone Software Updates

If you see a file inside this folder corresponding to iOS 3.1.3 for the iPhone 3GS, it is likely the restore image you need.

If you don't see anything that resembles the 3.1.3 OS or you just want  a freshly downloaded one,  iClarified has a list of iPhone firmware files. Just find 3.1.3 for your phone and download it to a place in your hard drive that you can remember.

Preparation 2 - RecBoot

Later on in the process, you will need RecBoot to be able to tell iTunes to free your iPhone after downgrading. You can download it here (available for Mac and Windows).

Preparation 3 - Put your iPhone into DFU mode

You need to put your iPhone into Device Firmware Update (or DFU) mode in order to downgrade to 3.1.3. Here is how:

  1. Plug in your iPhone.
  2. Power it down by holding the sleep/lock button at the top and sliding to power off.
  3. Once it's powered down, press and hold both the sleep/lock button and the home button for ten seconds.
  4. After ten seconds, release the sleep/lock button but continue holding down the home button.
  5. If you did it right, iTunes will pop up a window telling you that it's detected an iPhone in recovery mode and your iPhone's screen will be black. If it didn't work, start from the beginning and try again.

Preparation 4 - Downgrade to 3.1.3

It is now time to do the downgrading. Dismiss the iTunes alert that told you you're in recovery mode and select the iPhone in the iTunes sidebar.

  1. Hold Option (Shift on Windows) and click the Restore button
  2. iTunes will pop up a window prompting you to choose a file. Navigate to the location of the 3.1.3 OS file you obtained in preparation 1.
  3. Select that file, and iTunes will start the OS restore process. This is where the bit of patience comes in, as this takes a few minutes
  4. When it's finished, you'll receive an error message and your iPhone will boot up with a "Connect to iTunes" screen.

Preparation 5 - Recovering the iPhone

This is where RecBoot becomes useful. Open RecBoot, and click "Exit Recovery Mode". After a few seconds the software should prompt your iPhone to leave the plug-me-into-iTunes mode and there you go, you have a freshly downgraded iPhone!

Serve cold and enjoy!

Read me...

Get rid of Ctrl-M characters

carriage return
Image by holeymoon via Flickr

Have you found yourself opening an old text file on your shiny new Mac, only for strange characters such as ^M to appear all over the place? Do not panic, it is nothing that this post cannot help to fix.

Those strange characters come from the format in which different operating systems encode things like carriage returns at the end of line.  Some  applications won't recognise the carriage returns and will display a file as a single line, interspersed with Ctrl-M characters.

In Mac OS X, the situation is more complicated given that it is a flavour of Unix itself. In some cases text files have carriage returns and in others they have new lines. For the most part, classic applications still require text files to have carriage returns, while the command-line Unix utilities require new lines (aka line feeds). Mac OS X-native applications are usually capable of interpreting both.

There are many ways to resolve the differences in format. Unix command-line utilities such as tr, awk and Perl can do the conversion. In Mac OS X, each can be accessed from the Terminal application. In order to understand some of the syntax used below, it is important to mention that Unix uses escape sequences to identify different types of "spaces":

  • \r: Carriage Return
  • \n: New Line
  • \t: Tab
  • \v: Vertical Tab
  • \f: Form Feed
  • \b: Backspace


The Unix program tr is used to translate between two sets of characters. Characters specified in one set are converted to the matching character in the second set. Thus, to convert the Ctrl-M of a Mac OS text file to the new line (Ctrl-J) of a Unix text file, at the Unix command line, enter:

  • tr '\r' '\n' < macfile.txt > unixfile.txt

Here, \r and \n are special escape sequences that tr interprets as Ctrl-M (a carriage return) and Ctrl-J (a new line), respectively. Conversely, to convert a Unix text file to a Mac OS text file, enter:

  • tr '\n' '\r' < unixfile.txt > macfile.txt
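
If you would like to try the two tr conversions before touching a real file, here is a minimal sketch. It assumes a POSIX shell with printf, tr, wc and cmp available; the sample text and the file name roundtrip.txt are made up for illustration, and everything happens in a throwaway directory.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a sample "classic Mac OS" file: lines end in a carriage return (\r).
printf 'first line\rsecond line\r' > macfile.txt

# Mac OS -> Unix: every carriage return becomes a new line.
tr '\r' '\n' < macfile.txt > unixfile.txt

# Unix -> Mac OS: the reverse translation.
tr '\n' '\r' < unixfile.txt > roundtrip.txt

# Count the line-ending characters in each file.
# (tr -d ' ' strips the padding that some versions of wc add.)
mac_cr=$(tr -cd '\r' < macfile.txt | wc -c | tr -d ' ')
unix_cr=$(tr -cd '\r' < unixfile.txt | wc -c | tr -d ' ')
unix_nl=$(tr -cd '\n' < unixfile.txt | wc -c | tr -d ' ')
echo "macfile.txt:  $mac_cr carriage returns"
echo "unixfile.txt: $unix_cr carriage returns, $unix_nl new lines"

# The round trip should reproduce the original file byte for byte.
cmp -s macfile.txt roundtrip.txt && echo "round trip OK"
```

Because tr swaps one character for another, the file size never changes, which is why converting there and back gives you the original file again.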


To use awk to convert a Mac OS file to Unix, at the Unix prompt, enter:

  • awk '{ gsub("\r", "\n"); print $0;}' macfile.txt > unixfile.txt

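To convert a Unix file to Mac OS using awk, set the output record separator to a carriage return (awk splits its input on new lines, so a gsub on "\n" would never find anything to replace). At the command line, enter:

  • awk 'BEGIN { ORS = "\r" } { print $0 }' unixfile.txt > macfile.txt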

On some systems, the version of awk may be old and not include the function gsub. If so, try the same command, but replace awk with gawk or nawk.


To convert a Mac OS text file to a Unix text file using Perl, at the Unix shell prompt, enter:

  • perl -p -e 's/\r/\n/g' < macfile.txt > unixfile.txt

To convert from a Unix text file to a Mac OS text file with Perl, at the Unix shell prompt, enter:

  • perl -p -e 's/\n/\r/g' < unixfile.txt > macfile.txt
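
Before converting anything, it can help to check whether a file contains Ctrl-M characters at all. The following is only a sketch: has_ctrl_m is a hypothetical helper name (not a standard tool), and the two sample files are created just for the demonstration.

```shell
# has_ctrl_m FILE: succeed if FILE contains at least one carriage return.
# (Hypothetical helper name, not part of any standard tool.)
has_ctrl_m() {
    # Delete everything except carriage returns, then count what is left.
    count=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    [ "$count" -gt 0 ]
}

# Demonstration on two throwaway sample files.
sample_dir=$(mktemp -d)
printf 'one\rtwo\r' > "$sample_dir/old.txt"   # carriage-return line endings
printf 'one\ntwo\n' > "$sample_dir/new.txt"   # new-line line endings

for f in "$sample_dir/old.txt" "$sample_dir/new.txt"; do
    if has_ctrl_m "$f"; then
        echo "$(basename "$f"): contains Ctrl-M characters, convert it"
    else
        echo "$(basename "$f"): no Ctrl-M characters found"
    fi
done
```

Only the files flagged by the check need to go through one of the tr, awk or Perl commands above.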

I hope this helps.

UPDATE - Thanks to D Asirvadem for some comments and corrections. With regard to "In Mac OS X, the situation is more complicated given that it is a flavour of Unix itself", he adds:

Yes and no. I would not put it that way. The problem only happens when you import a text file from an older variant of Unix (eg AIX) to a newer variant of Unix (eg. MacOS or Linux RHEL). It is no more complicated on MacOS.
Read me...


You surely must be familiar with the concept of an operating system (OS) for your computer. Who has not heard of Windows and its different incarnations? Windows 95, Windows 98, Vista or Windows 7? What about Mac OS X - Panther, Snow Leopard or Lion? And what about Linux and its different distros?

If you are a long-term user of Microsoft products you are well aware of the moodiness of the operating system, the blue screen of death and the endless restarts after a seemingly infinite number of updates. Well, you would not be alone if you were to start anthropomorphising your favourite OS, and if you are a Japanese user it doesn't take long for you to start creating manga characters for them, i.e. OS-tans.

Why OS-tan? Well, you have probably heard the Japanese suffix -san (-さん) used at the end of someone's name. It is an honorific suffix and roughly translates as Mr, Mrs, Miss, or Ms. If you want to be more familiar with someone and want to show that the person is close to you, you might use the diminutive honorific suffix "-chan" (-ちゃん). A common childish mispronunciation of this suffix is "-tan" (-たん) and thus the meaning of OS-tan becomes clear.

OS-tans are personifications of various operating systems, which started with the common perception of Windows Me as unstable and prone to frequent crashes. Discussions on Futaba Channel likened this to the stereotype of a fickle, troublesome girl and so Me-tan was born. The characters are usually represented by girls, although some male OS-tans exist. In particular, the OS-tans for the different Windows versions are represented by sisters of various ages.

For instance, XP-tan is a dark-haired girl with ribbons in her hair and an "XP" hair ornament typically worn on the left side. Windows XP is criticised for bloating a system and for being very pretty without being as useful. Additionally, as a reference to the memory usage of Windows XP, she is often seen eating or holding an empty rice bowl labeled "Memory". Windows 7 is represented by a character called Nanami Madobe (窓辺ななみ Madobe Nanami). The premium set of the OS includes a Windows 7 theme featuring 3 Nanami wallpapers, 19 event sound sets, and a CD with 5 extra Nanami sounds. This makes her the first OS-tan marketed by the company producing the operating system. In addition, the character also got her own Twitter account and Facebook page.

The Mac OS X girl is often portrayed as a catgirl, in keeping with Apple's "wild cat" naming tradition. She wears a platinum white coat and a wireless AirPort device fashioned as a hat. In the Linux case, sometimes a penguin is used as a reference to Tux, but there is also the image of a girl with helmet and flippers. Her helmet usually has horns on it, likely a reference to the GNU software which comprises the common system programs present in nearly all Linux distributions.

There are many more OS-tans and there are even mangas and animations featuring the characters, including supporting ones such as Dr Norton, Firefox-tan and Opera-tan. A list of OS-tans can be found here. So next time your system crashes or you need an extra driver, you can always think of the OS-tan behind the machine.

Read me...

Japanese chiisai characters in Katakana


Writing the "chiisai" Japanese characters when using Hiragana (ひらがな) is quite straightforward. Chiisai (小さい) means small, and combining a character with a chiisai one produces a different sound.

For instance to write ちょっと, you have to type the following keys: "chotto", which produces the "chiisai yo" and "chiisai tsu" needed to write this Japanese word.

However, it seems that when using Katakana (カタカナ) things become a bit complicated, so here is how to produce the chiisai characters:

ッ (Katakana)
xtu (key sequence)

キャ キュ キョ (Katakana)
kya kyu kyo (key sequence)

シャ シュ ショ (Katakana)
sha shu sho (key sequence)

チャ チュ チョ (Katakana)
cha chu cho (key sequence)

ニャ ニュ ニョ (Katakana)
nya nyu nyo (key sequence)

ヒャ ヒュ ヒョ (Katakana)
hya hyu hyo (key sequence)

ミャ ミュ ミョ (Katakana)
mya myu myo (key sequence)

リャ リュ リョ (Katakana)
rya ryu ryo (key sequence)

ギャ ギュ ギョ (Katakana)
gya gyu gyo (key sequence)

ジャ ジュ ジョ (Katakana)
ja ju jo (key sequence)

ビャ ビュ ビョ (Katakana)
bya byu byo (key sequence)

ピャ ピュ ピョ (Katakana)
pya pyu pyo (key sequence)

Exceptional characters.
These are basically used only for technical words.
[Katakana - (key sequence)]
ウィ (uxi)
クァ (kuxa)
クィ (kuxi)
クェ (kuxe)
クォ (kuxo)
ティ (texi)
フュ (fyu)
ディ (dexi)
ヴァ (va)
ヴィ (vi)
ヴェ (ve)
ヴォ (vo)

I came across this trick on Yahoo Answers.

Read me...

Quick switching for Kana syllabaries in OS X

If you type things in Japanese, you have quite likely come across the problem of switching between Hiragana (ひらがな) and Katakana (カタカナ).
You can switch between Hiragana and Katakana by holding down Shift while typing.

As an example, typing わたしはヘススです。 requires Hiragana, Katakana, and Hiragana again. The quickest way to write this sentence starting from the US English keyboard would be as follows (assuming you have enabled the Command-Space Bar switching method):

  1. Press Command-Space Bar.
  2. Type the following keys: "watashiha ".
  3. Hold down the Shift key to switch to Katakana, then type "HESUSU ".
  4. Let go of Shift to switch back to Hiragana, and type "desu."
Read me...

Keyboard Shortcut for "Save as PDF..." in Mac OS X

I have been wondering about the possibility of having a shortcut to "Save as PDF" in OS X. I came across  a hint at MacOSXHints. And even better than that, a walkthrough version in this link.

Read me...

Deleting E-mail addresses from MacMail autocomplete list

Like many other email applications, Mac OS X Mail is pretty good at remembering the email addresses of people you usually communicate with. As soon as you start typing in the To: field of an email, a few suggestions will come up. It is pretty useful, except when you want to get rid of those addresses that you don't use very often.

Fortunately, there's a way to delete old (or unwanted) addresses from the auto-complete list in Mac OS X Mail. The new address will be remembered automatically, and soon the auto-complete feature is as useful as ever again.

Delete an Email Address from Auto-Complete in Mac OS X Mail

To remove an email address from the auto-complete list in Mac OS X Mail:

  • Start typing the recipient's address or name in a new message.
  • Select the desired address from the auto-complete list as if you were going to compose an email to them.
  • Click the small down arrow in the recipient.
  • Select Remove from Previous Recipients List from the menu.

You can also search for the unwanted address directly in the previous recipients list:

  • Select Window | Previous Recipients from the menu in Mac OS X Mail.
  • Highlight the address you want to remove.
    • You can highlight multiple addresses by holding down the Command key.
  • Click Remove from List.

Clean up Mac OS X Mail's Auto-Complete List

To clean up or empty the auto-complete list of previous recipients' addresses in Mac OS X Mail:

  • Select Window | Previous Recipients from the menu.
  • Click on the Last Used header so the arrow points downward.
  • Make sure no entry is highlighted.
  • Hold down the Shift key.
  • Click on an address last used a year ago.
    • Of course, you can choose a different interval and select all addresses not used in the past month, for example.
  • Verify all entries not used in the last year are highlighted.
  • Click Remove From List.
Read me...

Really great day to Bletchley Park @bletchleypark. Worth paying a visit!

Read me...

NASA Explores Semantic Search

I came across a news article about NASA using technology from Google and Smartlogic to perform semantic searches of its manned space-flight program.

Smartlogic is a UK-based company, and the software that NASA is using is called Semaphore. It retrieves data semantically: the data is organised semantically, and the search is done by parsing each sentence of the query to obtain its meaning.

The original article can be found here.

Read me...

A flash version of HELL

A few days ago the great xkcd published a version of Hell based on the famous Tetris game... except that the bottom of the game is not flat, but curved.

I found this amusing and couldn't help tweeting about it, and I was told by a friend that if anything this "only shows our age more than anything else :(".

I guess he is right, but seeing a new Flash version of this Hell-ish game is brilliant, and it goes to show that there are other geeks out there who are also proud of showing their age.

Let me know if you manage to score :)

Read me...

Programming Languages

I remember the first time I had the opportunity to program a computer. As you might imagine, it was nothing too complicated; after all, it was the first time I had done anything like that. It was a simple programme of the "Hello World!" type. Written in BASIC (Beginner's All-purpose Symbolic Instruction Code), it printed the sequence of numbers from 1 to 10. Pretty neat, but not very useful. Since then I have had a go at a number of programming languages, scripts and tools, going from COBOL and Pascal to C++ and Python.

When people ask me about my favourite programming language, I tend to reply with another question: "What for?". I sincerely believe that there is no such thing as the perfect programming language, and it all depends on what it is that you need your computer to do. I mean, you would not bang a nail with a spanner; you would rather use a hammer for that. Of course, there is no question about the possibility of using the spanner for that particular task, but you would find that doing so has advantages (it's the tool you already know) and disadvantages (the tool is not designed with that particular purpose in mind).

There is a plethora of programming tools, and some of them have been around for years, either because they are indeed very well designed for their purpose, or because the amount of underlying programmes and functions written with them is so overwhelming that it is easier to keep them alive. Some other languages are more recent and I am sure that some of them will stand the test of time... but not all of them.

Very recently, TIOBE Software released their April index ranking the most popular programming languages. It shows that the reliable C language is back at number 1. I was not totally surprised by this; I always thought that the popularity of the language would place it among the top 5, along with C++ and Java. What I did not expect was to find MATLAB at number 18.

The index is updated once a month. The ratings are based on the number of skilled engineers world-wide, courses and third party vendors. The definition of the TIOBE index can be found here, and the first 20 places are listed below:

Position   Position   Programming       Ratings    Delta      Status
Apr 2010   Apr 2009   Language          Apr 2010   Apr 2009
1          2          C                 18.058%    +2.59%     A
2          1          Java              18.051%    -1.29%     A
3          3          C++               9.707%     -1.03%     A
4          4          PHP               9.662%     -0.23%     A
5          5          (Visual) Basic    6.392%     -2.70%     A
6          7          C#                4.435%     +0.38%     A
7          6          Python            4.205%     -1.88%     A
8          9          Perl              3.553%     +0.09%     A
9          11         Delphi            2.715%     +0.44%     A
10         8          JavaScript        2.469%     -1.21%     A
11         42         Objective-C       2.288%     +2.15%     A
12         10         Ruby              2.221%     -0.35%     A
13         14         SAS               0.717%     -0.07%     A
14         12         PL/SQL            0.710%     -0.38%     A
15         -          Go                0.710%     +0.71%     A
16         15         Pascal            0.648%     -0.07%     B
17         17         ABAP              0.625%     -0.03%     B
18         20         MATLAB            0.616%     +0.13%     B
19         22         ActionScript      0.545%     +0.09%     B
20         19         Lua               0.521%     +0.03%     B

Other programming languages

Well, where does this index place some of the languages that I have used at some point? Here we go: C, C++, VB, Python, Java, Pascal, MATLAB and Perl are all in the first 20 places.

Bourne Shell (26), COBOL (29), Fortran (34, although they do not mention which flavour: 77, 95, etc.), Prolog (43 - is anyone using that for anything? seriously?) and VBScript (50) are all in the first 50 places. They also list (in no particular order) numbers 51 to 100, including LabVIEW, Maple, Mathematica, R and SPSS.

Curiosities (or are they?)

Some of you, dear readers, might say that a lot of the languages are not really programming languages. A friend of mine rejected, for example, the idea of MATLAB as a programming language.

"Surely all scripting languages are programming languages, but not all programming languages are scripting languages" I hear you say. Well, as it was pointed out by another friend of mine: "If you really want to hurt yourself look at 'Root'" - a framework developed in 1994 by CERN, which has a scriptable command-line C++ interpreter! Really!

For the hardcore programmer in you, there are some interesting languages out there to have a look at and definitely play with. For example there is Whitespace (it seems that the original link is dead now, please check a Wayback page here) which, unlike any other programming tool, ignores any non-whitespace characters. Only spaces, tabs and linefeeds have meaning. You can see an example here. In a similar fashion, Brainfuck considers only eight commands in the language, namely: > < + - . , [ ] You can see an example here.

Now, if you really want to see how the text messaging culture has made it into the "Hello World!" of computer programming, look no further than LOLCODE, whose commands are expressed in lolcat and as you can imagine, the language is not clearly defined in terms of operator priorities and correct syntax (LOL!). Here is an example:

     O NOES

Other commands include "I HAS A variable", "variable R value" and "BTW" to denote comments!

Honestly, what next?...

Read me...

Screenshot in Macs

Here are some useful keyboard shortcuts to take snapshots of the screen on a Mac:

  • Command-Shift-3: Take a screenshot of the screen, and save it as a file on the desktop
  • Command-Shift-4, then select an area: Take a screenshot of an area and save it as a file on the desktop
  • Command-Shift-4, then space, then click a window: Take a screenshot of a window and save it as a file on the desktop
  • Command-Control-Shift-3: Take a screenshot of the screen, and save it to the clipboard
  • Command-Control-Shift-4, then select an area: Take a screenshot of an area and save it to the clipboard
  • Command-Control-Shift-4, then space, then click a window: Take a screenshot of a window and save it to the clipboard

In Leopard, the following keys can be held down while selecting an area (via Command-Shift-4 or Command-Control-Shift-4):

  • Space, to lock the size of the selected region and instead move it when the mouse moves
  • Shift, to resize only one edge of the selected region
  • Option, to resize the selected region with its center as the anchor point
Read me...
