The percentage of excellence in books – reblog

I recently asked the guys at the Data Science class at GA to bring a good example of a “daft graph” and they all did that with gusto. As usual, a certain cable news channel was mentioned many times for its misleading use of graphics.

However, there was one example that an attendee sent from a blog by Matt Reed, and it was so great that I could not help re-posting the blog here. The original can be seen on this site.


The percentage of excellence in books…

1919 – Extracts from an Investigation into the Physical Properties of Books as They Are At Present Published. The Society of Calligraphers, Boston.
This is a small pamphlet that was designed and authored by the graphic designer W.A. Dwiggins and his cousin L.B. Sigfried. It pilloried the format of books and voiced Dwiggins's concern about the poor methods used to print trade books in the US at that time.
The book was published by the imaginary Society of Calligraphers and the stinging investigation was a hoax cooked up by Dwiggins. Nevertheless, it did have an effect on publishing in the US following its wide distribution.
The graph by Dwiggins shows the reduction in book quality since 1910.

Gartner’s Magic Quadrant for Analytics Platforms is out

The Business Intelligence and Analytics Platform report by Gartner is close to the heart of many people interested in the analytics market, and it reflects the changes and new tools that people are using and will be using.

The so-called Magic Quadrant is a concise summary of the position of various players in the BI/Analytics space and it makes for good marketing, particularly for those in the Leaders-Visionaries quadrant. A particular mention goes to Tableau, which has been there for three years in a row. The “ability to execute” has positioned them quite high in the quadrant, followed by Qlik. The usual suspects such as IBM and SAS are still there, and it is interesting to see Microsoft there too, one may assume taking some of the space that Revolution Analytics would have used…

You can read the report here.

Gartner Magic


Lessons learned from teaching an 11-week data science course

I came across this blog post by Kevin Markham where he describes his experiences and thoughts about having taught the 11-week data science course with General Assembly. He has managed to capture some of my own thoughts about the course that I ran in London and I agree with the points he makes. The use of Python as the main language for the course went really well, and although there was the nagging itch of implementing some things in R, I think it paid off in the long run. We also used Anaconda as the recommended distribution, and although some were happy using other distros, in general it made it easier to get hold of packages and use them.


Kevin mentions the need for “more concepts than maths” and I agree with this view. I think it is important to discuss some of the maths, but given that the mix of students includes a wide range of abilities, placing the emphasis on the concepts is far more important. Having said that, I think it was important to make this clear from the beginning. I also tried to provide enough references and reading material for those more inclined to delve into the maths.

I had a session on APIs and databases (MySQL and MongoDB) and I am glad I did, as this enabled the students to consider that part of the project lifecycle. Kevin (and Alessandro Gagliardi – in the comments) comment on their experiences of including some NLP in their courses and that seems to be a good idea to consider. In my case I had a session on social network analysis, and although it only covered the basics, I think the students really enjoyed it.

As for visualisation, I had an invited speaker from Tableau and then worked with the tool for the rest of the class. I think it went well, and perhaps the only issue is that it came way too early in the course. One thing that was emphasised from the start was the project, and even though the students were constantly reminded about this, it was still difficult to get the projects “finished”. I put that in quotes as I believe that the main purpose of the project is to get the students started, particularly as it is hard to say when a project is actually done… there are always more things to tweak or do!

What features should I implement in an interactive dashboard?

I came across this post by Andy Cotgreave about features to use in an interactive dashboard such as those created in Tableau.

He deconstructed a dashboard (a bit meta there, right?) to quantify the Impact vs Difficulty of each of the design choices you can potentially make. You can see a screenshot below, but you can play with the interactive version in the link above.

Dashboard Features London

Yesterday I had the chance to attend the first conference in London. It was a fully packed day with lots of interesting speakers and fun people. The variety of the talks was quite good and most of the presentations were very well prepared. I was surprised at the bad use of video in a couple of the talks in the morning session, but apart from that it was all very good.

I ended up winning a print and it is now decorating one of the walls at home. You can see a picture at the end of the gallery below. The conference took place at Protein in the heart of Hipsterland (aka Shoreditch) and it was a well-attended event.

I particularly enjoyed the talk by David McCandless who turned out to be the mystery guest. Similarly, the presentation by Pascal Raabe about memories was very good and inspiring. Another good presentation was the “smelly” talk given by Kate McLean.

Andy Kirk gave a view about the Design of Time and you can see the slides here.

If you are interested in seeing what twitter was saying before, during and after the conference, check this page.

Finally, the conference was archived at Eventfire here, and I am surprised to see that I was the top contributor according to them! :D

What is data science?

Venn Diagram

Talking to some friends from General Assembly, I ended up being asked to provide a brief quote about what data science is. Given the short amount of time I had to think about the question, I came up with the following:

“Data science and analytics are rapidly gaining prominence as some of the more sought after disciplines in academic and professional circles. In a nutshell, data science can be understood as the extraction of knowledge and insight from various sources of data, and the skills required to achieve this range from programming to design, and from mathematics to storytelling.”
I am convinced there is more to it than the above lines, but I was asked for a small quote. Anyway, what do you think?

The Infinite Jukebox

I just saw this website and I simply had to share it with you: The Infinite Jukebox.

The idea of the application is to generate a “never-ending and ever changing version of any song”, and this is done in a very engaging and entertaining way. You can upload your own track, which in turn is uploaded to The Echo Nest, where it is decomposed into individual beats.

The beats of the song get analysed and matched to similar bits in the same song. The result is presented in a chord diagram, and as the song is played the paths that join similar-sounding beats come into play and make the song branch out to a completely different part of the song. Enjoy!
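To get a feel for how this kind of beat matching might work, here is a minimal sketch in Python. It is only my guess at the general idea and not the actual code behind The Infinite Jukebox or The Echo Nest: the per-beat feature vectors, the distance threshold and the branching probability are all assumptions made up for illustration.

```python
import math
import random


def similar_beats(features, threshold):
    """For each beat, list the other beats whose (hypothetical) feature
    vectors lie within `threshold` of it in Euclidean distance."""
    neighbours = {i: [] for i in range(len(features))}
    for i, a in enumerate(features):
        for j, b in enumerate(features):
            if i != j and math.dist(a, b) <= threshold:
                neighbours[i].append(j)
    return neighbours


def play(features, steps, threshold=0.5, branch_prob=0.2, seed=0):
    """Return the order in which beat indices are played for `steps` beats."""
    rng = random.Random(seed)
    neighbours = similar_beats(features, threshold)
    n = len(features)
    current, order = 0, []
    for _ in range(steps):
        order.append(current)
        candidates = neighbours[current]
        if candidates and rng.random() < branch_prob:
            # Branch: pretend we just played a similar-sounding beat and
            # continue with the beat that follows it.
            current = (rng.choice(candidates) + 1) % n
        else:
            # No branch: play the next beat, wrapping so the song never ends.
            current = (current + 1) % n
    return order


if __name__ == "__main__":
    # Four made-up beats described by a single number each.
    toy_beats = [(0.0,), (1.0,), (0.1,), (2.0,)]
    print(play(toy_beats, steps=12, threshold=0.2))
```

Beats 0 and 2 in the toy data are close to each other, so the playback can hop between those two parts of the "song" instead of always running straight through. A real implementation would presumably work from much richer audio features than this.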

infinite juke box


Science is beautiful exhibition

When I first heard about the plans that the British Library had for an exhibition called Science is Beautiful I got very excited. I even made an entry in my diary for the date it was planned to open. Closer to the time I even encouraged Twitter followers and colleagues to go to the exhibition.

Florence Nightingale's "rose diagram", showing the Causes of Mortality in the Army in the East, 1858. Photograph: /British Library

The exhibition promised to explore how “our understanding of ourselves and our planet has evolved alongside our ability to represent, graph and map the mass data of the time.” So I finally made some time and made it to the British Library today… the exhibition was indeed there with some nice looking maps and graphics, but I could not help feeling utterly disappointed. I was very surprised they even call this an exhibition: the images, documents and interactive displays were very few and not very immersive. Probably my favourite part was looking at “The Pedigree of Man” and the “Nightingale’s Rose” together with an interactive show. Nonetheless, I felt that the British Library could have done a much better job given the wealth of documents they surely have at hand. Besides, the technology used to support the exhibits was not that great… for example the touch screens were not very responsive and did not add much to the presentation.

Sadly, I can no longer really recommend visiting the stands, and I feel that you are better off looking at the images that the Guardian has put together in their DataBlog, and complementing them with the video that Nature has made available. You can also read the review that Rebekah Higgitt wrote for the Guardian.
