Random thoughts about random subjects… From science to literature and between manga and watercolours, passing by data science and rugby; including film, physics and fiction, programming, pictures and puns.
The Business Intelligence and Analytics Platform report by Gartner is close to the heart of many people interested in the analytics market, and it reflects the changes and the new tools that people are using and will be using.
The so-called Magic Quadrant is a concise summary of the position of various players in the BI/Analytics space, and it makes for good marketing, particularly for those in the Leaders-Visionaries quadrant. A particular mention goes to Tableau, which has been there for three years in a row. Its “ability to execute” has positioned it quite high in the quadrant, followed by Qlik. The usual suspects such as IBM and SAS are still there, and it is interesting to see Microsoft there too, one may assume taking some of the space that Revolution Analytics would have used…
I came across this post by Andy Cotgreave about features to use in an interactive dashboard, such as those created in Tableau.
He deconstructed a dashboard (a bit meta there, right?) to quantify the Impact vs Difficulty of each of the design choices you can potentially make. You can see a screenshot below, but you can play with the interactive version in the link above.
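As a rough illustration of the kind of chart behind that analysis, here is a minimal R sketch of an impact-versus-difficulty scatter split into quadrants; the feature names and scores below are invented placeholders, not Cotgreave’s actual ratings.

```r
# Invented placeholder ratings for a few common dashboard features;
# these are NOT the scores from the original post.
features <- data.frame(
  feature    = c("Tooltips", "Filter actions", "Custom legends", "Animations"),
  impact     = c(8, 9, 4, 3),
  difficulty = c(2, 5, 6, 8)
)

# Quadrant-style scatter: high-impact / low-difficulty features are the
# obvious candidates to add to a dashboard first.
plot(features$difficulty, features$impact,
     xlim = c(0, 10), ylim = c(0, 10), pch = 19,
     xlab = "Difficulty", ylab = "Impact",
     main = "Impact vs Difficulty of dashboard features (illustrative)")
text(features$difficulty, features$impact, labels = features$feature, pos = 3)
abline(h = 5, v = 5, lty = 2)  # split the chart into four quadrants
```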
Yesterday I had the chance to attend the first Visualized.io conference in London. It was a fully packed day with lots of interesting speakers and fun people. The variety of the talks was quite good and most of the presentations were very well prepared. I was surprised at the bad use of video in a couple of the talks in the morning session, but apart from that it was all very good.
I ended up winning a print and it is now decorating one of the walls at home. You can see a picture at the end of the gallery below. The conference took place at Protein in the heart of Hipsterland (aka Shoreditch) and it was a well-attended event.
I particularly enjoyed the talk by David McCandless who turned out to be the mystery guest. Similarly, the presentation by Pascal Raabe about memories was very good and inspiring. Another good presentation was the “smelly” talk given by Kate McLean.
Andy Kirk gave a talk about the Design of Time, and you can see the slides here.
If you are interested in seeing what twitter was saying before, during and after the conference, check this page.
Finally, the conference was archived at Eventfire here, and I am surprised to see that I was the top contributor according to them! :D
The idea of the application is to generate a “never-ending and ever changing version of any song”, and this is done in a very engaging and entertaining way. You can upload your own track, which in turn is uploaded to The Echo Nest, where it is decomposed into individual beats.
The beats of the song get analysed and matched to similar-sounding beats in the same song; the result is presented in a chord diagram, and as the song is played the paths that join similar-sounding beats come into play and make the song branch out to a completely different part of itself. Enjoy!
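The underlying idea is simple enough to sketch. The snippet below is a minimal R illustration of the general technique (not the actual Echo Nest or Infinite Jukebox code): link each beat to other beats that are close in feature space, then occasionally jump along one of those links during playback. The feature vectors, threshold and jump probability are all made up for illustration.

```r
set.seed(42)
n_beats <- 200
beats   <- matrix(rnorm(n_beats * 12), nrow = n_beats)  # fake per-beat feature vectors

# Pairwise distances between beats in feature space
d <- as.matrix(dist(beats))
diag(d) <- Inf                      # a beat should never link to itself

# Keep links between beats that sound "similar enough"
threshold <- quantile(d[is.finite(d)], 0.01)
links <- which(d < threshold, arr.ind = TRUE)

# During playback: from beat i either advance to i + 1 or, with some
# probability, jump along one of its links to a similar-sounding beat.
next_beat <- function(i, jump_prob = 0.2) {
  candidates <- links[links[, 1] == i, 2]
  if (length(candidates) > 0 && runif(1) < jump_prob) {
    candidates[sample(length(candidates), 1)]
  } else {
    min(i + 1, n_beats)
  }
}
```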
I couldn’t help noticing that the “Putneysw15” web site published in December a list of predicted dates and times when there will be “exceptional high tides” at Putney Bridge. They showed the data from the Port of London Authority in a table and I thought it would be a good idea to visualise it. The figure above is a very crude visualisation of the data, and perhaps it may be a good companion to the article that appears here.
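For anyone curious how little code such a chart needs, here is a minimal base R sketch; the dates and heights are invented placeholders rather than the actual Port of London Authority figures.

```r
# Made-up example dates, times and heights standing in for the published
# Port of London Authority predictions (illustrative only).
tides <- data.frame(
  datetime = as.POSIXct(c("2014-01-02 08:15", "2014-01-03 09:02",
                          "2014-01-31 07:48", "2014-02-01 08:31"),
                        tz = "Europe/London"),
  height_m = c(7.1, 7.3, 7.2, 7.4)  # predicted height in metres
)

# Spikes mark when the exceptional high tides are expected
plot(tides$datetime, tides$height_m, type = "h", lwd = 3,
     xlab = "Date", ylab = "Predicted height (m)",
     main = "Exceptional high tides at Putney Bridge (illustrative data)")
points(tides$datetime, tides$height_m, pch = 19)
```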
We’re frequently asked: What is the best tool to visualize data?
There is obviously no single answer to that question. It depends on the task at hand, and what you want to achieve.
Here’s an attempt to categorize these tasks and point to some of the tools we’ve found to be useful to complete them:
The right tool for the task
Simple one-off charts
The most common tool for simple charting is clearly Excel. It is possible to make near-perfect charts of most chart types using Excel – if you know what you’re doing. Many Excel defaults are sub-optimal, and some of the chart types it offers are simply for show and have no practical application. 3D cone-shaped “bars”, anyone? And Excel makes no attempt at guiding a novice user to the best chart for what she wants to achieve. Here are three alternatives we’ve found useful:
Tableau is fast becoming the number one tool for many data visualization professionals. It’s client software (Windows only) that’s available for $999 and gives you a user-friendly way to create well crafted visualizations on top of data that can be imported from all of the most common data file formats. Common charting in Tableau is straight-forward, while some of the more advanced functionality may be less so. Then again, Tableau enables you to create pretty elaborate interactive data applications that can be published online and work on all common browser types, including tablets and mobile handsets. For the non-programmer that sees data visualization as an important part of his job, Tableau is probably the tool for you.
DataGraph is a little-known tool that deserves a lot more attention. A very different beast, DataGraph is a Mac-only application ($90 on the App Store) originally designed to create proper charts for scientific publications, but it has become a powerful tool to create a wide variety of charts for any occasion. Nothing we’ve tested comes close to DataGraph when creating crystal-clear, beautiful charts that are also done “right” as far as most of the information visualization literature is concerned. The workflow and interface may take a while to get to grips with, and some of the more advanced functionality may lie hidden even from an avid user for months, but a wide range of samples, aggressive development and an active user community make DataGraph a really interesting solution for professional charting. If you are looking for a tool to create beautiful, yet easy to understand, static charts, DataGraph may be your tool of choice. And if your medium is print, DataGraph outshines any other application on the market.
The best way to see samples of DataGraph’s capabilities is to download the free trial and browse the samples/templates on the application’s startup screen.
R is an open-source programming environment for statistical computing and graphics. A super powerful tool, R takes some programming skills to even get started, but is becoming a standard tool for any self-respecting “data scientist”. An interpreted, command-line-controlled environment, R does a lot more than graphics, as it enables all sorts of crunching and statistical computing, even with enormous data sets. In fact, we’d say that the graphics are a little bit of a weak spot of R. Not that there is much to complain about from the information visualization standpoint, but most of the charts that R creates would not be considered refined and therefore need polishing in other software such as Adobe Illustrator to be ready for publication. Not to be missed when working with R is the ggplot2 package, which helps overcome some of the thornier aspects of making R charts and graphs look proper. If you can program, and need a powerful tool to do graphical analysis, R is your tool, but be prepared to spend significant time making your output look good enough for publication, either in R or by exporting the graphics to another piece of software for touch-up.
The R Graphical Manual holds an enormous collection of browsable samples of graphics created using R – and the code and data used to make a lot of them.
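As a small taste of what ggplot2 offers, here is a self-contained example using one of R’s built-in data sets; it is only a sketch of typical ggplot2 usage, not a recipe for publication-ready output.

```r
library(ggplot2)

# Scatter plot of car weight vs fuel consumption from the built-in mtcars
# data set, with a smoother and cleaner labels than the base defaults.
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "steelblue", size = 2) +
  geom_smooth(method = "loess", se = TRUE, colour = "grey30") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "Heavier cars are less fuel efficient") +
  theme_minimal()
```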
Videos and custom high-resolution graphics
If you are creating data visualization videos or high-resolution data graphics, Processing is your tool. Processing is an open source integrated development environment (IDE) that uses a simplified version of Java as its programming language and is especially geared towards developing visual applications.
Processing is great for rapid development of custom data visualization applications that can be run directly from the IDE, compiled into stand-alone applications, or published on the web as Java applets.
The area where we have found that Processing really shines as a data visualization tool is in creating videos. It comes with a video class called MovieMaker that allows you to compose videos programmatically, frame by frame. Each frame may well require some serious crunching and take a long time to calculate before it is appended to a growing video file. The results can be quite stunning. Many of the best known data visualization videos are made using this method, including:
As can be seen from these examples, Processing is obviously also great for rendering static, high-resolution bitmap visualizations.
So if data driven videos, or high-resolution graphics are your thing, and you’re not afraid of programming, we recommend Processing.
Charts for the Web
There are plenty – dozens, if not hundreds – of programming libraries that allow you to add charts to your web sites. Frankly, most of them are sh*t. Some of the more flashy ones use Flash or even Silverlight for their graphics, and there are strong reasons for not depending on browser plugins for delivering your graphics.
We believe we have tested most of the libraries out there, and there are only two we feel comfortable recommending, each with its pros and cons depending on what you are looking for:
Other libraries and solutions that may be worth checking out are the popular commercial solution amCharts, Google’s hosted Chart Tools and jQuery library Flot.
Special Requirements and Custom Visualizations
If you want full control of the look, feel and interactivity of your charts, or if you want to create a custom data visualization for the web from scratch, the out-of-the box libraries mentioned above will not suffice.
In fact – you’ll be surprised how soon you run into limitations that will force you to compromise on your design. With seemingly simple preferences such as “I don’t want drop shadows on the lines in my line chart”, or “I want to control what happens when a user clicks the X-axis”, you may already be stretching your chosen library. But consider yourself warned: the compromises may well be worth it. You may not have the time and resources to spend diving deeper, let alone writing yet-another-charting-tool™.
However, if you are not one to compromise on your standards, or if you want to take it up a notch and follow the lead of some of the wonderful and engaging data journalism happening at the likes of the NY Times and The Guardian, you’re looking for something that a charting library is simply not designed to do.
The tool for you will probably be one of the following:
D3.js, or “D3” for short, is in many ways the successor of Protovis. In fact Protovis is no longer under active development by the original team due to the fact that its primary developer – Mike Bostock – is now working on D3 instead.

D3 builds on many of the concepts of Protovis. The main difference is that instead of having an intermediate representation that separates the rendering of the SVG (or HTML) from the programming interface, D3 binds the data directly to the DOM representation. If you don’t understand what that means – don’t worry, you don’t have to. But it has a couple of consequences that may or may not make D3 more attractive for your needs.

The first one is that it – almost without exception – makes rendering faster and thereby animations and smooth transitions from one state to another more feasible. The second is that it will only work on browsers that support SVG, so that you will be leaving Internet Explorer 7 and 8 users behind – and due to the deep DOM integration, enabling VML rendering for D3 is a far bigger task than for Protovis – and one that nobody has embarked on yet.
After thorough research of the available options, we chose Protovis as the base for building out DataMarket’s visualization capabilities with an eye on D3 as our future solution when modern browsers finally saturate the market. We see that horizon about 2 years from now.
Data Science is exploding… or so it seems. I came across an article by Oscar Olmedo that describes some of the stages that practitioners in the area follow, and I think it makes for an interesting discussion.
In general, data science is concerned with extracting knowledge from a given data set, and to do that, tools such as mathematical modelling and machine learning are employed. Olmedo lists the following steps in the data science process:
1. Data selection and gathering
2. Data cleaning/integration and storage
I think I agree with this view, and in particular I am a firm believer that steps 1 and 2 are probably the most crucial and time-consuming. You can read Olmedo’s post here.
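To make step 2 a little more concrete, below is a minimal R sketch of the kind of routine cleaning that tends to dominate that stage; the column names and values are invented for illustration.

```r
# A made-up raw extract standing in for whatever data was gathered in step 1
raw <- data.frame(
  "Respondent ID" = c(1, 2, 2, NA, 4),
  "Age"           = c(34, -1, -1, 28, 130),
  "Score"         = c(7, 5, 5, 8, 6),
  check.names = FALSE
)

# Step 2: normalise column names, drop exact duplicates and flag implausible values
names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))
clean <- unique(raw)
clean$age[clean$age < 0 | clean$age > 120] <- NA   # implausible ages become missing
clean <- clean[!is.na(clean$respondent_id), ]      # every row needs an identifier

# Store the tidied version for the modelling steps that follow
write.csv(clean, "responses_clean.csv", row.names = FALSE)
```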