Random thoughts about random subjects… From science to literature and between manga and watercolours, passing by data science and rugby; including film, physics and fiction, programming, pictures and puns.
When I first heard about the plans that the British Library had about an exhibitions called Science is Beautiful I got very excited. I did even make an entry in my diary about the date that it was planned to be opened. Closer to the time I even encourage Twitter followers and colleagues to go to the exhibition.
The exhibition promised to explore how “our understanding of ourselves and our planet has evolved alongside our ability to represent, graph and map the mass data of the time.” So I finally made some time and made it to the British Library today… the exhibition was indeed there with some nice looking maps and graphics, but I could not help feeling utterly disappointed. I was very surprised they even call this an exhibition, the very few images, documents and interactive displays were very few and not very immersive. Probably my favourite part was looking at “The Pedigree of Man” and the “Nightingale’s Rose” together with an interactive show. Nonetheless, I felt that the British Library could have done a much better job given the wealth of documents they surely have at hand. Besides, the technology used to support the exhibits was not that great… for example the touch screens were not very responsive and did not add much to the presentation.
I couldn’t help noticing that “Putneysw15” web site published in December a list of predicted dates and times when there will be “exceptional high tides” at Putney bridge. They showed the data from the Port of London Authority in a table and I thought it would be a good idea to visualise it. The figure above is a very crude visualisation of the data and perhaps it may be a good companion to the article in that appears here.
We’re frequently asked: What is the best tool to visualize data?
There is obviously no single answer to that question. It depends on the task at hand, and what you want to achieve.
Here’s an attempt to categorize these tasks and point to some of the tools we’ve found to be useful to complete them:
The right tool for the task
Simple one-off charts
The most common tool for simple charting is clearly Excel. It is possible to make near-perfect charts of most chart types using Excel – if you know what you’re doing. Many Excel defaults are sub-optimal, some of the chart types they offer are simply for show and have no practical application. 3D cone shaped “bars” anyone? And Excel makes no attempt at guiding a novice user to the best chart for what she wants to achieve. Here are three alternatives we’ve found useful:
Tableau is fast becoming the number one tool for many data visualization professionals. It’s client software (Windows only) that’s available for $999 and gives you a user-friendly way to create well crafted visualizations on top of data that can be imported from all of the most common data file formats. Common charting in Tableau is straight-forward, while some of the more advanced functionality may be less so. Then again, Tableau enables you to create pretty elaborate interactive data applications that can be published online and work on all common browser types, including tablets and mobile handsets. For the non-programmer that sees data visualization as an important part of his job, Tableau is probably the tool for you.
DataGraph is a little-known tool that deserves a lot more attention. A very different beast, DataGraph is a Mac-only application ($90 on the AppStore) originally designed to create proper charts for scientific publications, but has become a powerful tool to create a wide variety of charts for any occasion. Nothing we’ve tested comes close to DataGraph when creating crystal-clear, beautiful charts that are also done “right” as far as most of the information visualization literature is concerned. The workflow and interface may take a while to get the grips of, and some of the more advanced functionality may lie hidden even from an avid user for months of usage, but a wide range of samples, aggressive development and an active user community make DataGraph a really interesting solution for professional charting. If you are looking for a tool to create beautiful, yet easy to understand, static charts DataGraph may be your tool of choice. And if your medium is print, DataGraph outshines any other application on the market.
The best way to see samples of DataGraph’s capabilities is to download the free trial and browse the samples/templates on the application’s startup screen.
R is an open-source programing environment for statistical computing and graphics. A super powerful tool, R takes some programming skills to even get started, but is becoming a standard tool for any self-respecting “data scientist”. An interpreted, command line controlled environment, R does a lot more than graphics as it enables all sorts of crunching and statistical computing, even with enormous data sets. In fact we’d say that the graphics are indeed a little bit of a weak spot of R. Not to complain about the data presentation from the information visualization standpoint, most of the charts that R creates would not be considered refined and therefore needs polishing in other software such as Adobe Illustrator to be ready for publication. Not to be missed if working with R is the ggplot2 package that helps overcome some of the thornier of making charts and graphs for R look proper. If you can program, and need a powerful tool to do graphical analysis, R is your tool, but be prepared to spend significant time to make your outcome look good enough for publication, either in R or by exporting the graphics to another piece of software for touch-up.
The R Graphical Manual holds an enormous collection of browsable samples of graphics created using R – and the code and data used to make a lot of them.
Videos and custom high-resolution graphics
If you are creating data visualization videos or high-resolution data graphics, Processing is your tool. Processing is an open source integrated development environment (IDE) that uses a simplified version of Java as its programming language and is especially geared towards developing visual applications.
Processing is great for rapid development of custom data visualization applications that can either be run directly from the IDE, compiled into stand-alone applications or published as Java Applets for publishing on the web.
The area where we have found that Processing really shines as a data visualization tool, is in creating videos. It comes with a video class called MovieMaker that allows you to compose videos programmatically, frame-by-frame. Each frame may well require some serious crunching and take a long time to calculate before it is appended to a growing video file. The results can be quite stunning. Many of the best known data visualization videos are made using this method, including:
As can be seen from these examples Processing is obviously also great for rendering static, high-resolution bitmap visualizations.
So if data driven videos, or high-resolution graphics are your thing, and you’re not afraid of programming, we recommend Processing.
Charts for the Web
There are plenty – dozens, if not hundreds – of programming libraries that allow you to add charts to your web sites. Frankly, most of them are sh*t. Some of the more flashy ones use Flash or even Silverlight for their graphics, and there are strong reasons for not depending on browser plugins for delivering your graphics.
We believe we have tested most of the libraries out there, and there are only two we feel comfortable recommending, each has its pros and cons depending on what you are looking for:
Other libraries and solutions that may be worth checking out are the popular commercial solution amCharts, Google’s hosted Chart Tools and jQuery library Flot.
Special Requirements and Custom Visualizations
If you want full control of the look, feel and interactivity of your charts, or if you want to create a custom data visualization for the web from scratch, the out-of-the box libraries mentioned above will not suffice.
In fact – you’ll be surprised how soon you run into limitations that will force you to compromise on your design. Seemingly simple preferences such as “I don’t want drop shadows on the lines in my line chart”, or “I want to control what happens when a user clicks the X-axis” and you may already be stretching it with your chosen library. But consider yourself warned: The compromises may well be worth it. You may not have the time and resources to spend diving deeper, let alone writing yet-another-charting-tool™
However, if you are not one to compromise on your standards, or if you want to take it up a notch and follow the lead of some of the wonderful and engaging data journalism happening at the likes of the NY Times and The Guardian, you’re looking for something that a charting library is simply not designed to do.
The tool for you will probably be one of the following:
D3.js or “D3″ for short is in many ways the successor of Protovis. In fact Protovis is no longer under active development by the original team due to the fact that its primary developer – Mike Bostock – is now working on D3 instead.D3 builds on many of the concepts of Protovis. The main difference is that instead of having an intermediate representation that separates the rendering of the SVG (or HTML) from the programming interface, D3 binds the data directly to the DOM representation. If you don’t understand what that means – don’t worry, you don’t have to. But it has a couple of consequences that may or may not make D3 more attractive for your needs.The first one is that it – almost without exception – makes rendering faster and thereby animations and smooth transitions from one state to another more feasible. The second is that it will only work on browsers that support SVG so that you will be leaving Internet Explorer 7 and 8 users behind – and due to the deep DOM integration, enabling VML rendering for D3 is a far bigger task than for Protovis – and one that nobody has embarked on yet.
After thorough research of the available options, we chose Protovis as the base for building out DataMarket’s visualization capabilities with an eye on D3 as our future solution when modern browsers finally saturate the market. We see that horizon about 2 years from now.