Random thoughts about random subjects… From science to literature and between manga and watercolours, passing by data science and rugby; including film, physics and fiction, programming, pictures and puns.
Anyone interested in creating their own data visualizations should be giddy with delight at the quickly growing number of tools available to create them without any need for programming skills, and in most cases for free: Tableau, Flourish, Datawrapper, RawGraphs, Chartbuilder or QGIS (for mapping) are some of the best, and the list goes on and on. I’m convinced that in a relatively short time drag-and-drop tools will be as powerful and flexible as D3.js and other developer tools, making data visualization accessible to everyone.
The exciting news is seeing two software giants entering the field with new web-based tools: Adobe launched Data Illustrator a few months ago in a collaboration with the Georgia Institute of Technology, and Microsoft Research is behind the just released Charticulator. Both work very intuitively, allowing the author to bind multiple attributes of data to graphical elements. They are indeed powered by D3.js, among other libraries.
Both offer introduction videos on their home pages. Here is Data Illustrator:
And here is Charticulator:
The tools offer tutorial sections and multiple step-by-step videos in their galleries, and they link to the research papers describing the tools, which are worth reading (Data Illustrator, Charticulator).
Creating complex visualizations like the chord diagram below seems ridiculously simple in Charticulator, and the same can be said of Data Illustrator’s visualizations. See the video:
This is not a review as I have just started playing with them, but on first look both tools are impressive. It’s still really early in their development, but if Adobe and Microsoft throw their mighty resources to support and improve them, we can expect great things in the near future. Perhaps one day Data Illustrator could be embedded within Adobe Illustrator, allowing designers to work fluidly and easily between D3 and Illustrator without leaving the graphical interface. And Charticulator could integrate into PowerPoint. Stay tuned!
I recently came across Flourish, a data visualisation tool that makes things easy and can be used even if your programming skills are a bit rusty. The tool is the brainchild of studio Kiln, who have made it entirely web-based and even offer a free public version.
Starting up is easy as you are encouraged to use templates and can upload your data from a CSV or Excel. Some of the templates offer the usual scatterplots and bar charts, but you also have things like Sankey diagrams or 3D globe maps. If you are interested you can also create your own custom templates.
Flourish’s free version allows you to publish and share visualisations, or to embed them in your website. Beware that the data will be visible to everyone once you publish. Give it a go and let me know what you think.
Last week I had the opportunity not only of hosting, but also of speaking at the Data+Visual Meetup organised by Eric Hannell. The occasion was well attended and not just by those interested in data visualisation, but also in big data as the Big Data Developers in London meetup took place concurrently.
In 2016 Andy Cotgreave will be joining me on the weekly #MakeoverMonday series so that we can compare how we each quickly take a foreign data set and turn it into a more meaningful visualisation. We’re also very curious to see our different approaches.
As for my talk, well, I wanted to use it as a reminder of the uses that visualisation of data and information has in everyday life and of the best practices that one should bear in mind when putting together a visual. Since data visualisation can be used (among other things) to:
See data in context
Support graphical calculation
Present an argument
Tell a story
we should take into account how the visualisation is going to be consumed, the audience and the message that we want to transmit. During the talk I showed some examples where data visualisation has been used effectively, but also some where it hasn’t (and how they could be improved). The aim is not to criticise (no-one deliberately goes out of their way to make a bad visual), but to learn.
It has been in development for 12 months, almost to the day, and I am very pleased to tell you that the first full draft of my new book entitled “Data Science and Analytics with Python” is ready.
The book is aimed at data enthusiasts and professionals with some knowledge of programming principles, as well as developers and business people interested in learning more about data science and analytics. The proposed table of contents is as follows:
The Trials and Tribulations of a Data Scientist
Firsts Slithers with Python
The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
The Relationship Conundrum: Regression
Jackalopes and Hares, Unicorns and Horses: Clustering and Classification
Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
Dimensionality Reduction and Support Vector Machines
At the moment the book contains 53 figures and 18 tables, plus plenty of bits and pieces of code ready to be tried.
The next step is to start the re-reading, re-drafting and revisions in preparation for the final version and submission to my publisher CRC Press later in the year. I will keep you posted as to how things go.
Much has been said for and against the use of pie charts… And the discussion is by no means something new. For example, in 1923 the American economist Karl G. Karsten warned us against the pie chart. Karsten’s claims in his book Charts and Graphs are remarkably similar to those heard today:
The disadvantages of the pie chart are many. It is worthless for study and research purposes. In the first place the human eye cannot easily compare as to length the various arcs about the circle, lying as they do in different directions. In the second place, the human eye is not naturally skilled in comparing angles… In the third place, the human eye is not an expert judge of comparative sizes or areas, especially those as irregular as the segments of parts of the circle. There is no way by which the parts of this round unit can be compared so accurately and quickly as the parts of a straight line or bar.
If you are interested in reading more about the history of the controversial use of the pie chart, take a look at this post by Dan Kopf.
I recently asked the guys at the Data Science class at GA to bring a good example of a “daft graph” and they all did that with gusto. As usual, there was a certain cable news channel that was mentioned a lot of times for their misleading use of graphics.
1919 – Extracts from an Investigation into the Physical Properties of Books as They Are At Present Published. The Society of Calligraphers, Boston.
This is a small pamphlet that was designed and authored by the graphic designer W.A. Dwiggins and his cousin L.B. Sigfried. It pilloried the format of books and voiced his concern about the poor methods used to print trade books in the US at that time.
The book was published by the imaginary Society of Calligraphers and the stinging investigation was a hoax cooked up by Dwiggins – nevertheless it did have an effect on publishing in the US following its wide distribution.
The graph by Dwiggins shows the reduction in book quality since 1910.
I have been having a break from creating the Quantum Tunnel Podcast. Partly this was because I did not have a suitable replacement to host the material after Apple got rid of MobileMe and sites… Then I just didn’t have that much time. I will pick it up one of these days… Do remind me please.
Nonetheless, my podcast listening has continued and I get my fix from the likes of the excellent RadioLab podcast, Freakonomics and even More or Less. Recently I have started collecting podcasts that talk about data science and machine learning and here are some examples of what has hit my podcast list:
Data Stories is a great chat forum between Enrico Bertini and Moritz Stefaner plus guests; really interesting guests! The main focus is data visualisation but they chat about all sorts of related topics.
The Data Skeptic ranges from 10 minute conversations between Kyle Polich and his wife Linda, trying to elucidate concepts and areas of interest in statistics, machine learning, probability and others, through to interviews/chats with guests. Worth checking out!
I really enjoy the conversations that Katherine Gorman and Ryan Adams have on Talking Machines regarding topics around machine learning. I like the question and answer session where listeners can send their queries. Interesting guests and always fun to listen to.
I recently heard about the O’Reilly Data Show podcast and just downloaded the latest episode but have not had a chance to hear it. I assume that the information will be as interesting as other O’Reilly outlets. Looking forward to hearing what Ben Lorica has to say!