Data Illustrator and Charticulator – Reblog

Reblog from here.

New tools: Data Illustrator and Charticulator

Posted on August 31, 2018 by 5wgraphicsblog

Anyone interested in creating their own data visualizations should be giddy with delight at the quickly growing number of tools available to create them without any need for programming skills, and in most cases for free: Tableau, Flourish, Datawrapper, RawGraphs, Chartbuilder or QGIS (for mapping) are some of the best, and the list goes on and on. I’m convinced that in a relatively short time drag-and-drop tools will be as powerful and flexible as D3.js and other developer tools, making data visualization accessible to everyone.

The exciting news is seeing two software giants entering the field with new web-based tools: Adobe launched Data Illustrator a few months ago in a collaboration with the Georgia Institute of Technology, and Microsoft Research is behind the just released Charticulator. Both work very intuitively, allowing the author to bind multiple attributes of data to graphical elements. They are indeed powered by D3.js, among other libraries.

Both offer introduction videos on their home pages. Here is Data Illustrator:

And here is Charticulator:

The tools offer tutorial sections and multiple step-by-step videos in their galleries, and they link to the research papers describing the tools, which are worth reading (Data Illustrator, Charticulator).

Creating complex visualizations like the chord diagram below seems ridiculously simple in Charticulator, and the same can be said of Data Illustrator’s visualizations. See the video:

This is not a review, as I have only just started playing with them, but on first look both tools are impressive. It’s still very early in their development, but if Adobe and Microsoft throw their mighty resources behind supporting and improving them, we can expect great things in the near future. Perhaps one day Data Illustrator could be embedded within Adobe Illustrator, allowing designers to work fluidly and easily between D3 and Illustrator without leaving the graphical interface. And Charticulator could integrate into PowerPoint. Stay tuned!

Flourish – data visualisation made easy

I recently came across Flourish, a data visualisation tool that makes things easy and can be used even if your programming skills are a bit rusty. It is the brainchild of studio Kiln, who have made it entirely web-based and even offer a free public version.

Starting up is easy, as you are encouraged to use templates and can upload your data from a CSV or Excel file. Some of the templates offer the usual scatter plots and bar charts, but you also have things like Sankey diagrams or 3D globe maps. If you are interested, you can also create your own custom templates.

Flourish’s free version allows you to publish and share visualisations, or to embed them in your website. Beware that the data will be visible to everyone once you publish. Give it a go and let me know what you think.

-j

CoreML – Boston Prices exploration

In the previous post of this series we described some of the basics of linear regression, one of the best-known models in machine learning. We saw that we can relate the values of the input variables x_i to the target variable y to be predicted. In this post we are going to create a linear regression model to predict the price of houses in Boston (based on valuations from the 1970s). The dataset provides attributes such as the per capita crime rate (CRIM), the proportion of non-retail business acres in town (INDUS), the proportion of owner-occupied units built before 1940 (AGE) and the average number of rooms per dwelling (RM), as well as the median value of owner-occupied homes in $1000s (MEDV), among others.

Let us start by exploring the data. We are going to use Scikit-learn, and fortunately the dataset comes bundled with the library. The input variables are included in the data attribute and the prices are given by target. We are going to load the input variables into the dataframe boston_df and the prices into the array y:

from sklearn import datasets
import pandas as pd

# Load the Boston house-price dataset that ships with scikit-learn
# (note: load_boston has been deprecated and removed in recent releases)
boston = datasets.load_boston()

# Put the input variables in a dataframe named after the dataset's features;
# the target holds the median home values (MEDV)
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target

We are going to build our model using only a limited number of inputs. In this case let us pay attention to the average number of rooms and the crime rate:

# Keep just the crime rate and the average number of rooms;
# .copy() avoids pandas' SettingWithCopyWarning when renaming
X = boston_df[['CRIM', 'RM']].copy()
X.columns = ['Crime', 'Rooms']
X.describe()

The description of these two attributes is as follows:

            Crime       Rooms
count  506.000000  506.000000
mean     3.593761    6.284634
std      8.596783    0.702617
min      0.006320    3.561000
25%      0.082045    5.885500
50%      0.256510    6.208500
75%      3.647423    6.623500
max     88.976200    8.780000

As we can see, the minimum number of rooms is about 3.56 and the maximum is 8.78, whereas for the crime rate the minimum is 0.006 and the maximum is 88.98, even though the median is only 0.26, indicating a highly skewed distribution. We will use some of these values to define the ranges that will be offered to our users when requesting price predictions.
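For instance, here is a minimal sketch of how those observed extremes could become the input ranges exposed to users (the original post does not show this step; the variable names are my own):

# Hypothetical input ranges derived from the observed extremes
crime_range = (X['Crime'].min(), X['Crime'].max())  # roughly (0.006, 88.98)
rooms_range = (X['Rooms'].min(), X['Rooms'].max())  # roughly (3.56, 8.78)
print(crime_range, rooms_range)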

Finally, let us visualise the data:
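The original post shows a chart at this point. A minimal matplotlib sketch along these lines could produce a similar view (an assumption on my part, not the post’s own code):

import matplotlib.pyplot as plt

# Scatter the price against each of the two chosen inputs
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X['Crime'], y, alpha=0.5)
axes[0].set_xlabel('Crime rate')
axes[0].set_ylabel('Median value ($1000s)')
axes[1].scatter(X['Rooms'], y, alpha=0.5)
axes[1].set_xlabel('Average number of rooms')
axes[1].set_ylabel('Median value ($1000s)')
plt.tight_layout()
plt.show()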

We shall bear these values in mind when building our regression model in subsequent posts.
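As a quick preview of where this is heading, here is a minimal sketch using scikit-learn’s LinearRegression; the model is built properly in the later posts:

from sklearn.linear_model import LinearRegression

# Fit a simple linear model relating crime rate and rooms to price
model = LinearRegression()
model.fit(X, y)
print('Intercept:', model.intercept_)
print('Coefficients:', model.coef_)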

You can look at the code (in development) on my GitHub site here.

First steps into the Internet of Things – Meetup 25th August

Hello guys,

I am joining forces with Bob Yelland (IBM) again to organise a joint meetup. I say again, as we organised a joint session a few months back between the Big Data Developers in London and the Data+Visual meetups; I even gave a talk at that one, on “Data Visualisation: The good, the bad and the ugly”. Unlike the previous occasion, this time we are actually physically joining the attendees rather than having parallel sessions.

The event is now live and it will take place on the 25th of August. Shall I see you there?

On the Skills Matter site:  https://skillsmatter.com/meetups/8259-datapalooza-nights-meetup#overview

and the MeetUp site: https://www.meetup.com/Big-Data-Developers-in-London/events/232919166/


Datapalooza IoT

Intro to Data Science Talk

Yesterday I had the pleasure of giving a community talk at Campus London as part of the events organised by General Assembly London. The place was fully packed, and I was quite pleased to see how engaged the audience was: they asked questions, made comments and offered great remarks.

As expected, the audience was quite varied, from students interested in breaking into the field to seasoned analysts and startup entrepreneurs. The questions were all very pertinent, and I hope that the answers provided were useful to all of them.

The talk was effectively an introduction to what data science is, the tools used, and the opportunities and challenges in the field. You can find a handout of the slides here.


Women in State Legislature

This week, for Makeover Monday, we have some data from a visualisation created by the National Conference of State Legislatures (NCSL). According to the NCSL:

Approximately 1,809 women serve in the 50 state legislatures at the beginning of the 2016 legislative session. Women make up 24.5 percent of all state legislators nationwide.

Here is my entry:

Data+Visual Meetup – “Data Visualisation: The good, the bad and the ugly”

Last week I had the opportunity not only to host but also to speak at the Data+Visual Meetup organised by Eric Hannell. The occasion was well attended, and not just by those interested in data visualisation but also by those interested in big data, as the Big Data Developers in London meetup took place concurrently.

The Data+Visual meetup started with a talk by Andy Kriebel about his Makeover Monday project:

In 2016 Andy Cotgreave will be joining me on the weekly #MakeoverMonday series so that we can compare how we each quickly take a foreign data set and turn it into a more meaningful visualisation. We’re also very curious to see our different approaches.

I took up the open invitation Andy has made to take part, and I have posted my first Makeover Monday visualisation.

As for my talk, well, I wanted to use it as a reminder of the uses that the visualisation of data and information has in everyday life, and of the best practices one should bear in mind when putting together a visual. Since data visualisation can be used (among other things) to:

  • Answer questions
  • Make decisions
  • See data in context
  • Support graphical calculation
  • Find patterns
  • Present an argument
  • Tell a story
  • Inspire

we should take into account how the visualisation is going to be consumed, the audience, and the message we want to transmit. During the talk I showed some examples where data visualisation has been used effectively, but also some where it hasn’t (and how they could be improved). The aim is not to criticise (no-one deliberately goes out of their way to make a bad visual), but to learn.

Enjoy and catch you soon.

Makeover Monday: Will a sugar tax have an impact on childhood obesity?

Following up the Data+Visual Meetup hosted at IBM last Wednesday, I wanted to take part in the Makeover Monday project that Andy Kriebel highlighted during his talk.

This week the data came from the BBC, and in particular the visualisation that shows how people in the UK get their added sugar:

This is a story that follows up the recent announcement by Chancellor George Osborne of a tax on sugary drinks in the UK. Here is my Makeover Monday for the visualisation above:

Sugar UK

Voilà!

First full draft of “Data Science and Analytics with Python”

It has been almost 12 months to the day in development, and I am very pleased to tell you that the first full draft of my new book, entitled “Data Science and Analytics with Python”, is ready.

Data Analytics Python

The book is aimed at data enthusiasts and professionals with some knowledge of programming principles, as well as developers and business people interested in learning more about data science and analytics. The proposed table of contents is as follows:

  1. The Trials and Tribulations of a Data Scientist
  2. First Slithers with Python
  3. The Machine that Goes “Ping”: Machine Learning and Pattern Recognition
  4. The Relationship Conundrum: Regression
  5. Jackalopes and Hares, Unicorns and Horses: Clustering and Classification
  6. Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques
  7. Dimensionality Reduction and Support Vector Machines

At the moment the book contains 53 figures and 18 tables, plus plenty of bits and pieces of code ready to be tried.

The next step is to start the re-reading, re-drafting and revisions in preparation for the final version and its submission to my publisher, CRC Press, later in the year. I will keep you posted on how things go.

Keep in touch!


Should You Ever Use a Pie Chart?

Much has been said for and against the use of pie charts… and the discussion is by no means new. For example, in 1923 the American economist Karl G. Karsten warned us against the pie chart. Karsten’s claims in his book Charts and Graphs are remarkably similar to those heard today:

The disadvantages of the pie chart are many. It is worthless for study and research purposes. In the first place the human eye cannot easily compare as to length the various arcs about the circle, lying as they do in different directions. In the second place, the human eye is not naturally skilled in comparing angles… In the third place, the human eye is not an expert judge of comparative sizes or areas, especially those as irregular as the segments of parts of the circle. There is no way by which the parts of this round unit can be compared so accurately and quickly as the parts of a straight line or bar.

If you are interested in reading more about the history of the controversial use of the pie chart, take a look at this post by Dan Kopf.

Pie Chart