Today I decided to take a bit of a detour while working on a potential project for #VizForSocialGood. I was focused on a data set provided by UNICEF that showed the number of migrants from different areas/regions/countries to destination regions/countries. I’m pretty sure it is the direct companion to a chord diagram that UNICEF published as part of their Uprooted report.
As I was working through the data, I wanted to start at the same place: focus on migration globally and then narrow in on children affected by migration.
Needless to say – I got sidetracked. I started by wanting to make paths on maps showing the movement of migrants. I haven’t done this very often, so I figured this would be a great data set to play with. Once I set that up, it quickly devolved into something else.
I wasn’t satisfied with the density of the data on the map; it wasn’t displayed clearly enough for me. So I decided to try a more abstract take on the same concept. As if by fate, I had received Chart Chooser cards in the mail earlier, and Josh and I were reviewing them. We were talking through the various uses of each chart and brainstorming how the cards could be incorporated into our next Tableau user group (I really do eat, drink, and breathe this stuff).
Anyway – one of the charts we were talking about was the sankey diagram. So it was already on my mind, and I’d seen it accomplished multiple times in Tableau. It was time to dive in and see how this abstraction would apply to the geospatial data.
I started with Chris Love’s basic tutorial of how to set up a sankey. It’s a really straightforward read that explains all the concepts required to make this work. Here’s the quick how-to in my paraphrased words.
- Duplicate your data via a Union and identify the original and the copy (which is great, because I had already done this for the pathing). As I understand it from Chris’s write-up, this lets us ‘stretch out’ the data, so to speak.
- Once the data is stretched out, it’s filled in by manipulating the binning feature in Tableau. My interpretation would be that the bins ‘kind of’ act like dimensions (labeled out by individual integers). This becomes useful in creating individual points that eventually turn into the line (curve).
- Next there are ranking functions made to determine the starting and end points of the curves.
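- Finally the curve is built using a mathematical function called a sigmoid function. This is basically an S-shaped function that flattens out toward an asymptote at each end and is steepest in the middle. The standard logistic version goes from 0 to 1 (the -1 to 1 version with a middle slope of ~1 is the hyperbolic tangent).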
- After the curve is developed, the points are plotted. This is where the ranking is set up to determine the leftmost and rightmost points. Chris’s original specifications had the ranking straightforward for each of the dimensions. My final viz is a riff on this.
- The last steps are to switch the chart to a line chart and then build out the width (size) of the line based on the measure you used in the ranking (percent of total) calculation.
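
To make the curve step a little more concrete, here is a minimal sketch in Python rather than Tableau calculations. It assumes the standard logistic sigmoid and a simple "start plus (end minus start) times sigmoid" blend between the two ranked positions. The function names, the 49-point count, and the -6 to 6 range are my own illustrative choices, not something taken from the workbook.

```python
import math


def sigmoid(t: float) -> float:
    """Standard logistic sigmoid: approaches 0 on the left and 1 on the right."""
    return 1.0 / (1.0 + math.exp(-t))


def sankey_curve(start: float, end: float, n_points: int = 49, t_range: float = 6.0):
    """Return (t, y) points for one S-shaped link.

    start / end are the vertical positions on the left and right sides
    (in the viz these come from the ranking calculations).
    n_points and t_range are assumed values for illustration only.
    """
    points = []
    for i in range(n_points):
        # Spread t evenly from -t_range to +t_range (the "bins" along the x-axis).
        t = -t_range + 2.0 * t_range * i / (n_points - 1)
        # Blend from the start position to the end position along the sigmoid.
        y = start + (end - start) * sigmoid(t)
        points.append((t, y))
    return points


# Hypothetical link running from position 0.2 on the left to 0.8 on the right.
curve = sankey_curve(start=0.2, end=0.8)
```

Each point plays the role of one of the binned marks in Tableau: connect them with a line, size the line by the measure, and you get one band of the sankey.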
So I did all those steps and ended up with exactly what was described – a sankey diagram. A brilliant one too, I could quickly switch the origin dimension to different levels (major area, region, country) and do similar work on the destination side. This is what ultimately led me to the final viz I made.
So while adjusting the table calculations, I came to one view that I really enjoyed. The ranking pretty much “broke” for the initial starting point (everything was at 1), but the destination was right. What this did for the viz was take everything from a single point and then create roots outward. Initial setup had this going from left to right – but it was quite obvious that it looked like tree roots. So I flipped it all.
I’ll admit – this is mostly a fun data shaping/vizzing exercise. You can definitely gain insights through the way it is deployed (take a look at Latin America & Caribbean).
After the creation of the curvy onion shape, it was a “what to add next” free-for-all. I had wrestled with the names of the destination countries to try and get something reasonable, but couldn’t figure out how to display them in proximity with the lines. No matter – the idea of a word cloud seemed kind of interesting. You’d get the same concept of the different chord sizes carried over again and see a ton of data on where people are migrating. This also led to some natural interactivity: clicking on a country code shows its corresponding chords above.
Finally, to add more visual context and tell the story a bit further, I included a simple breakdown of the major regions, origin to destination. The story point for me: most migrants move within their own region, except for those from Latin America/Caribbean.