The Flow of Human Migration

Today I decided to take a bit of a detour while working on a potential project for #VizForSocialGood.  I was focused on a data set provided by UNICEF that showed the number of migrants from different areas/regions/countries to destination regions/countries.  I’m pretty sure it is the direct companion to a chord diagram that UNICEF published as part of their Uprooted report.

As I was working through the data, I wanted to start at the same place: focus on migration globally and then narrow in on children affected by migration.

Needless to say – I got sidetracked.  I started by wanting to make paths on maps showing the movement of migrants.  I haven’t really done this very often, so I figured this would be a great data set to play with.  Once I set that up, it quickly devolved into something else.

I wasn’t satisfied with the density of the data.  The clarity of how it was displayed wasn’t there for me.  So I decided to take a more abstract approach to the same concept.  As if by fate, I had received Chart Chooser cards in the mail earlier and Josh and I were reviewing them.  We were having a conversation about the various uses of each chart and brainstorming on how they could be incorporated into our next Tableau user group (I really do eat, drink, and breathe this stuff).

Anyway – one of the charts we were talking about was the sankey diagram.  So it was already on my mind, and I’d seen it accomplished multiple times in Tableau.  It was time to dive in and see how this abstraction would apply to the geospatial data.

I started with Chris Love’s basic tutorial on how to set up a sankey.  It’s a really straightforward read that explains all the concepts required to make this work.  Here’s the quick how-to in my own paraphrased words.

  1. Duplicate your data via a union and identify the original and the copy (which is great because I had already done this for the pathing).  As I understand it from Chris’s write-up, this lets us ‘stretch out’ the data, so to speak.
  2. Once the data is stretched out, it’s filled in by manipulating the binning feature in Tableau.  My interpretation is that the bins ‘kind of’ act like dimensions (labeled by individual integers).  This becomes useful in creating the individual points that eventually turn into the line (curve).
  3. Next, ranking calculations are made to determine the starting and ending points of the curves.
  4. Then the curve is built using a mathematical function called a sigmoid function.  This is basically an asymptotic function that goes from -1 to 1 and has a middle area with a slope of ~1 (a rough sketch of the math follows this list).
  5. After the curve is developed, the points are plotted.  This is where the ranking is used to determine the leftmost and rightmost points.  Chris’s original specification keeps the ranking straightforward for each of the dimensions; my final viz is a riff on this.
  6. The last steps are to switch the chart to a line chart and then build out the width (size) of the line based on the measure used in the ranking (percent of total) calculation.
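
To make step 4 a little more concrete, here is a minimal sketch in Python (not Chris’s exact Tableau calculations) of how the densified bin values, pushed through a sigmoid-style curve, produce the path between an origin position and a destination position.  The point count and positions are made up for illustration.

```python
# Sketch of the sankey curve math: tanh runs from -1 to 1 with a slope of ~1
# in the middle, which is then rescaled to connect two vertical positions.
import numpy as np
import matplotlib.pyplot as plt

def sankey_curve(y_start, y_end, n_points=49):
    """Return the x/y points of an S-shaped path from y_start to y_end."""
    t = np.linspace(-6, 6, n_points)        # the 'stretched out' bin values
    s = (np.tanh(t) + 1) / 2                # sigmoid rescaled to run 0 -> 1
    y = y_start + (y_end - y_start) * s     # interpolate between the two rank positions
    x = np.linspace(0, 1, n_points)
    return x, y

# e.g. a flow leaving position 0.8 on the origin side and landing at 0.3 on the destination side
x, y = sankey_curve(0.8, 0.3)
plt.plot(x, y)
plt.show()
```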

So I did all those steps and ended up with exactly what was described – a sankey diagram.  A brilliant one, too: I could quickly switch the origin dimension to different levels (major area, region, country) and do similar work on the destination side.  This is what ultimately led me to the final viz I made.

So while adjusting the table calculations, I came to one view that I really enjoyed.  The ranking pretty much “broke” for the initial starting point (everything was at 1), but the destination was right.  What this did for the viz was take everything from a single point and then create roots outward.  Initial setup had this going from left to right – but it was quite obvious that it looked like tree roots.  So I flipped it all.

I’ll admit – this is mostly a fun data shaping/vizzing exercise.  You can definitely gain insights through the way it is deployed (take a look at Latin America & Caribbean).

After the creation of the curvy onion shape, it was a “what to add next” free-for-all.  I had wrestled with the names of the destination countries to try and get something reasonable, but couldn’t figure out how to display them in proximity to the lines.  No matter – the idea of a word cloud seemed kind of interesting.  You’d get the same concept of the different chord sizes carried through again and see a ton of data on where people are migrating.  This also led to some natural interactivity of clicking on a country code to see its corresponding chords above.

Finally, to add more visual context and tell the story a bit further, I included a simple breakdown of each major region of origin to its destinations.  The main story point for me: most migrants move within their own region, except for Latin America & the Caribbean.

Makeover Monday 2017 – Week 4 New Zealand Tourism

This week’s Makeover addressed domestic and international tourism trends in New Zealand.  No commentary was provided with the data set; the original was just 2 charts left to the user to interpret.  See Eva’s tweet for the originals:

Going back to basics this week with what I like and dislike about it:

  • Titles are clear, bar chart isn’t too busy (like)
  • Not too many grid lines (like)
  • It’s easy to see the shape of the data and seasonality (like)
  • The scales are different between International & Domestic (dislike)
  • 3 years for easy comparison (like)
  • Eva chose this to promote her home country (like)

I think this was a good data set for week 4.  There was no data story to rewrite, Eva took special care to mitigate data misinterpretation, and she added a bonus of geospatial data for New Zealand.

My process really began with the geospatial part.  I haven’t yet had a chance to work with geospatial detail that has been developed and appended to a data set.  My experience has been limited to using Tableau’s functionality to manually add in latitude and longitude for unclear/missing/invalid data points.

So as I got started, I had no idea how to use the data.  There were a few fields that certainly pointed me in the right direction.  The first was “Point Order.”  Immediately I figured that needed to be used on “path” to determine where each data point fell.  That got me to this really cute outlined version of NZ (which looks like an upside down boot):

So I knew something additional needed to be done to get to a filled map.  That’s when I discovered the “PolygonNumber” field.  Throwing that onto detail, changing my marks to polygon and voila – New Zealand.  Here’s a Google image result for a comparison:
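
For anyone curious what those two fields are doing, here is a hypothetical sketch of the same idea outside of Tableau: group the rows by polygon, order the points within each polygon, and fill the shape.  The field names mirror the data set, but the coordinates below are made up for illustration.

```python
# Sketch of how polygon data turns into a filled map: each polygon's points,
# connected in point order and filled, form one shape.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "PolygonNumber": [1, 1, 1, 1, 2, 2, 2],
    "Point Order":   [1, 2, 3, 4, 1, 2, 3],
    "Longitude":     [0.0, 1.0, 1.0, 0.0, 2.0, 3.0, 2.5],
    "Latitude":      [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
})

for _, poly in df.groupby("PolygonNumber"):
    poly = poly.sort_values("Point Order")
    plt.fill(poly["Longitude"], poly["Latitude"])  # the 'polygon' mark type = filled shape

plt.show()
```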

Eva did a great job trying to explain how NZ is broken up in terms of regions/territories/areas, but I have to admit I got a little lost.  I think what’s clear from the two pictures is I took the most granular approach to dissecting the country.

I’m super thrilled that I got this hands-on opportunity to implement it.  Geospatial is one of those areas of analytics that everyone wants to get into, and by including this I feel much more equipped for challenges in the future.

Next up was the top viz: I’ve been wanting to try out a barbell/DNA chart for a long time.  I’ve made them in the past, but nothing that’s landed in a final viz.  I felt like there was an opportunity to try this out with the data set based on the original charts.  I quickly threw that together (using Andy Kriebel’s video tutorial) and really enjoyed the pattern that emerged.

The shape of the data really is what kicked off the path that the final viz ended up taking.  I liked the stratification of domestic vs. international and wanted to carry that throughout.  This is also where I chose the colors.

The bottom left chart started out its life as a slope chart.  I originally did the first data point vs. the last data point (January 2008 vs. April 2016) for both of the types of tourism.  It came out to be VERY misleading – international had plummeted.  When I switched over to an annual aggregation, the story was much different.

International NZ tourism is failing!
Things look less scary and International is improving!

Good lesson in making sure to look at the data holistically.  Don’t go super macro and get it wrong; find the right level of aggregation that keeps the message intact.
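
To make that lesson concrete, here is a tiny synthetic example (not the actual NZ tourism numbers) of how a seasonal series can look like it is declining when you only compare the first and last months, while the annual averages show steady growth.

```python
# Synthetic monthly series: a January peak season layered on top of real growth.
import pandas as pd
import numpy as np

months = pd.date_range("2008-01-01", "2016-04-01", freq="MS")
seasonal = 1 + 0.5 * np.cos(2 * np.pi * (months.month - 1) / 12)  # peaks in January
trend = np.linspace(100, 130, len(months))                        # underlying growth
visitors = pd.Series(trend * seasonal, index=months)

# Comparing only the endpoints (January 2008 vs. April 2016) suggests a decline...
print(round(visitors.iloc[0], 1), "->", round(visitors.iloc[-1], 1))

# ...while the annual averages show growth (2016 is a partial, season-heavy year).
print(visitors.groupby(visitors.index.year).mean().round(1))
```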

The last viz was really taking the geospatial component and adding in the tourism part.  I am on a small multiples kick and loved the novelty of having NZ on there more than once.  Knowing that I could repeat the colors again by doing a dual axis map got me sold.

All that was left was to add in interactivity.  The interactivity originally had the barbell and line chart driving the maps, but it wasn’t quite clear.  I HATE filter drop-downs for something that is going to be a static presentation (Twitter picture), so I wanted to come up with a way to give the user a filter option for the maps (because the shading does change over time), but have it be less tied to the static companion vizzes.  This is where I decided to make a filter sheet of the years and drop in a diverging color gradient to add a little more beauty.  I’m really pleased with how that turned out.

My last little cute moment is the data sourcing.  The URLs are gigantic and cluttered the viz.  So instead I made a basic sheet with URL actions to quickly get to both data sets.

A fun week and one that I topped off by spelling Tourism wrong in the initial Tweet (haha).  Have to keep things fun and not super serious.

Full dashboard here.

Synergy through Action

This has been an amazing week for me.  On the personal side of things my ship is sailing in the right direction.  It’s amazing what the new year can do to clarify values and vision.

Getting to the specifics of why I’m calling this post “Synergy through Action.”  That’s the best way for me to describe how my participation in this week’s Tableau and data visualization community offerings have influenced me.

It all actually started on Saturday.  I woke up and spent the morning working on a VizforSocialGood project, specifically a map to represent the multiple locations connected to the February 2017 Women in Data Science conference.  I’d been called out on Twitter (thanks Chloe) and felt compelled to participate.  The kick of passion I received after submitting my viz propelled me into the right mind space to tackle 2 papers toward my MBA.

Things continued to hold steady on Sunday where I took on the #MakeoverMonday task of Donald Trump’s tweets.  I have to imagine that the joy from accomplishment was the huge motivator here.  Otherwise I can easily imagine myself hitting a wall.  Or perhaps it gets easier as time goes on?  Who knows, but I finished that viz feeling really great about where the week was headed.

Monday – Alberto Cairo and Heather Krause’s MOOC was finally open!  Thankfully I had the day off to soak it all in.  This kept my brain churning.  And by Wednesday I was ready for a workout!

So now that I’ve described my week – what’s the synergy in action part?  Well I took all the thoughts from the social good project, workout Wednesday, and the sage wisdom from the MOOC this week to hit on something much closer to home.

I wound up creating a visualization in the vein of the #WorkoutWednesday redo offered up.  What’s it of?  Graduation rates of specific demographics for every county in Arizona for the past 10ish years.  Stylized into small multiples using a smattering of slick tricks I was required to use to complete the workout.

Here’s the viz – although admittedly it is designed more as a static view (not quite an infographic).

 

And to sum it all up: this could be the start of yet another spectacular thing.  Bringing my passion to the local community that I live in – but more on a widespread level (in the words of Dan Murray, user groups are for “Tableau zealots”).

Makeover Monday 2017 – Week 3 Trump Tweets

**Update (1/20/17): The original data set had a date formatting snafu that caused 1,307 tweets in the 12:00-12:59 PM (UTC) hour to be displayed as 00:00-00:59 (aka the 12 AM hour).  This affected 4.3% of the data in the original visualization and has been corrected.  I have also added a footnote denoting that the visualization is in EST.  This affects the shape of the data in both the 4 AM – 8 AM and 4 PM – 8 PM sections.

Rolling right along into week 3’s Makeover Monday.  The data set this week: Donald Trump’s tweets.  The original Buzzfeed viz and accompanying article analyzed Trump’s retweet activity since his announcement that he was running for president.  Their final viz ended up being what I would best describe as bubble charts of the top users he retweeted during this time:

What’s interesting is that the actual article goes into significant depth on how their team systematically reviewed the tweets.  It’s a bummer that the additional analysis couldn’t be synthesized into visual form.

My take on the makeover this week was driven completely by the underlying data available.  The TDE provided had the following fields:

Two things stuck out to me with the data.  First: the username being retweeted wasn’t included; second: the entire tweet text was included.  Having full text available just screams for some sort of text analysis.  At that point I committed to doing something with the text.

My initial idea was to do some sort of sentiment analysis.  Recently I had installed both R-Studio and Python on my PC to try integration with Tableau.  I’d had success with R-Studio (mind you after watching a brief YouTube video), but I hadn’t gotten Python to cooperate (my effort in assisting in this cooperation = 2 out of 10).  I figured since I had both available maybe I should make an attempt.  After marinating on the concept I didn’t feel comfortable adding more sentiment analysis to the fire of American politics.  (On a personal note: I have been politically checked out since the early primaries.)

So instead of doing sentiment analysis, I decided to turn the data more into text mining for mentions and hashtags.  I had done some fiddling with the time component and was digging how the cycle plot/horizon chart were playing out visually.  So it seemed natural to continue on a progression of getting more details out of the bars and times of day.

Note on the time: the time is graciously parsed into the correct format in the data.  Looking at the original time, I am under the impression it was represented in GMT (+0000).  To adjust for this, I subtracted 5 hours from all of the parsed dates to put them in EST, aka Trump time.
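
Here is a minimal sketch of that shift, assuming a hypothetical ‘created_at’ field; in the actual viz this was just a date calculation in Tableau.

```python
# Shift GMT timestamps to EST. The field name is an assumption for illustration.
import pandas as pd

tweets = pd.DataFrame({
    "created_at": pd.to_datetime(["2017-01-15 17:30:00", "2017-01-16 03:05:00"])  # parsed as GMT
})

# The fixed -5 hour shift used here (equivalent to DATEADD('hour', -5, [created_at]) in Tableau).
tweets["created_est"] = tweets["created_at"] - pd.Timedelta(hours=5)

# A timezone-aware alternative that would also respect daylight saving time.
tweets["created_est_dst"] = (
    tweets["created_at"].dt.tz_localize("UTC").dt.tz_convert("America/New_York")
)
print(tweets)
```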

So back to text mining.  Post #data16 conference, a colleague of mine was recounting how to use regex to scrub through text.  I walked away from his talk thinking I need to use that next time I have the opportunity.  And what I love about it: NATIVE FUNCTION TO TABLEAU!!  So this was making me sing.  Now I don’t know a ton about regex (lots of notation I have yet to memorize), so I decided to quickly google my way to getting the user handles and hashtags.  These handy results really made this analysis zip along: regexr & regex+twitter.
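
For the curious, here is a rough sketch of the same extraction in Python’s re module (in the viz itself I used Tableau’s native regex functions).  The patterns are simplified approximations of Twitter handles and hashtags, not the exact expressions from those links.

```python
# Pull @handles and #hashtags out of tweet text with simple regex patterns.
import re

tweet = "Thank you @FoxNews and everyone at the #MAGA rally!"

handles  = re.findall(r"@(\w{1,15})", tweet)   # handles: @ followed by up to 15 word characters
hashtags = re.findall(r"#(\w+)", tweet)        # hashtags: # followed by word characters

print(handles)   # ['FoxNews']
print(hashtags)  # ['MAGA']
```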

Everything else came to life pretty quickly.  I knew I wanted to include at least one or two tweets to read through, but I wanted to keep it curated.  I think this was accomplished well and I spent a good deal of time trying out different time combinations just to see what would bubble to the surface.

A final note on aesthetics this week: I’m reading Alberto Cairo’s The Functional Art, and as I mentioned in an earlier post, I’m also participating in his MOOC that starts tomorrow.  I am only 4 chapters in, but Alberto has me taking a few things to heart.  I don’t think it is by coincidence that I decided to push the beauty side of things.  I always strive for elegance, but I strive for it through white space and keeping that “data ink ratio” at a certain point.  But I’m not blind to the different visualizations out there that attract people.  So for once I used a non-white background (yay!).  And I also went for a font that’s well outside of the look of my usual vizzing font.

More important than the aesthetics, of course, is the function of the viz.  I tried to spend more time thinking about the audience and what they were going to “get” out of it.  I hope that the final product is less of a “visual aid” to my analysis and more of an interactive tool to explore the tweets of the soon-to-be President.

Full viz available on my Tableau public page.

Makeover Monday 2017 – Week 2

It’s time for Makeover Monday – Week 2.  This week’s data set was the quarterly sales (by units) of Apple iPhones for the past 10ish years.  The original article accompanying the data indicated that the golden years of Apple may be over.

So let me start by saying – I broke the rules (or rather, the guidelines).  Makeover Monday guidelines indicate that the goal is to improve upon the original visualization and stick to the original data fields.  I may have overlooked that guideline this week in favor of adding a little more context.

When I first approached the data set and dropped it into Tableau, the first thing I noticed was that Q4 always has a dip compared to the other quarters of the year.

This view contradicted all of my existing knowledge of how iPhone releases work.  Typically every year Apple holds a conference around the middle or end of September announcing the “new” iPhone.  That can either be an incremental update (the off year, aka the S) or a new generation.  It lines up such that pre-sales and sales come in the weeks shortly following.  And in addition to that, I would suspect that sales would stay heightened throughout the holiday season.

This is where I immediately went back to the data to challenge it, and I noticed that Apple defines its fiscal year differently.  Specifically, October to December of the previous calendar year counts as Q1 of the current fiscal year.  Essentially, Q1 of 2017 is actually 10/1/16 to 12/31/16.  Meaning that to think about quarters in the normalized (calendar) sense, everything should be shifted.
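
Here is a minimal sketch of that normalization, assuming the quarters arrive as fiscal year/quarter pairs (a hypothetical structure for illustration).

```python
# Normalize Apple's fiscal quarters back to calendar quarters: fiscal Q1 of a
# year is really Oct-Dec of the prior calendar year, so every fiscal quarter
# sits one quarter ahead of the calendar.
def fiscal_to_calendar(fiscal_year: int, fiscal_quarter: int) -> tuple[int, int]:
    """Return (calendar_year, calendar_quarter) for an Apple fiscal quarter."""
    if fiscal_quarter == 1:
        return fiscal_year - 1, 4   # e.g. fiscal Q1 2017 -> calendar Q4 2016
    return fiscal_year, fiscal_quarter - 1

print(fiscal_to_calendar(2017, 1))  # (2016, 4)
print(fiscal_to_calendar(2017, 2))  # (2017, 1), i.e. Jan-Mar 2017
```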

Now I was starting to feel much better about how things were looking.  It aligned with my real world expectations.

I still couldn’t help but feel that a significant portion of the story was missing.  In my mind it wasn’t fair to only look at iPhone sales over time without understanding more data points of the smartphone market.  I narrowed it down to overall sales of smartphones and number of smartphone users.  The idea I had was this: have we reached a point where the number of smartphone users is now a majority?  Essentially the Adoption Curve came to my mind – maybe we’ve hit that sweet spot where the Late Majority is now getting in on smartphones.

To validate the theory and keep things simple, I did quick searches for data sets I could bring into the view.  As if through serendipity, the two additional sources I stumbled upon came from the same place as the original (statistica.com).  I went ahead and added them into my data set and got to work.

My initial idea was this: a line plot of iPhone sales vs. overall smartphone sales, to see if the directionality was the same.  Place a smaller graph of smartphone users to the side (mainly because it was US only; I couldn’t find a free global data set).  And the last viz was going to be a combination of the 3 showing basic “growth” change.  That, in my mind, would display a very basic answer to my question.

I went through a couple of iterations and finally landed on the view below as my final.

I think it sums up the thought process and answers the question I originally asked myself when I approached the data set.  And hopefully I can be pardoned (if that’s even necessary), since the data I added merely enhanced the information at hand and kept with the simplicity of the data points available (units and time).

#WorkoutWednesday Week 1

Another great community activity is Workout Wednesday, hosted by Andy Kriebel and Emma Whyte.  According to Andy it’s “designed to test your knowledge of Tableau and help you kick on in your development.”  They’re alternating hosting duties on odd vs. even weeks.

Here’s the first task in a visual nutshell (using Superstore data set):

I’m happy to say that I was able to complete the task.  What was the most interesting part?  To get the dots on the single lines, I ended up redoing a field that had a secondary table calculation and using some built-in functions.  Those functions were RUNNING_SUM() and TOTAL().  The dots continued to be tricky, but I resorted to using AND logic within my IF statement and leveraging LOOKUP().
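
Here is a loose pandas sketch of the kind of logic those table calculations express – a running sum along each line, with a dot only on the final point (where there is no next value to look up).  This is an illustration of the idea, not the workout’s exact fields, and the data is made up.

```python
# Running sum per line, and a flag for where the 'dot' should appear.
import pandas as pd

df = pd.DataFrame({
    "Category": ["Furniture"] * 3 + ["Technology"] * 3,
    "Month":    [1, 2, 3, 1, 2, 3],
    "Sales":    [100, 150, 120, 200, 180, 220],
})

df["RunningSum"] = df.groupby("Category")["Sales"].cumsum()          # like RUNNING_SUM()
df["Total"]      = df.groupby("Category")["Sales"].transform("sum")  # like TOTAL()
next_sales       = df.groupby("Category")["Sales"].shift(-1)         # like LOOKUP(..., 1)
df["ShowDot"]    = next_sales.isna() & (df["RunningSum"] == df["Total"])

print(df)
```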

I also did a micro upgrade.  The instructions indicated that the red should highlight the “most current year.”  When interacting with the viz on the original blog, I noticed that only 2015 was red and the title was static.  So I added in logic to highlight the most recent year and added the dynamic change to the title as well.

Full viz on my Tableau Public page.

Makeover Monday 2017 – Week 1

It’s officially 2017 – the start of a new year.  As such, this is a great time for anyone in the Tableau universe to make a fresh commitment to participate in the community challenge known as Makeover Monday.

As I jump into this challenge, I’ve made the conscious decision to start with the things I already like doing and to add on each time.  This to me is the way that I’ll be able to stay actively involved and enthusiastic.  Essentially: keep it simple.

For this week’s data set it was obvious that something of a comparative nature needed to be applied.  I started off with a basic dot plot and went from there.

What I ended up with: a slope chart with the slope representing the delta in rank of income by gender, the size of the line representing the annual monetary difference in income, and 3 colors representing categorized multipliers on the wage gap.

I wanted this to be for a phone, so I held to the idea of a single viz.  Interactivity is really limited to tooltips, most other nuance comes from the presentation of the visualization itself.

And I pushed myself to add a little journalistic flair this week.  Not really my style, but I figured I would see where it took me.

Book Binge – December Edition

I typically spend the end of my year self reflecting on how things have gone – both the good and the bad.  Usually that leads me to this thoughtful place of “I need more books.”  For some reason to me books are instant inspiration and a great alternative to binge streaming.  They remind me of the people I want to be, the challenges I want to battle and conquer, and seamlessly entangle themselves into whatever it is I am currently experiencing.

Here are 3 of my binges this month:

First up: You are a Badass: How to Stop Doubting Your Greatness and Start Living Your Life by Jen Sincero

This is a really great read.  Despite the title being a little melodramatic (I don’t really believe that I’m not already a super badass, or that my greatness isn’t already infiltrating the world), Jen writes in a style that is very easy to understand.  She breaks down several “self help” concepts in an analytical fashion that reveals itself through words that actually make sense.  There’s a fair amount of brash language as well, something I appreciate in writing.

Backstory on this purchase:  I actually bought a copy of this book for me and 2 fellow data warriors.  I wanted it to serve as a reminder that we are badasses and can persevere in a world where we’re sometimes misunderstood.

To contradict all the positivity I learned from Jen Sincero, I then purchased this guy: The Subtle Art of Not Giving a F*ck by Mark Manson.  (Maybe there’s a theme here: I like books with profanity on the cover?)

Despite the title, it isn’t about how you can be indifferent to everything in the world – definitely not a guide on how to detach from everything going on.  Instead it’s a book designed to help you prioritize the important things, see suffering as a growth opportunity, and figure out what suffering you like to do on a repeated basis.  I’m still working my way through this one, but I appreciate some of the basic principles that we all need to hear.  Namely that the human condition IS to be in a constant state of solving problems and suffering and fixing, improving, overcoming.  That there is no finish line, and when you reach your goal you don’t achieve confetti and prizes (maybe you do), but instead you get a whole slew of new problems to battle.

Last book of the month is more data related.  It’s good old Tableau Your Data by Dan Murray and the InterWorks team.

I was inspired to buy this after I met Dan (way back in March of 2016).  I’ve had the book for several months, but wanted to give it a shout out for being my friend.  I’ve had some sticky challenges regarding Tableau Server this month and the language, organized layout, and approach to deployment have been the reinforcement (read as: validation) I’ve needed at times in an otherwise turbulent sea.

More realistically – I try to buy at least 1 book a month.  So I’m hoping to break in some good 2017 habits of doing small recaps on what I’ve read and the imprint new (or revisited) reads leave behind.

The Float Plot

One of the more interesting aspects of data visualization is how new visualization methods are created.  There is a core set of staple charts, graphs, and plots out there that visualization artists typically rely on.

As I’ve spent time reading more about data visualization, I started thinking about potential visualizations out there that could be added into the toolkit.  Here’s the first one that I’ve come up with: The Float Plot.

The idea behind the float plot is simple.  Plot one value that has some sort of range of good/acceptable/bad values and use color banding to display where it falls.  It works well with percentage values.

I’ve also made a version that incorporates peers.  Peers could be previous time period values or they could be less important categories.  The version with peers reminds me somewhat of a dot plot, but I particularly appreciate the difference in size to distinguish the important data point.

What’s also great about the Float Plot is that it doesn’t have to take up much space.  It looks great scaled short vertically or narrow horizontally.
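
Here is a minimal sketch of the float plot idea, with made-up thresholds and values: color bands for the bad/acceptable/good ranges, a large dot for the value of interest, and smaller dots for the peers.

```python
# Float plot sketch: banded background with one emphasized data point.
import matplotlib.pyplot as plt

value = 0.82          # the metric being 'floated', e.g. a percentage
peers = [0.74, 0.78, 0.91]

fig, ax = plt.subplots(figsize=(1.5, 4))
ax.axhspan(0.0, 0.6, color="#f4cccc")   # bad range
ax.axhspan(0.6, 0.8, color="#fff2cc")   # acceptable range
ax.axhspan(0.8, 1.0, color="#d9ead3")   # good range

ax.scatter([0] * len(peers), peers, s=30, color="grey")  # peers, de-emphasized
ax.scatter([0], [value], s=200, color="black")           # the important data point

ax.set_xlim(-0.5, 0.5)
ax.set_ylim(0, 1)
ax.set_xticks([])
plt.show()
```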

Enjoy the visualization on my Tableau public profile here.

Statistical Process Control Charts

I’ve had this idea for a while now – create a blog post and video tutorial discussing what Statistical Process Control is and how to use different Control Chart “tests” in Tableau.

I’ve spent a significant portion of my professional career in business process improvement, and I always like it when I can take techniques learned from a discipline derived from industrial engineering and apply them in a broader sense.

It also gives me a great chance to brush up on my knowledge and learn how to order my thoughts for presenting to a wide audience.  And let’s not forget: an opportunity to showcase data visualization and Tableau as the delivery mechanism of these insights to my end users.

So why Statistical Process Control?  Well, it’s a great way to use the data you have and apply different tests to start detecting issues early.  Several of the rules out there are aimed at finding “out-of-control,” non-normal, or repetitive patterns within a stream of data.  Different rules have been developed based on how we might be able to detect them.

The video tutorial above goes through the first 3 Western Electric rules.  Full details on Western Electric via Wikipedia: here.

Rule 1: Very basic; it uses the principle of a bell curve to put a spotlight on points that are above the Upper Control Limit (UCL) or below the Lower Control Limit (LCL), also known as +/- 3 standard deviations from the mean.  These are essentially outlier data points that don’t fall within the typical 99.7% span.

Rule 2: Takes surrounding observations into consideration.  Looking at 3 consecutive observations, are 2 out of 3 beyond the 2 SD mark from the average?  In this rule the observations must be on the same side of the average line when beyond 2 SD.  Since 2 SD covers about 95% of the data, having 2 out of 3 points in a set beyond that range could signal an issue.

Rule 3: Starts to consider even more data points within a collection of observations.  In this scenario we’re now looking for 4 out of 5 observations beyond 1 SD from the average, again staying on the same side of the average line throughout the 5 points.  This one really shows the emergence of a trend.
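
Here is a rough sketch of those three rules applied to a stream of observations (placeholder numbers, and using a plain standard deviation rather than a proper control-chart sigma estimate).  In Tableau these become table calculations; the Python below is just to show the logic.

```python
# Flag points that violate the first three Western Electric rules.
import pandas as pd

obs = pd.Series([2000, 1950, 2050, 2000, 1900, 2100, 2000, 1950,
                 2600, 2650, 2550, 2700, 2600, 2050, 2000])
mean, sd = obs.mean(), obs.std()

above = obs > mean                 # which side of the center line a point sits on
dist = (obs - mean).abs() / sd     # distance from the mean, in standard deviations

# Rule 1: any single point beyond 3 SD from the mean.
rule1 = dist > 3

# Rule 2: 2 of 3 consecutive points beyond 2 SD, all on the same side of the mean.
rule2 = pd.Series(False, index=obs.index)
for side in (above, ~above):
    rule2 |= ((dist > 2) & side).astype(int).rolling(3).sum() >= 2

# Rule 3: 4 of 5 consecutive points beyond 1 SD, all on the same side of the mean.
# Flags appear at the end of each qualifying window; in this toy series only
# Rule 3 fires, on the run of high values.
rule3 = pd.Series(False, index=obs.index)
for side in (above, ~above):
    rule3 |= ((dist > 1) & side).astype(int).rolling(5).sum() >= 4

print(pd.DataFrame({"obs": obs, "rule1": rule1, "rule2": rule2, "rule3": rule3}))
```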

I applied the first 3 rules to my own calorie data to detect any potential issues.  It’s very interesting to see the results.  For my own particular data set, Rule 3 was of significant value.  Having it in line as new daily data funnels in could prevent me from going on a “streak” of either over- or under-consuming.

 

Interact with the full version on my Tableau Public profile here.