Makeover Monday Week 8 – Potatoes in the EU

I’ll say this first – I don’t eat potatoes.  Although potatoes are super tasty, I refuse to have them as part of my diet.  So I was less than thrilled about approaching a week that was pure potato (especially coming off the joy of Valentine’s Day).  Nonetheless – it presented itself with a perfect opportunity for growth and skill testing.  Essentially, if I could make a viz I loved about a vegetable I hate – that would speak to my ability to interpret varying data sets and build out displays.

I’m very pleased with the end results.  I think it has a very Stephen Few-esque approach.  Several small multiples with high and low denoted, color playing throughout as a dual encoder.  And there’s even visual interest in how the data was sorted for data shape.

So how did I arrive there?  It started with the bar chart of annual yield.  I had an idea on color scheme and knew that I wanted to make it more than gray.

This gave perfect opportunity to highlight the minimum and maximum yields.  To see what years different countries production was affected by things like weather and climate.  It’s actually very interesting to see that not too many of the dark bars (max value) are in more recent times.  Seems like agricultural innovation is keeping pace with climate issues.

After that I was hooked on this idea of sets of 3.  So I knew I wanted to replicate a small multiple in a different way using the same sort order.  That’s where Total Yield came in.  I’ve been pondering this one in the shower on the legitimacy of adding up annual ratios for an overall yield.  My brain says it’s fine because the size of the country doesn’t change.  But my vulnerable brain part says that someone may take issue with it.  I’d love for a potato farming expert to come along and tell me if that’s a silly thing to add up.  I see the value in doing a straight total comparison of the years.  Because although there’s fluctuation in the yield annually, we have a normalized way to show how much each countries produces irrespective of total land size.

Next was the dot plot of the current year.  This actually started out its life as a KPI indicator of up or down from previous, but it was too much for the visual.  I felt the idea of the dot plot of current year would do more justice to “right now” understanding.  Especially because you can do some additional visual comparison to its flanks and see more insight.

And then rinse/repeat for the right side.  This is really where things get super interesting.  The amount of variability in pricing for each country, both by average and current year.  Also – 2013 was a great year for potatoes.

Makeover Monday Week 6 – Chicago Taxis

This week’s data set presented itself with a new and unique challenge – 100 million plus records and a slight nod to #IronViz 2016.


Yep – Taxi data and lots of it, this time originating from Chicago.  The city of Chicago recently released 2013 to 2016 data on taxi rides and the kind folks at EXASOL took the opportunity to ingest the massive data set and make it available for #MakeoverMonday.  Synergy all around is felt: from EXASOL getting free visualizations, to data vizzers getting the opportunity to work in a data storage platform designed to make massive data vizzing possible, to understanding the unique challenges seen once you get such an explosively large data set into Tableau.

So how did I take on this challenge and what are some things I learned?  Well, first I had an inkling going in that this was going to be a difficult analysis.  I had some insight from an #IronViz contestant that maybe taxi data isn’t super interesting (and who would expect it to be).  There’s usual things we all likely see that mirror human events (spikes around holidays, data collected around popular locations, increases in $$), but beyond that it can be tough to approach this data set looking for a ‘wow’ moment.

To offset this, my approach was exploration.  Organize, separate, quantify, and educate the data interactor into the world of Chicago taxis.  Let’s use the size of the aggregations to understand what Chicago is like.  Essentially my end design was built to allow someone to safely interact with the data and get a sense for how things change based on several (strategically) predetermined dimensions.

The dimensions I chose to stick with for this visualization: focus on the community areas (pick up and drop off points), separate shorter trip lengths into 5 mile increments, and bucket time into 6 different chunks throughout the day.  (Admittedly the chunking of time was a rip off my own week 3 makeover where I used the same calculation to bucket time.)

Having the basic parameters of how the data was being organized, it was time to go about setting up the view.  I tried several combinations of both size, color, shape to facilitate a view I was committed to – using circles to represent data points for pickup locations by some time constraint.  I arrived at this as the final:


What I really like about it: red exposes the maximum for each year and ride length.  Demonstrates very quickly where “most rides” are.  There’s visual interest in the 10 to 15 mile range category.  There’s interest in the blank space – the void of data.  And this also makes a great facilitator of interactivity.  Consider each circle a button enticing you to click and use it to filter additional components on the right side.

Next were the burly maps.  Word of caution: do not use the unique identifier “trip id” and go plotting points.  You’ll get stuck quickly.  I’m not saying it is impossible, but I think to render the initial visualization it would take about an hour (based on my trying).  So instead of going to the trip id level, I took a few stabs at getting granular.  First step was, let’s just plot each individual day (using average lat/lon for pickup/dropoff spots).  This plotted well.  Then it was a continuation of adding in more and more detail to get something that sufficiently represented the data set, without crippling the entire work product.  I accomplished this by adding in both the company ID (a dimension of 121 members) and the accompanying areas (pickup/drop off area, 78 member dimension).  This created the final effect and should represent the data with a decent amount of precision.

Finally I went ahead and added in a few more visually interesting dot plots that highlighted maximums for specific areas.  This is ideally used in conjunction with the large table at left to start gathering more understanding.

I have to say – I am somewhat pleased with the result of this dashboard.  I’m not sure it will speak to everyone (thinking the massive left side table may turn folks off or have whispers of “wasted opportunity” to encode MORE information).  However – I am committed to the simplicity of the display.  It accomplishes for me something I was aiming to achieve – understanding several different geographical locales of Chicago and making something that could stand alone and provide “aha” moments.  In particular I’d like to know why Garfield Ridge overtook O’Hare for 10-15 mile fares in ’14 and ’15.  One for Google to answer.

Book Binge – January Edition

It’s time for another edition of book binge – a random category of blog posts devised (and now only on its second iteration) where I walk through the different books I’ve read and purchased this month.

First – a personal breakthrough!  I have always been an avid reader, but admittedly become lazy in recent years.  Instead of reading at least one book a month, I was going on small reading sprees of 2 or 3 books every four or five months.  After the success of my December reads, I figured I would keep things going and try to substitute books as entertainment whenever possible.

Here are a few books I read in January:

The Functional Art by Alberto Cairo

I picked this one up because it is quintessential to the world of data journalism and data visualization.  I also thought it would be great to get into the head of one of the instructors of a MOOC I’m taking.  Plus who can resist the draw of the slope chart on the cover?

I loved this one.  I like Alberto’s writing style.  It is rooted in logic and his use of text spacing and bold as emphasis is heavy on impact.  I appreciate that he says data visualization has to first be functional, but reminds us that it has to be seen to matter.  It’s also interesting to read the interviews/profiles in the end of the book of journalists.  This is an excellent way for me to shift my perspective and paradigm.  I come from the analysis/mathematical side of things – these folks are there to communicate stories of data.  A great read that is broken up in such a way that it is easy to digest.

Next book was The Visual Display of Quantitative Information by Edward Tufte

Obviously a classic read for anyone in the data visualization world by the “father” of modern information graphics.  I must admit I picked up all 4 of Tufte’s books in December, and couldn’t get my brain wrapped around them.  I was flipping through the pages to get a sense for how the information was contained and felt a little intimidated.  That intimidation was all in my head.  Once I began reading – the flow of information made perfect sense.

I appreciate Tufte’s voice and axiom type approach to information graphics.  Yes – there are times when it is snarky and absurd, but it is also full of purpose.  He walks through information graphics history, spotlighting many of the greats and lamenting the lack of recent progression (or more of a recession) in the art.

I have two favorites in this one: how he communicates small multiples and sparklines.  The verbiage used to describe the impact (and amount of information) small multiples can convey is poetic (and I don’t really like poetry).  His work on developing and demonstrating sparklines is truly illuminating.  There were times where I had dreams of putting together some of the high “data-ink” low “chartjunk” visuals that he described.  And his epilogue makes me forgive all the snarkyness.  The first in a series that I am ecstatic to continue to read.

The last book I’ll highlight this month was a short read – a Christmas present from a friend.

Together is Better by Simon Sinek

I’m very familiar with Simon – mostly because of his famous TED talk on starting with why. I’ve read his book on the subject as well. So I was delighted to be handed this tiny gem.  Written in hybrid format of children’s book and inspirational quote book – this is a good one to flip through if you’re in need of a quiet moment.  Simon calls himself a self professed optimist at the end, and that’s definitely how I left the book feeling.

It aims at sparking the inner fire we all have – and the most powerful moment: Simon saying that you don’t have to invent a new idea and then follow it.  It is perfectly acceptable to commit to someone else’s vision and follow them.  It frees you completely from the world of “special,” new, and different that entrepreneurial and ambitious types (myself) get hung up on.  You don’t have to make up an original idea – just find something that resonates deeply with you and latch on.  That is just as powerful as being a visionary.

The other part of this book devotes a significant amount of snippet takes on leadership.  A friendly reminder of what leadership looks like.  Leadership is not management.

I’ve got more books on the way and will be back in a month with three new reads to share.