#WorkoutWednesday Week 23 – American National Parks

I’m now back in full force from an amazing analytics experience at the Alteryx Inspire conference in Las Vegas.  The week was packed with learning, inspiration, and community – things I adore and am honored to be a part of.  Despite the awesome nature of the event, I have to admit I’m happy to be home and keeping up with my workout routine.

So here goes the “how” for Workout Wednesday week 23.  Specifications and backstory can be found on Andy’s blog here.

Here’s a picture of my final product and my general assessment of what the approach would require:

Things you can see from the static image that will be required –

  • Y-axis grid lines are on specific demarcations with ordinal indicators
  • X-axis also has specific years marked
  • Colors are for specific parks
  • Bump chart of parks is fairly straightforward, will require an INDEX() calculation
  • Labels are only on colored lines – tricky

Now here’s the animated version showing how interactivity works

  • Highlight box has specific actions
    • When ‘none’ is selected, defaults to static image
    • When a park with a specific color is selected, only that park retains its coloration and it is labeled
    • When a park without a specified color is selected, only that park has different coloration (black) and it is labeled

Getting started is the easy part here – building the bump chart.  Based on the data set and instructions it’s important to recognize that this is limited to parks of type ‘National Historical Park’ and ‘National Park.’  Here’s the basic bump chart setup:

and the custom sort for the table calculation:

Describing this is pretty straightforward – index (rank) each park by the descending sum of recreation visitors every year.  Once you’ve got that set up, flipping the Y-axis to reversed will get you to the basic layout you’re trying to achieve.
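
For reference, a minimal sketch of that rank table calculation (the visitor field name is assumed from the data set):

    // Rank – a minimal sketch of the bump chart table calc
    // Compute using Park Name, restarting every Year,
    // with a custom sort on SUM([Recreation Visitors]) descending
    INDEX()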

Now – the grid lines and the y-axis header.  Perhaps I’ve been at this game too long, but anytime I notice custom grid lines I immediately think of reference lines.  Adding constant reference lines gives ultimate flexibility in what they’re labeled with and how they’re displayed.  So each of the rank grid lines is a reference line.  You can add the ‘Rank’ header to the axis by creating an ad-hoc calculation of a text string called ‘Rank.’  A quick note on this: if you add dimensions and measures to your sheet, be prepared to double-check and modify your table calculations.  Sometimes dimensions get incorporated when it wasn’t intended.

Now on to the most challenging part of this visualization: the coloration and labels.  I’ll start by saying there are probably several ways to complete this task and this represents my approach (not necessarily the most efficient one):

First up: calling out specific parks with color:

(probably should have just used the Grouping functionality, but I’m a fast typer)
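
If it helps, the typed-out version is in the spirit of this sketch (the park names here are placeholders, not necessarily the ones called out in the challenge):

    // Park Color – sketch only; park names are placeholders
    CASE [Park Name]
        WHEN 'Great Smoky Mountains National Park' THEN 'Great Smoky Mountains'
        WHEN 'Grand Canyon National Park' THEN 'Grand Canyon'
        WHEN 'Yosemite National Park' THEN 'Yosemite'
        ELSE 'Other'
    END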

Then making a parameter to allow for highlighting:

(you’ll notice here that I had the right subset of parks; this is because I made the Park Type a data source filter and later an extract filter, thus removing the other park types from the domain)

Once the parameter is made, build in functionality for that:

And then I set a calculation to dynamically flip between the two calculations depending on what the parameter was set to.

Looking back on this: I didn’t need the third calculation; it’s exactly the same functionality as the second one.  In fact, as I write this, I tested it using only the second calculation and it functions just fine.  I think the over-build speaks to my thought process:

  1. First let’s isolate and color the specific parks
  2. Let’s make all the others a certain color
  3. Adding in the parameter functionality, I need the colors to be there if it is set to ‘(None)’
  4. Otherwise I need it to be black
  5. And just for kicks, let’s ensure that when the parameter is set to ‘(None)’ it really does use the colors I’ve specified in the first calc
  6. Otherwise I want the functionality to follow calc 2 (all of which collapses into the sketch below)

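Collapsed into a single sketch (assuming the [Park Color] field above and a string parameter [Highlight Park] with a ‘(None)’ value), the logic is roughly:

    // Highlight Color – sketch of the consolidated two-calc logic
    IF [Highlight Park] = '(None)' THEN [Park Color]
    ELSEIF [Park Name] = [Highlight Park] THEN
        IF [Park Color] = 'Other' THEN 'Highlight (black)' ELSE [Park Color] END
    ELSE 'Unselected'
    END
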
Here’s the last bit of logic to get the labels on the lines.  Essentially I know we’re going to want to label the end point, and because of how labeling works I have to allow all labels to be visible and then determine which marks actually have values for the label.  PS: I’m really happy to use that match color functionality on this viz.

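A sketch of that label calculation (LAST() = 0 isolates the right-most mark on each line; field and parameter names follow the sketches above):

    // Line Label – only the end point of a colored or highlighted park returns text
    IF LAST() = 0 AND
       (ATTR([Park Name]) = [Highlight Park]
        OR ([Highlight Park] = '(None)' AND ATTR([Park Color]) <> 'Other'))
    THEN ATTR([Park Name])
    END
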
And the label setting:

That wraps up the build for this week’s workout, with the last steps being to add additional detail to the tooltip and to stylize.  A great workout that demonstrates the compelling nature of interactive visualization and the ever-useful bump chart.

Interact with the full visualization here on my Tableau Public.

Alteryx Inspire – Day 1

When I went to the Tableau Conference last year, I felt it was important to spend some time documenting my experience.  Anytime I go to a conference related to my professional aspirations I’m always taken by the wealth of knowledge that’s uncovered.

The Alteryx Inspire conference is a pared-down conference with about 2,000 attendees.  It is comfortably housed in the Aria hotel across 2 spacious and open floors.  There are escalators that split between level 3 and level 1 – there’s nice flow to it and plenty of natural light.  Events take place over three days: Monday, Tuesday, and Wednesday.  Monday is mostly a product training day and the bulk of the sessions fill the remainder of the week.  The opening keynote is Tuesday.

This year – being my first – I was extremely fortunate to be able to attend and to do the product training track.  This gives me a firsthand opportunity to see how the product company sells and trains on its tool.  Facilitators are typically great at selling the ‘why’ and ‘how’ behind something.

Today I sat for a full day going through the introduction to Alteryx Designer.  Not because it was my first time using the tool, but because I believe there’s something very powerful about origin stories.  There’s something you learn in the first 30 minutes that someone who doesn’t have the ‘formal training’ may never pick up.  That happened for me today and it was great to see everything in action.

As an advocate for data-informed decision making, the tool is indispensable.  Just by listening to the 100+ in my classroom, it’s scary to witness firsthand how young businesses still are in accessing their data.  Yes, there have been really great strides, but so many people are just at the beginning.  I chuckle when I hear the typical ‘Excel’ analogies, but the overwhelming majority are nodding with how much they relate to the joke.

I’ve always seen Alteryx as a natural companion for a data analyst.  For anyone out there trying to manage data it offers up a solution, if only for the single act of being able to see a visual output of the thought process and work that went into producing a data model.  A data model or report that can be shared, saved, printed (please don’t print), and most importantly: be communicated.  For someone doing data prep, blending, gathering – this is how you explain to your boss what you do.  This is the demonstration of what it takes to be the data wrangler.  This is how you share your critical thinking skills.

I’ve just scratched the surface and have 2 more full days of Alteryx ahead – a start that has already been peppered with amazing collaboration opportunities and shared enthusiasm.  The vibe is chill, the people are great, and the mission is achievable.

Tomorrow is another day and an opportunity to take the building blocks and dream of skyscrapers.

#MakeoverMonday Week 22 – Internet Usage by Country

This week’s data set demonstrates the number of users per 100 people by country spanning several years.  The original data set and accompanying visualization starts as an interactive map with the ability to animate through the changing values year by year.  Additionally, the interactor can click into a country to see percentage changes or the comparative changes with multiple countries.

Channeling my inner Hans Rosling – I was drawn to play through the animation of the change by year, starting with 1960.  What sort of narrative could I see play out?

Perhaps it was the developer inside of me, but I couldn’t get over the color legend.  For the first 30 years (1960 to 1989) there are only a few data points, all signifying zero.  Why?  Does this mean that those few countries actually measured this value in those years, or is it just bad data?  Moving past the first 30 years, my mind was starting to try and resolve the rest of the usage changes.  However – here again my mind was hurt by the coloration.  The color legend shifts from year to year.  There’s always a green, greenish yellow, yellow, orange, and red.  How am I to ascertain growth or change when I must constantly refer to the legend?  Sure, there’s something to say about comparing country to country, but it loses alignment once you start paginating through the years.

Moving past my general take on the visualization – there were certain things I picked up on and wanted to carry forward in my makeover.  The first was the value out of 100 people.  Because I noticed that the color legend was increasing year to year, this meant that the overall number of users was increasing.  Similarly, when thinking about comparing the countries, coloration changed, meaning ranks were changing.

I’ll tell you – my mind was originally drawn to the idea of 3 slope charts sitting next to each other.  One representing the first 5 years, then the next 5 years, and so on, with each country as a line.  Well, that wasn’t really possible because the data jumps from 1990 straight to 2000 – so I went down the path of the first 10 years.  It doesn’t tell me much other than something somewhat obvious: internet usage exploded from 1990 to 2000.

Here’s how the full set would have maybe played out:

This is perhaps a bit more interesting, but my mind doesn’t like the 10 year gap between 1990 and 2000, five year gaps from 2000 to 2010, and then annual measurements from 2010 to 2015 (that I didn’t include on this chart).  More to the point, it seems to me that 2000 may be a better starting measurement point.  And it created the inflection point of my narrative.

Looking at this chart – I went ahead and decided my narrative would be to understand not only how much more internet usage there is per country, but to also demonstrate how certain countries have grown throughout the time periods.  I limited the data set to the top 50 in 2015 to eliminate some of the data noise (there were 196 members in the country domain; when I cut it to 100 there were still some 0s in 2000).

To help demonstrate that usage was simply more prolific overall, I developed a consistent dimension to block out the number of users.  So as you read it – it goes from light gray to blue depending on the value.  The point being that as we get nearer in time, there’s more dark blue and the light gray disappears.
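
A sketch of that blocking dimension (the cut points here are illustrative, not necessarily the ones I used):

    // Usage Block – bucket users per 100 people into a small, consistent domain
    IF [Users per 100] >= 75 THEN 'd. 75-100'
    ELSEIF [Users per 100] >= 50 THEN 'c. 50-75'
    ELSEIF [Users per 100] >= 25 THEN 'b. 25-50'
    ELSE 'a. 0-25'
    END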

And then I went the route of a bump chart to show how the ranks have changed.  Norway had been at the top of the charts; now it’s Iceland.  When you hover over the lines you can see what happened.  And in some cases it makes sense: a country already dominating usage can only increase so far.

But there are some amazing stories that can unfold in this data set: check out Andorra.  It went from #33 all the way up to #3.

You can take this visualization and step back into different years and benchmark each country on how prolific internet usage was during the time.  And do direct peer comparatives to boot.

This one deserves time spent focused on the interactivity of the visualization.  That’s part of the reason why it is so dense at first glance.  I’m intentionally trying to get the end user to see 3 things up front: overall internet usage in 2000 (by size and color encoding) and the starting rank of countries, the overall global increase in internet usage (demonstrated by coloration change over the spans), and then who the current usage leader is.

Take some time to play with the visualization here.

Workout Wednesday Week 21 – Part 1 (My approach to existing structure)

This week’s Workout Wednesday had us taking NCAA data and developing a single chart that showed the cumulative progression of a basketball game.  More specifically, a line chart where the X-axis is a countdown of time and the Y-axis is the current score.  There’s some additional detail in the form of the size of each dot representing 1, 2, or 3 points.  (see cover photo)

Here’s what the underlying data set looks like:

Comparing the data structure to the image and what needed to be produced, my brain started to hurt.  Some things I noticed right away:

  • Teams are in separate columns
  • Score is consolidated into one column and only displayed when it changes
  • Time is in 20-minute increments and resets each half
  • Flavor text (detail) is in separate columns (the team columns)
  • Event ID restarts each half, seriously.

My mind doesn’t like that there’s a team dimension that’s not in the same column.  It doesn’t like the restarting time either.  It really doesn’t like the way the score is done.  These aren’t numbers I can aggregate together; they are raw outputs in a string format.

Nonetheless, my goal for the Workout was to take what I had in that structure and see if I could make the viz.  What I don’t know is this: did Andy do it the same way?

My approach:

First I needed to get the X-axis working.  I’ve done a good bit of work with time so I knew a few things needed to happen.  The first part was to convert what was in MM:SS to seconds.  In my mind, this was to get the data onto a continuous axis that I could later format back into MM:SS.  Here’s the calculation:

I cheated and didn’t write my calculated field for longevity.  I saw that there was a dropped digit in the data and compensated by breaking it up into two parts.  Probably a more holistic way to do this would be to say if it is of length 4 then pad the string with a leading 0 and then go about the same process (sketched a bit further below).  Here’s the described results showing the domain:

Validation check: the time goes from 0 to 20 minutes (0 to 20*60 seconds aka 1200 seconds).  We’re good.
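
For what it’s worth, the more holistic version described above would look something like this (assuming the raw clock field is a string called [Time]):

    // Time in Seconds – pad a dropped leading digit, then convert MM:SS to seconds
    INT(LEFT(IF LEN([Time]) = 4 THEN '0' + [Time] ELSE [Time] END, 2)) * 60
    + INT(RIGHT([Time], 2))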

Next I needed to format that time into MM:SS continuous format.  I took that calculation from Jonathan Drummey.  I’ve used this more than once, so my google search is appropriately ‘Jonathan Drummey time formatting.’  The resultant time ‘measure’ was almost there, but I wasn’t taking into consideration the +20 minutes for the first half and that the time axis was the full game duration.  So here are the two calculations that I made (first is the +20 mins, then the formatting):

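Roughly, the pair looks like this; the first is exactly the +20 minutes, while the second is a simplified string version for labels and tooltips rather than Jonathan’s actual axis-formatting approach (field names assumed):

    // Game Clock Seconds – first-half events get an extra 20 minutes (1200 s) of game remaining
    // assumes [Half] is numeric (1 or 2)
    IF [Half] = 1 THEN [Time in Seconds] + 1200 ELSE [Time in Seconds] END

    // Game Clock MM:SS – simple string display of the value above
    STR(INT([Game Clock Seconds] / 60)) + ':' +
    IIF([Game Clock Seconds] % 60 < 10, '0', '') + STR([Game Clock Seconds] % 60)
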
At this point I felt like I was kind of getting somewhere – almost to the point of making the line chart, but I needed to break apart the teams.  For that bit I leveraged the fact that the individual team fields only have details in them when that team scores.  Here’s the calc:

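The screenshot isn’t reproduced here, but the gist is a null check on the team columns (the second column name is a placeholder):

    // Team – whichever team column has detail is the team for that event
    // [Opponent] stands in for the other team's actual column name
    IF NOT ISNULL([UNC]) THEN 'UNC'
    ELSEIF NOT ISNULL([Opponent]) THEN 'Opponent'
    END
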
I still don’t have a lot going on – at best I have a dot plot where I can draw out the event ID and start plotting the individual points.

Getting the score was relatively easy.  I also did this in a custom-to-the-data-set kind of way with 3 calculations: find the left score, find the right score, then tag the scores to the teams.

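Sketched out, those three look something like this (assuming the consolidated score column reads like ‘52-48’):

    // LeftScore – the number before the dash
    INT(LEFT([Score], FIND([Score], '-') - 1))

    // RightScore – the number after the dash
    INT(MID([Score], FIND([Score], '-') + 1))

    // Team Score – tag the running score to the scoring team
    IF [Team] = 'UNC' THEN [LeftScore] ELSE [RightScore] END
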
Throwing that on rows, here’s the viz:

All the events are out of order and this is really difficult to understand.  To get closer to the view I did a few things all at once:

  • Reverse the time axis
  • Add Sum of the Team Score to the path
  • Put a combined half + event field on detail (since event restarts per half)

Also – I tried Event & Half separately and my lines weren’t connected (broken at half time), so creating a derived combined field proved useful for connecting the line.

Here’s that viz:

It’s looking really good.  Next steps are to get the dots to represent the ball sizes.

One of my last calculations:

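The real calc is in the screenshot, but the idea is to derive the point value per scoring event; one hypothetical way, assuming the play-by-play text names the shot type:

    // Ball Size – hypothetical sketch; [Event Detail] and its wording are assumptions
    IF ISNULL([Score]) THEN NULL               // non-scoring events stay null
    ELSEIF CONTAINS([Event Detail], 'Free Throw') THEN 1
    ELSEIF CONTAINS([Event Detail], 'Three Point') THEN 3
    ELSE 2
    END
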
That got dropped on size on a duplicated and synchronized “Team Score.”  Getting the pesky null not to display in the legend was a simple right-click and ‘Hide.’  I also had to sort the Ball Size dimension members to align with the perceived sizing.  The line size was also made super skinny.

Now some cool things happened because of how I did this:  I could leverage the right and left scores for tooltips.  I could also leverage them in the titling of the overall scores, e.g. UNC = {MAX([LeftScore])}.

Probably the last component was counting the number of baskets (as a single returned value in a title, per the specs of the ask).  Those were repeated LODs:

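Each of those repeated LODs was in the spirit of this sketch (field names follow the earlier sketches):

    // UNC Three Pointers – one of the repeated basket-count LODs, sketched
    { FIXED : SUM(IF [Team] = 'UNC' AND [Ball Size] = 3 THEN 1 ELSE 0 END) }
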
And thankfully the final component of the oversized scores on the last marks could be accomplished by the ‘Always Show’ option.

Now I profess this may not be the most efficient way to develop the result, heck here’s what my final sheet looks like:

All that being said: I definitely accomplished the task.

In Part 2 of this series, I’ll be dissecting how Andy approached it.  We obviously did something different because it seems like he may have used the Attribute function (saw some * in tooltips).  My final viz has all data points and no asterisks (e.g. 22:03 remaining, UNC).  Looking at that part, mine has each individual point and the score at each instantaneous spot; his drops the score.  Could it be that he tiptoed around the data structure in a very different way?

I encourage you to download the workbook and review what I did via Tableau Public.


#MakeoverMonday Week 21 – Are Britons Drinking Less?

After some botched attempts at reestablishing routine, #MakeoverMonday week 21 got made within the time-boxed week!  I have one pending makeover and an in-progress blog post to talk about Viz Club and the 4 developed during that special time.  But for now, a quick recap of the how and why behind this week’s viz.

This week’s data set was straightforward – aggregated measures sliced by a few dimensions.  And in what I believe is now becoming an obvious trend in how data is published, it included both aggregated and lower-level members within the same field (read this as “men,” “women,” “all people”).  The structured side of me doesn’t like it and screams for me to exclude these from any visualizations, but this week I figured I’d take a different approach.

The key questions asked related to alcohol consumption frequency by different age and gender combinations (plus those aggregates) – so there was lots of opportunity to compare within those dimensions.  More to that, the original question and how the data was presented begged to be rephrased into what became the more direct title (Are Britons Drinking Less?).

The question really informed the visualizations – and more to that point, the phrasing of the original article seemed to dictate to me that this was a “falling measure.”  Meaning it has been declining for years or year-to-year, or now compared to then – you get the idea.

With it being a falling measure and already in percentages, this made the concept of using a “difference from first” table calculation a natural progression.  When using the calculation, the first year of the measure would be anchored at zero and subsequent years would be compared to it.  Essentially asking and answering for every year “was it more or less than the first year we asked?”  Here’s the beautiful small multiple:

Here the demographics are set to color, lightest blue being youngest to darkest blue being oldest; red is the ‘all.’  I actually really enjoyed being able to toss the red on there for a comparison and it is really nice to see the natural over/under of the age groups (which mathematically follows if they’re aggregates of the different groups).

One thing I did to add further emphasis was to put positive deltas on size – that is to say, to overemphasize (in a very subdued, probably-only-Ann-appreciates-the-humor-behind-it way) when a value runs counter to the trend.  Or more directly stated: draw the reader’s attention to the points where the percentage response has increased.

Here’s the resultant:

So older demographics are drinking more than they used to and that’s fueled by women.  This becomes more obvious to the point of the original article when looking at the Teetotal groups and seeing many more fat lines.

Here’s the calculation to create the line sizing:

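The exact field is in the screenshot, but the shape of it is the difference-from-first delta with a size bump on the positive side (the survey measure name here is assumed):

    // Line Size – sketch; fatten the line only when the response has increased vs. the first year
    // Compute using: Year (so FIRST() is the first survey year)
    IF ZN(SUM([Percent])) - LOOKUP(ZN(SUM([Percent])), FIRST()) > 0
    THEN 2
    ELSE 1
    END
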
Last up was to make one more view to help sell the message.  I figured a dot plot would mimic champagne bubbles in a very abstract way.  And I also thought open/closed circles in combination with the color encoding would be pleasant for the readers.  Last custom change there was to flip the vertical axis of time to be in reverse.  Time is read top down and you can see it start to push down to the left in some of the different groupings.

If you go the full distance and interact with the dashboard, the last thing I hope you’ll notice and appreciate is the color legend/filter bar at the top.  I hate color legends because they lack utility.  Adding in a treemap version of a legend that does double duty as highlight buttons is my happy medium (and only when I feel like color encoding is not actively communicated enough).

The value of Viz Club

I’ve been very interested in pursuing a concept I originally saw in my Twitter feed: a picture of Chris Love at a pub with a blurb about ‘Viz Club.’  Following it further, I want to say there were some additional details about the logistics, but the concept was simple: a few people get together and collaborate on data viz.

Fast forward to the past 8 months and this has been a concept that has wracked my brain. The Phoenix TUG has space once a month on a Saturday. This had originally started as a “workshop” time slot, but as things evolved it didn’t feel quite right.  There was an overwhelming expectation to have curriculum and a survey of how effective the instruction was. Now let me say this: there is absolutely a time and place for learning Tableau and I am all about empowering individuals with the skills to use Tableau.  However, a voluntary community workshop should not be a surrogate for corporate sponsored training and isn’t conducive to building out a passion and enthusiasm filled community.

Anyway – the workshops weren’t right.  They weren’t attracting the right types of people and, more importantly, they were isolating folks who were more advanced in their journey.  Essentially it needed a format change, one that could be accepting of all stages of the journey and one that attracted the passion.

Thus the idea kept echoing in my mind: Viz Club.

So coming out of April, the leap was made. I’ve been facilitating a series called “Get Tableau Fit” where I reconstruct one or two Workout Wednesday challenges during the first half of the TUG.  People were starting to see the value in the exercises (crazy that it took them so long!) and were getting more and more interested in trying them.  Interested, but still with a small amount of trepidation to participate. The final barrier for folks to participate?  Perhaps it was finding structured time.  Being officially assigned a task and a time to do the task.

Hence Viz Club.

So we held our first club session on Saturday, and I have to say it was very successful.  I was very curious about the types of people who would show up, what their problems would be, and how the flow would play out.  I couldn’t have been happier with the results.

From a personal productivity standpoint, it was well worth the time investment.  There’s something about being in a collaborative and open environment that eliminates potential mental blocks.  Within the first hour of Viz Club I had done one of my catch-up #MakeoverMonday vizzes.

More important than my productivity was the productivity in the room.  Throughout the day questions were being asked and answered.  One specific community member took a concept all the way through to a dashboard – significant breakthroughs within the time box of our 5 hours together.  I know everyone involved felt the power of the time we spent together and it’s something we’re going to continue.

Dealing with data density

Recently I was on a project that involved working with data centered around one value. You can imagine the type: something where there is an intended value, or a maximum expected value. A good example from the Superstore data set may be something like “days between order date and ship date.”

Typically you’ll come into a data set of this type and the first thing to do is try to survey it.  Describe it in a visual format to set the foundation for your data story audience.  Traditionally you’d probably want to go for something like a histogram, but a histogram is going to look really pointless.  I’m sure you can imagine it now: 95-99% of the data centered on one INTEGER value, the rest potentially obscuring the scene due to poor data entry or capture.  Not the best way to start your story.

What’s potentially worse than the density around a given number is the notion that you’re looking for something within the confines of a narrow range that may be worth investigating further. So in this world, an ugly histogram isn’t going to drive you down the path of further enlightenment or relevant question asking and answering.

To combat the curse of this type of data, I went on a mission to make some alternative visualizations.  Those beyond the histogram and box plot – my self-admitted favorite tools to quickly understand data sets.

My first choice was to add jitter. Jittering is a great tool that helps to showcase the density of data. You’re breaking up a traditional box or dot plot where values are plotted at the same point and adding a position dimension to the open axis. It can be really helpful for opening conversations with those seeing their data for the first time. And even more powerful when you add on a dimension to show small multiples of the same measurement and how the numeric outputs are influenced by the segmentation of your chosen dimension.

After attempting jittering, I noticed that because of the nature of my data, the natural indexing that was applied got skewed by the unique key I was indexing on.  Essentially there was an opportunity for an unintended inference from the jitter.  This led me to restart the index for every value, making a histogram next to a box plot.  The dots were stacked on each other, mimicking bars.  This was pretty effective.  The centered nature of the data could be easily felt.
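
The restarted index boils down to a plain INDEX() with the right compute-using settings (the dimension names here are illustrative):

    // Dot Stack Position – sketch; stacks one dot per record within each value
    // Compute using the record-level key (e.g. Order ID),
    // restarting every [Days to Ship], so the dots pile up like bars
    INDEX()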

Last move was to make something even more abstract – what you may think of as a violin plot.  I attempted this from the perspective of a dot plot on an X-axis.  However, each dot is the binned result of the data values, with the number of data points within that bin representing the dot’s size.  I added on some transparency for additional visual aid – I think this can either be done to reinforce the size of the dot, or when your X-axis has a natural good vs. bad association.  Harking back to the example of turnaround time, you could envision how powerful it could be if you went from in control to out of control.  I also added on different percentile lines to help affix the data spread.

Consider this post a draft in progress that will be updated with visuals.

March & April Combined Book Binge

Time for another recount of the content I’ve been consuming.  I missed my March post, so I figured it would be fine to do a combined effort.

First up:

The Icarus Deception by Seth Godin

In my last post I mentioned that I got a recommendation to tune in to Seth and got the opportunity to hear him firsthand on Design Matters.  Well, here’s the first full Seth book I’ve consumed and it didn’t disappoint.  If I had to describe what this book contains – I would say that it is a near manifesto for the modern artist.  The world is run by industrialists and the artist is trying to break through.

I appreciate how Seth frames the concept of an artist – he unpacks the term and invites or ENCOURAGES everyone to identify as such.  Being an artist means being emotionally invested, showing up, giving a shit.  That giving a shit, caring, connecting is ALL there is.  That you succeed in the world by connecting, by sharing your art.  These concepts and ideals resonate deeply with me.  He also explains how vulnerable and gutting it can be to live as an artist – something I’ve felt and experienced several times.

During the course of listening to this book I was on site with a client.  We got to a certain point, agreed on the direction and visualizations, then shared them with the broader team.  The broader team came heavy with design suggestions – most notably, the green/red discussion came into play.  I welcome these challenges, and as an artist and communicator it is my responsibility to share my process, listen to feedback, and collaborate to find a solution.  That definitely occurred throughout the process, but honestly it caused me to lose my balance for a moment.

As I reflected on what happened – I was drawn to this idea that as a designer I try to have ultimate empathy for the end user.  And furthermore the amount of care given to the end user is never fully realized by the casual interactor.  A melancholy realization, but one that should not be neglected or forgotten.

Moving on to the next book:

Rework by Jason Fried & David Heinemeier Hansson

This one landed in my lap because it was available while I was perusing library books.

A quick read that talks about how to succeed in business.  It takes an extreme focus on being married to a vision and committing to it.  The authors focus on getting work done.  Sticking to a position and seeing it through.  I very much appreciated that they were PROUD of decisions they made for their products and company.  Active decisions NOT to do something can be more liberating and make someone more successful than being everything to everyone.

Last up was this guy:

Envisioning Information by Edward Tufte

A continuation of reading through all the Tufte books.  I am being lazy by saying “more of the same.”  Or “what I’ve come to expect.”  These are lazy terms, but they encapsulate what Tufte writes about: understanding visual displays of information.  Analyzing at a deep level the good, bad, and ugly of displays to get to the heart of how we can communicate through visuals.

I particularly loved some of the amazing train timetables displayed.  This concept of using lines to represent the timing of different routes was amazing to see.  And the way color is explored and leveraged is on another level.  I highly recommend this one if witnessing Tufte’s strong tongue-in-cheek style sounds entertaining.  I know for me it was.

#MakeoverMonday Week 18

{witty intro}  This week’s makeover challenge was to take Sydney ferry data for 7 ferry lines and 8 months.  What’s even better is there was another dimension with a domain of 9 members.  This is a dream data set.  I say it’s a dream from the perspective of having two dimensions that can be manipulated and managed (no deciding HOW they have to be reduced or further grouped) and there’s decent data volume with each one.

In the world of visualization, I think this is a great starter data set.  And it was fun for me because I could focus on some of the design rather than deciding on a deep analytical angle.  Plus in the spirit of the original, my approach was to redo the output of “who’s riding the ferries” and make it more accessible.

So the lowdown: first decision made was the color palette.  The ferry route map had a lot of greens in it.  And obviously a lot of blues because of water.

So I wanted to take that idea and push it one step further.  That landed me in a world of deep blues and greens – using the darkest blue/green throughout to typically represent the “most” of something.

These colors informed most decisions that came afterward.  I really wanted to stick to small multiples on this one, just by the sheer line-up of the two small-to-medium domained dimensions.  Unfortunately – nothing of that nature turned out very interesting.  Here’s an example:

Like, it’s okay and somewhat interesting – especially giving each row the opportunity to have a different axis range.  But you can see the “problem” immediately: there are a few routes that are pretty flat, and further to that, end users are likely going to be frustrated by the independent axes when they dive deeper to compare.

Pivoting from that point led me to the conclusion that the dimensions shouldn’t necessarily be shown together, but instead show one within the other.  But – worth noting, in the small multiple above you can see that the ‘Adult’ fare is just the most everywhere all the time.  Which led to this guy:

Where the bars are overall and the dots are Adult fares.  I felt that representing them in this context could free up the other dwarfed fare types to play with the data.

Last step from my end was to highlight those fare types and add a little whimsy.  I knew switching to % of total would be ideal because of the differing trip amounts for each route.  Interpret this as: normalizing to proportions gave the opportunity to compare the routes.
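
That normalization is the standard percent-of-total table calc, computed within each route (the trips measure name is assumed):

    // % of Total Trips – share of a route's trips belonging to each fare type
    SUM([Trips]) / TOTAL(SUM([Trips]))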

I actually landed on the area chart by accident – I was stuck with lines, did my typical CTRL + drag of the same pill to try and do some fun dual axis… and Tableau decided to automatically build me an area chart.

The original view of this was obviously not as attractive and I’ve done a few things to enhance how this displays.  The main thing was to eliminate the adult fare from the view visually.  We KNOW it’s the most, let’s move on.  Next was to stretch out the data a bit to see what’s going on in the remaining 30%-ish of rides. (Nerd moment: look at what I titled the sheet.)

Finishing up – there’s some label magic to show only those that are non-adult.  I also RETAINED the axis labels – I am hoping this helps to demonstrate and draw attention to the tagged axis at 50%.  What’s probably the most fun about this viz – you can hover over that same blue space and see the adult contribution – no data lost.

Overall I’m happy with the final effect.  A visually attractive display of data that hopefully invites users into deeper exploration.  Smaller dimension members given a chance to shine, and some straightforward questions asked and answered.

#MakeoverMonday Week 17

After a bit of life prioritization, I’m back in full force on a mission to contribute to Makeover Monday.  To that end, I’m super thrilled to share that I’ve completed my MBA.  I’ve always been an individual destined not to settle for one higher education degree, so having that box checked has felt amazing.

Now on to the Makeover!  This week’s data set was extra special because it was published on the Tableau blog – essentially more incentive to participate and contribute (there’s plenty of innate incentive IMO).

The data was courtesy of LinkedIn and represented 3 years’ worth of “top skills.”  Here’s my best snapshot of the data:


This almost perfectly describes the data set, minus the added bonus of there also being a ‘Global’ member in the Country dimension.  Mixing aggregations, or concepts of what people believe can be aggregated, made me sigh just a little bit.  I also sighed at seeing that some countries are missing 2014 skills and that 2016 is truncated to 10 skills each.

So the limitations of the data set meant that there had to be some clever dealing to get around this.  My approach was to take it from a 2016 perspective, and furthermore to “look back” to 2014 whenever there was any sort of comparison.  I made the decision to eliminate “Global” and any countries without 2014 from the data set.  I find that the data lends itself best to comparison within a given country (my perspective) – so eliminating countries was something I could rationalize.

Probably the only visualization I really cared about was a slope chart.  I thought this would be a good representation of how a skill has gotten hotter (or not).  Here’s that:

Some things I did to jazz it up a bit.  Added a simple boolean expression to color to denote if the rank has improved since 2014.  Added on reference lines for the years to anchor the lines.  I’ve done slope charts different ways, but this one somehow evolved into this approach.  Here’s what the sheet looks like:

Walking through it, starting with the filter shelf: I’ve got an Action filter on country (based on action filter buttons elsewhere on the dashboard).  Year has been added to context and 2015 eliminated.  The data source filter removed the countries without 2014 data & Global.  Skill is filtered to an LOD for 2016 Rank <> 0.  This ensures I’m only using 2016 skills.  The context filters keep everything looking pretty for the countries.
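
A sketch of that LOD filter (the exact field names may differ):

    // Has 2016 Rank – keep only skills that appear in a country's 2016 list
    { FIXED [Country], [Skill] : SUM(IF [Year] = 2016 THEN [Rank] ELSE 0 END) } <> 0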

The year lines are reference lines – all headers are hidden.  There’s a dual axis on rows to have line chart & circle chart.  The second Year in columns is redundant and leftover from an abandoned labeling attempt (but adds nice dual labels automatically to my reference lines).

Just as a note – I made the 2016 LOD with a 2014 LOD to do some cute math for line size – I didn’t like it, so I abandoned it.

Last steps were to add additional context to the “value” of 2016 skills.  So a quick unit chart and word cloud.  One thing I like to do on my word clouds these days is square the values on size.  I find that this makes the visual indicator for size easier to understand.  What’s great about this is that smaller rank is better, so instead of “^2” it became this:

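The screenshot has the exact form, but in spirit it’s an inverted rank squared, something like:

    // Word Cloud Size – sketch; smaller rank should mean bigger, so invert before squaring
    (1 / [2016 Rank]) ^ 2
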
Sometimes math just does you a real solid.

The kicker of this entire data set for me, and the knowledge gained: Statistical Analysis and Data Mining are hot!  Super hot!  I also really like that User Interface Design and Algorithm Design made it to the top 10 for the United States.  I would tell anyone that a huge component of my job is designing analytical outputs for all types of users, and that requires an amount of UX design.  And coincidentally I’m making an algorithm to determine how to eliminate a backlog, all in Tableau (a basic linear equation).