#IronViz – Let’s Go on a Pokémon Safari!

It’s that time again – Iron Viz!  The second round of Iron Viz entered my world via an email with a very enticing “Iron Viz goes on Safari!” theme.  My mind immediately got stuck on one thing: Pokémon Safari Zone.

Growing up I was a huge gamer and Pokémon was (and still is) one of my favorites.  I even have a cat named after a Pokémon: Starly (find her in the viz!).  So I knew that if I was going to participate, a Pokémon Safari was the only way to go.

I spent a lot of time thinking about how I might want to bring this to life.  Did I want to do a virtual safari of all the pocket monsters?  Did I want to focus on the journey of Ash Ketchum through the Safari Zone?  Did I want to focus on the video games?

After all the thoughts swirled through my mind – I settled on the idea of doing a long form re-creation of Ash Ketchum’s adventure through the Safari Zone in the anime.  I sat down and googled to figure out the episode number so I could go watch it.  But to my surprise, the episode has been banned.  It hasn’t aired on much TV, and the reason it is banned makes it very unattractive and unfriendly for an Iron Viz long form.  I was gutted and had to set off on a different path.

The investment into the Safari Zone episode got me looking through the general details of the Safari Zone in the games.  And that’s what ended up being my hook.  I tend to think in a very structured format, and because there were 4 regions that HAD Safari Zones (or what I’d consider the general spirit of one), it was easy for me to compare them against each other.

Beyond that, I knew I wanted to keep the spirit of the styling similar to the games.  My goal for the viz is to give the end user an understanding of the types of Pokémon in each game – to show some basic details about each pocket monster, but also to have users almost feel like they’re on the Safari.

There’s also this feeling I wanted to capture – for anyone who has played Pokémon you may know it.  It’s the shake of the tall grass.  It is the tug of the Fishing Pole.  It’s the screen transition.  In a nutshell: what Pokémon did I just encounter?  There is a lot of magic in that moment of tall grass shake and transition to ‘battle’ or ‘encounter’ screen.

My hope is that I captured that well with the treemaps.  You are walking through each individual area and encountering Pokémon.  For the seasoned Safari-goer, you’ll be more interested in knowing WHERE you should go and understanding WHAT you can find there – hence the corresponding visuals surrounding the treemaps.

The last component of this visualization was the Hover interactivity.  I hope it translates well because I wanted the interactivity to be very fluid.  It isn’t a click and uncover – that’s too active.  I wanted this to be a very passive and openly interactive visualization where the user would unearth more through exploring and not have to click.

#WorkoutWednesday Week 24 – Math Musings

The Workout Wednesday for week 24 is a great way to represent where a result for a particular value falls with respect to a broader collection.  I recently used a spine chart on a project where most data was centered around certain points and I wanted to show the range.  Surfacing maximums, minimums, averages, quartiles, and (when appropriate) medians can help profile data very effectively.

So I started off really enjoying where this visualization was going – partly because the spine chart I made on that recent project came before I even knew the thing I’d developed already had a name.  (Sad on my part, I should read more!)

My enjoyment turned into caution really quickly once I saw the data set.  There are several ratios in the data set and very few counts/sums of things.  My math brain screams trap!  Especially when we start tiptoeing into the world of what we semantically call “average of all” or “overall average” or something that somehow represents a larger collective (“everybody”).  There is a lot of open-ended interpretation that goes into this particular calculation and when you’re working with pre-computed ratios it gets really tricky really quickly.

Here’s a picture of the underlying data set:

Some things to notice right away – the ratios for each response are pre-computed, and the number of responses is different for each institution.  (To simplify this view, I’ve filtered to one year and one question.)

So the heart of the initial question is this: if I want to compare my results to the overall results, how would I do that?  There are probably 2 distinct camps here.  Option 1: take the average of one of the columns and use that to represent the “overall average.”  Let’s be clear on what that is: it is the average pre-computed ratio of the survey.  It is NOT the observed percentage of all individuals surveyed.  That would be option 2: the weighted average.  For the weighted average – that is, to calculate a representation of all respondents – we could add up all the qualifying respondents answering ‘agree’ and divide by the total respondents.

Now we all know this concept of average of an average vs. weighted average can cause issues.  Specifically, we’d feel the friction immediately if entities with very few responses were commingled with entities capturing many more responses.  EX: Place A: 2 people out of 2 answered ‘yes’ (100%); Place B: 5 out of 100 answered ‘yes’ (5%).  If we average 100% and 5% we get 52.5%.  But if we instead take 7 out of 102, that’s 6.86% – a very different number.  (Intentionally extreme example.)

So my math brain was convinced that the “overall average” or “ratio for all” should be inclusive of the weights of each Institution.  That was fairly easy to compensate for: take each ratio and multiply it by the number of respondents to get raw counts and then add those all back up together.
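
In Tableau terms, a minimal sketch of that approach (with [Agree Ratio] and [Number of Respondents] standing in for whatever the survey columns are actually called):

    // Calc 1 - Agree Count: back into raw counts from the pre-computed ratio
    [Agree Ratio] * [Number of Respondents]

    // Calc 2 - Weighted Overall Ratio: sum of counts over sum of respondents,
    // NOT the average of the pre-computed ratios
    SUM([Agree Count]) / SUM([Number of Respondents])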

The next messy thing to deal with was finding the minimums and maximums of these values.  It seems straightforward, but when reviewing the data set and the specifications of what is being displayed, caution is warranted around the level of aggregation and how the data is filtered.  As an example, depending on how the ratios are leveraged, you could end up finding the minimum of 3 differently weighted subjects within a subject group.  You could also find the minimum Institution + subject result at the subject level across all the subjects within a group.  Again, I think the best bet here is to tread cautiously over the ratios and get into raw counts as quickly as possible.

So what does this all mean?  To me it means tread carefully and ask clear questions about what people are trying to measure.  This is also where I will go the distance and include calculations in tool tips to help demonstrate what the values I am calculating represent.  Ratios are tricky and averaging them is even trickier.  There likely isn’t a perfect way to deal with them and it’s something we all witness consistently throughout our professional lives (how many of us have averaged a pre-computed average handle time?).

Beyond the math tangent – I want to reiterate how great a visualization I think this is.  I also want to highlight that because I went deep-end on the math, I decided to go a different direction on the development.

The main difference from the development perspective?  Instead of using reference bands, I used a gantt bar as the IQR.  I really like using the bar because it gives users an easier target to hover over.  It also reduces some of the noise of the default labeling that occurs with reference lines.  To create the gantt bar, simply compute the IQR as a calculated field and use it as the size, and select one of the percentile points to be the start of the mark.
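
As a sketch of those two pieces – assuming a [Ratio] field at the level you’re plotting (built from raw counts, per the math tangent above):

    // Calc 1: Lower Quartile, used as the start (position) of the gantt mark
    PERCENTILE([Ratio], 0.25)

    // Calc 2: IQR, dropped on Size so the bar spans from Q1 to Q3
    PERCENTILE([Ratio], 0.75) - PERCENTILE([Ratio], 0.25)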

#MakeoverMonday Week 25 | Maricopa County Ozone Readings

We had another giant data set this week – 202 million records of EPA Ozone readings across the United States.  The giant data set is generously hosted by Exasol.  I encourage you to register here to gain access to the data.

The heart of the data is pretty straightforward – PPM readings across several sites around the nation for the past 25+ years.  As I browsed the data set, it was easy to see that there are multiple readings per site per day.  Here’s the basic data model:

Parameter Name only has Ozone, Units of Measure only has Parts per million.  There is one little tweak to this data set – the Datum field.  Now this wasn’t a familiar term for me, so I described the domain to see what it had.

I know exactly what one of these 4 things means (beyond Unknown) – that’s WGS84.  I was literally at the Alteryx Inspire conference two weeks ago and in a Spatial Analytics session where people were talking about different standards for coordinate systems on Earth.  The facilitators mentioned that WGS84 was a main standard.  For fun I decided to plot the number of records for each Datum per year to see how the Lat/Lon have potentially changed in measurement over time.  Since 2012 it seems like WGS84 has dominated as the preferred standard.

So armed with that knowledge I sort of kept it in my back pocket of something I may need to be mindful of if I enter the world of mapping.

Beyond that, I had to focus on preparing something for Tableau Public.  202 million records unfortunately won’t sit on Public, so I had to extract a subset of the data.  Naturally I did what every human would do and zeroed in on my own city: the Phoenix metropolitan area, aka Maricopa County.

So going through the data set, there are multiple sites taking measurements.  And more than that, these sites are taking measurements multiple times per day.  I really wanted to express that somehow in my final visualization.  Here are all the site averages plotted for each day over the past 30 years – thanks Exasol!

So this is averaged per day per site – and you can see how much variation there is.  Some are reporting very low numbers, even zeros.  Some are very high.

If I take off the site ID, here’s what I get for the daily averages:

Notice the Y-axis – much less dramatic.  Now the EPA has the AQI measurements, and ozone doesn’t even get into the “bad” range until 0.071 PPM (Unhealthy for Sensitive Groups).  So there’s less of a story to some extent when we take the averages.  This COULD be because of the sites in Maricopa County (maybe there are low or faulty numbers dragging down the average), or it could be that averaging gets you closer to the truth.

I’m going down this path because at this point I made a decision: I wanted to look at the maximum daily measurement.  Given that these are instantaneous measurements, I felt that knowing the maximum measurement in a given day would also provide insight into how Ozone levels are faring.  And more specifically, knowing my region a little bit – the measurement sites could be outside of well populated areas and may naturally record lower measurements.

So that was step one for me: move to the world of MAX.  This let me leverage all the site data and get going.  (Also originally I wanted to jitter and display all the sites because I thought that would be interesting – I distilled the data down further because I wasn’t getting what I wanted in terms of presentation in the end result).
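
Roughly, that move to MAX can be expressed as a fixed level-of-detail calc – the field names below are placeholders rather than the exact columns in the extract:

    // Daily maximum reading across all sites and samples
    { FIXED [Date Local] : MAX([Sample Measurement]) }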

Okay – next up was plotting the data.  I wanted a single-page, very dense data display that had all the years and months and allowed for easy comparisons.  I had thought a cycle plot might be appropriate, but after trying a few combinations I didn’t see anything special in the day-of-week additions and noticed that the measurement really is about time of year (the month), with the secondary comparison being each year.

Now that I’ve covered that part – next up was how to plot.  Again, this originally started out its life as dots color encoded using the AQI scale with PPM on the Y-axis.  And I almost published it that way.  But to be honest with you, I don’t know if the minutiae of the PPM really matter that much.  I think the AQI category defined on top of the measurement is easier for an end user to understand.  Hence my final development fork: turn the categorical result into a unit measure (1, 2, 3, 4, etc.) to represent the height of a bar chart.  And that’s where I got really inspired.  I made “Good” -1 and “Moderate” 0.  That way anything positive on the Y-axis is a bad day.  To me this lets you see the streaks of bad throughout the time periods.
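
A sketch of that unit-measure calc, assuming an [AQI Category] field has already been derived from the daily max; only the Good = -1 and Moderate = 0 anchors are fixed, the rest of the mapping is illustrative:

    // Turn the AQI category into a bar height: anything positive is a bad day
    CASE [AQI Category]
        WHEN 'Good'                           THEN -1
        WHEN 'Moderate'                       THEN 0
        WHEN 'Unhealthy for Sensitive Groups' THEN 1
        WHEN 'Unhealthy'                      THEN 2
        WHEN 'Very Unhealthy'                 THEN 3
        ELSE 4   // Hazardous
    END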

Close up of 2015 – I love this.  Look at those moderates just continuing the axis.  Look how clear the not so good to very bad is.  This resonates with me.

Okay – so the final steps were going to be a map of all the measurements at each site (again, the max for each site based on the user clicking a day).  It was actually quite cute showing Phoenix closer up.  And then I was going to have national readings (max for each site upon clicking a day) as a comparison.  This would have been super awesome – here’s the picture:

So good.  And perhaps I could have kept this, but knowing I have to go to Tableau Public – it just isn’t going to handle the national data well.  So I sat on it for an evening, and while I was driving to work I decided to do a marginal chart showing the breakdown of the number of days of each type.  The “why”: it looks like things are getting better, and more attention needs to be drawn to that!

So last steps ended up being to add on the marginal bar charts and then go one step further to isolate the “bad days” per year and have them be the final distilled metric at the far far right.  My thought process: scan each year, get an idea of performance, see it aggregated to the bar chart, then see the bad as a single number.  For sheer visual pleasure I decided to distill the “bad” further into one more chart.  I had a stacked bar chart to start, but didn’t like it.  I figured for the sake of artistry I could get away with the area chart and I really like the effect it brings.  You can see that the “very bad” days have become less prominent in recent years.

So that pretty much sums up the development process.  Here’s the full viz again and a comparison to the original output for Maricopa County, which echoes the sentiment of my maximums – Ozone measurements are going down.

#MakeoverMonday Week 24 – The Watercolours of Tate

First – I apologize.  I did a lot of web editing this week that has led to a series of system fails.  The first was spelling the hashtag wrong.  Next I decided to re-upload the workbook and ruin the bit link.  What will be the next fail?

Anyway – to rectify the series of fails I decided that the best thing to do would be to create a blog post.  Blog posts merit new tweets and new links!

So week 24’s data was the Tate Collection, which, upon clicking through the link, indicates it is a decent approximation of the artwork housed at Tate.

Looking at the underlying data set, here are the columns we get:

And the records:

So I started off decently excited about the fact that there were 2 URLs to leverage in the data set: one a thumbnail image only and the other a full link to the asset.  However, the Tate website can’t be accessed via HTTPS, so it doesn’t work for on-dashboard URLs on Tableau Public.  I guess Tableau wants us to be secure – and I respect that!

So my first idea – an all-floating layout with an image in the background – was out.

Now my next idea was to limit the data set.  I had originally thought to do the “Castles of Tate” – check out the number of titles:

A solid number: 2,791 works of art.  A great foundation to build on.  Except of course for what we knew to be true of the data: Turner.

Sigh – this bummed me out.  Apparently only Turner really likes to label works of art with “Castle.”  The same was true for River and Mountain.  Fortunately I was able to easily see that using URL actions in Tableau Desktop (again, can’t do that on Public for security reasons):

Here is a classic Turner castle:

Now yes, it is artwork – but doesn’t necessarily evoke what I was looking to unearth in the Tate collection.

So I went another path, focusing on the medium.  There was a decent collection of watercolour (intentional European spelling).  And within that a few additional artist representations beyond our good friend Turner.

So this informed the rest of the visualization.  Lucky for me there was a decent amount of distribution date wise, both from a creation and acquisition standpoint.  This allowed me to do some really pretty things with binned time buckets.  And inspired by the Tate logo: I took a very abstract approach to the visualization this week.  The output is intentionally meant for data discovery.  I am not deriving insights for you, I’m building a view for you to explore.

One of my favorite elements is the small multiples bubble chart.  It is not intended to aid in cognition; it is intended to be artwork of artwork.  I think that pretty much describes the entire visualization, if I’m being honest.  Something that could stand alone as a picture, or be drilled deep to the depths of going to each piece’s website and finding out more.

Some oddities with color I explored this week: using an index and placing that on the color shelf with a diverging color palette (that’s what is coloring the bubble charts), and also using modulo on the individual asset names to spark some fun visual encoding.  Rather than all one color, I felt breaking up the values in a programmatic way would be fun and different.
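
For anyone curious, the gist of those two tricks looks something like this (the modulo version works off the title length – one way to approximate “modulo on the names,” not necessarily the exact calc in the workbook):

    // Trick 1: INDEX() dropped on the Color shelf with a diverging palette
    INDEX()

    // Trick 2: break titles into arbitrary color buckets programmatically
    // (the divisor is arbitrary - use however many buckets look good)
    LEN([Title]) % 7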

Perhaps my favorite part is the top section, with the bubble charts and bar charts below and the binned year ranges between.  Pure data art blots.

Here’s the full visualization on Tableau Public – I promise not to tinker further with the URLs.

#WorkoutWednesday Week 23 – American National Parks

I’m now back in full force from an amazing analytics experience at the Alteryx Inspire conference in Las Vegas.  The week was packed with learning, inspiration, and community – things I adore and am honored to be a part of.  Despite the awesome nature of the event, I have to admit I’m happy to be home and keeping up with my workout routine.

So here goes the “how” of this week’s Workout Wednesday week 23.  Specifications and backstory can be found on Andy’s blog here.

Here’s a picture of my final product and my general assessment of what would be required for approach:

Things you can see from the static image that will be required –

  • Y axis grid lines are on specific demarcations with ordinal indicators
  • X-axis also has specific years marked
  • Colors are for specific parks
  • Bump chart of parks is fairly straightforward, will require an index() calculation
  • Labels are only on colored lines – tricky

Now here’s the animated version showing how interactivity works

  • Highlight box has specific actions
    • When ‘none’ is selected, defaults to static image
    • When park of specific color is selected, only that park has different coloration and it is labeled
    • When park of unspecified color is selected, only that park has different coloration (black) and it is labeled

Getting started is the easy part here – building the bump chart.  Based on the data set and instructions it’s important to recognize that this is limited to parks of type ‘National Historical Park’ and ‘National Park.’  Here’s the basic bump chart setup:

and the custom sort for the table calculation:

Describing this is pretty straightforward: index (rank) each park by the descending sum of recreation visitors every year.  Once you’ve got that set up, flipping the Y-axis to reversed will get you to the basic layout you’re trying to achieve.
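
The rank itself is just INDEX(); the heavy lifting happens in the table calc settings (compute along each park, restarting every year, with that custom sort):

    // Rank: INDEX(), computed along Park and restarting each Year,
    // custom sorted by SUM(Recreation Visitors) descending
    INDEX()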

Now – the grid lines and the y-axis header.  Perhaps I’ve been at this game too long, but anytime I notice custom grid lines I immediately think of reference lines.  Adding constant reference lines gives ultimate flexibility in what they’re labelled with and how they’re displayed.  So each of the rank grid lines are reference lines.  You can add the ‘Rank’ header to the axis by creating an ad-hoc calculation of a text string called ‘Rank.’  A quick note on this: if you add dimensions and measures to your sheet be prepared to double check and modify your table calculations.  Sometimes dimensions get incorporated when it wasn’t intended.

Now on to the most challenging part of this visualization: the coloration and labels.  I’ll start by saying there are probably several ways to complete this task and this represents my approach (not necessarily the most efficient one):

First up: making colors for specific parks called out:

(probably should have just used the Grouping functionality, but I’m a fast typer)
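
In spirit it’s a long IF/ELSEIF that hands each called-out park its own name and lumps everything else together – the park names below are placeholders, not the actual list:

    // Calc 1 - Park Color: named parks keep their own name, the rest share one
    IF [Park Name] = 'Grand Canyon NP' THEN [Park Name]
    ELSEIF [Park Name] = 'Great Smoky Mountains NP' THEN [Park Name]
    ELSEIF [Park Name] = 'Yosemite NP' THEN [Park Name]
    ELSE 'Other'
    END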

Then making a parameter to allow for highlighting:

(you’ll notice here that I already had the right subset of parks; this is because I made Park Type a data source filter and later an extract filter – thus removing the other park types from the domain)

Once the parameter is made, build in functionality for that:

And then I set a calculation to dynamically flip between the two calculations depending on what the parameter was set to.
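
Collapsed into a single calc (which, as noted below, is really all that’s needed), it looks something like this – assuming a string parameter called [Highlight Park] and the Park Color sketch from above:

    // Calc 2 - Park Highlight Color: fall back to the static colors on '(None)',
    // otherwise color only the selected park and send the rest to 'Other'
    IF [Highlight Park] = '(None)' THEN [Park Color]
    ELSEIF [Park Name] = [Highlight Park] THEN [Park Name]
    ELSE 'Other'
    END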

Looking back on this: I didn’t need the third calculation, it’s exactly the same functionality as the second one.  In fact as I write this, I tested it using the second calculation only and it functions just fine.  I think the over-build speaks to my thought process.

  1. First let’s isolate and color the specific parks
  2. Let’s make all the others a certain color
  3. Adding in the parameter functionality, I need the colors to be there if it is set to ‘(None)’
  4. Otherwise I need it to be black
  5. And just for kicks, let’s ensure that when the parameter is set to ‘(None)’ that I really want it to be the colors I’ve specified in the first calc
  6. Otherwise I want the functionality to follow calc 2

Here’s the last bit of logic to get the labels on the lines.  Essentially, I knew we’d want to label the end point, and because of how labeling works I’d have to allow all labels to be visible and then determine which ones actually have values for the label.  PS: I’m really happy to use the match-color functionality on this viz.
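
A sketch of that label logic, building on the highlight calc above:

    // Label: only return a value for lines that are actually colored/highlighted;
    // every other mark gets a NULL label and stays blank
    IF [Park Highlight Color] <> 'Other' THEN [Park Name] END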

And the label setting:

That wraps up the build for this week’s workout, with the last steps being to add additional detail to the tooltips and to stylize.  A great workout that demonstrates the compelling nature of interactive visualization and the always compelling bump chart.

Interact with the full visualization here on my Tableau Public.

Alteryx Inspire – Day 1

When I went to the Tableau Conference last year, I felt it was important to spend some time documenting my experience.  Anytime I go to a conference related to my professional aspirations I’m always taken by the wealth of knowledge that’s uncovered.

The Alteryx Inspire conference is a pared down conference with about 2,000 attendees.  It is comfortably housed in the Aria hotel across 2 spacious and open floors.  There are escalators that split between level 3 and level 1 – there’s nice flow to it and plenty of natural light.  Events take place over three days: Monday, Tuesday, and Wednesday.  Monday is mostly a product training day and the bulk of sessions are the remainder of the week.  Opening keynote is Tuesday.

This year – being my first – I was extremely fortunate to be able to attend and to do the product training track.  This gives me a firsthand opportunity to see how the product company sells and trains on its tool.  Facilitators are typically great at selling the ‘why’ and ‘how’ behind something.

Today I sat for a full day going through the introduction to Alteryx Designer.  Not because it was my first time using the tool, but because I believe there’s something very powerful about origin stories.  There’s something you learn in the first 30 minutes that someone who doesn’t have the ‘formal training’ may never pick up.  That happened for me today and it was great to see everything in action.

As an advocate for data-informed decision making, I find the tool indispensable.  Just by listening to the 100+ people in my classroom, it’s scary to witness firsthand how early many businesses still are in accessing their data.  Yes, there have been really great strides, but so many people are just at the beginning.  I chuckle when I hear the typical ‘Excel’ analogies, but the overwhelming majority are nodding at how much they relate to the joke.

I’ve always seen Alteryx as a natural companion for a data analyst.  For anyone out there trying to manage data, it offers up a solution – if only for the single act of being able to see a visual output of the thought process and work that went into producing a data model.  A data model or report that can be shared, saved, printed (please don’t print), and most importantly: communicated.  For someone doing data prep, blending, and gathering – this is how you explain to your boss what you do.  This is the demonstration of what it takes to be the data wrangler.  This is how you share your critical thinking skills.

I’ve just scratched the surface and have 2 more full days of Alteryx – an experience already peppered with amazing collaboration opportunities and shared enthusiasm.  The vibe is chill, the people are great, and the mission is achievable.

Tomorrow is another day and an opportunity to take the building blocks and dream of skyscrapers.

#MakeoverMonday Week 22 – Internet Usage by Country

This week’s data set captures the number of internet users per 100 people by country, spanning several years.  The original data set and accompanying visualization start as an interactive map with the ability to animate through the changing values year by year.  Additionally, the user can click into a country to see percentage changes or comparative changes across multiple countries.

Channeling my inner Hans Rosling – I was drawn to play through the animation of the change by year, starting with 1960.  What sort of narrative could I see play out?

Perhaps it was the developer inside of me, but I couldn’t get over the color legend.  For the first 30 years (1960 to 1989) there are only a few data points, all signifying zero.  Why?  Does this mean that those few countries actually measured this value in those years, or is it just bad data?  Moving past the first 30 years, my mind started trying to resolve the rest of the usage changes.  However – here again my mind was hurt by the coloration.  The color legend shifts from year to year.  There’s always a green, greenish yellow, yellow, orange, and red.  How am I to ascertain growth or change when I must constantly refer to the legend?  Sure, there’s something to be said for comparing country to country, but it loses alignment once you start paginating through the years.

Moving past my general take on the visualization – there were certain things I picked up on and wanted to carry forward in my makeover.  The first was the value out of 100 people.  Because I noticed that the color legend was increasing year to year, this meant that the overall number of users was increasing.  Similarly, when comparing the countries, coloration changed, meaning ranks were changing.

I’ll tell you – my mind was originally drawn to the idea of 3 slope charts sitting next to each other: one representing the first 5 years, then the next 5 years, and so on, with each country as a line.  Well, that wasn’t really possible because the data has 1990 to 2000 as the first set of years – so I went down the path of the first 10 years.  It doesn’t tell me much other than something somewhat obvious: internet usage exploded from 1990 to 2000.

Here’s how the full set would have maybe played out:

This is perhaps a bit more interesting, but my mind doesn’t like the 10 year gap between 1990 and 2000, five year gaps from 2000 to 2010, and then annual measurements from 2010 to 2015 (that I didn’t include on this chart).  More to the point, it seems to me that 2000 may be a better starting measurement point.  And it created the inflection point of my narrative.

Looking at this chart – I went ahead and decided my narrative would be to understand not only how much more internet usage there is per country, but also to demonstrate how certain countries have grown throughout the time periods.  I limited the data set to the top 50 countries in 2015 to eliminate some of the noise (there were 196 members in the country domain; when I cut it to 100 there were still some 0s in 2000).

To help demonstrate that usage was simply more prolific overall, I developed a consistent dimension to bucket the number of users.  As you read it, the color goes from light gray to blue depending on the value – the point being that as we get nearer in time, there’s more dark blue and no light gray.
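
Conceptually that dimension is just a set of cut points over users per 100 people – the breaks below are illustrative:

    // Users-per-100 bucket, colored consistently from light gray to blue
    IF [Users per 100] < 20 THEN 'a. Under 20'
    ELSEIF [Users per 100] < 40 THEN 'b. 20-40'
    ELSEIF [Users per 100] < 60 THEN 'c. 40-60'
    ELSEIF [Users per 100] < 80 THEN 'd. 60-80'
    ELSE 'e. 80+'
    END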

And then I went the route of a bump chart to show how the ranks have changed.  Norway had been at the top of the charts; now it’s Iceland.  When you hover over the lines you can see what happened.  And in some cases it makes sense – when a country is already dominating usage, increasing can only go so far.

But there are some amazing stories that can unfold in this data set: check out Andorra.  It went from #33 all the way up to #3.

You can take this visualization and step back into different years and benchmark each country on how prolific internet usage was during the time.  And do direct peer comparatives to boot.

This one deserves time spent focused on the interactivity of the visualization.  That’s part of the reason why it is so dense at first glance.  I’m intentionally trying to get the end user to see 3 things up front: overall internet usage in 2000 (by size and color encoding) and the starting rank of countries, the overall global increase in internet usage (demonstrated by coloration change over the spans), and then who the current usage leader is.

Take some time to play with the visualization here.

Workout Wednesday Week 21 – Part 1 (My approach to existing structure)

This week’s Workout Wednesday had us taking NCAA data and developing a single chart that showed the cumulative progression of a basketball game – more specifically, a line chart where the X-axis is a countdown of time and the Y-axis is the current score.  There’s some additional detail in the form of the size of each dot representing 1, 2, or 3 points.  (See cover photo.)

Here’s what the underlying data set looks like:

Comparing the data structure to the image of what needs to be produced, my brain started to hurt.  Some things I noticed right away:

  • Teams are in separate columns
  • Score is consolidated into one column and only displayed when it changes
  • Time amount is in 20 minute increments and resets each half
  • Flavor text (detail) is in separate columns (the team columns)
  • Event ID restarts each half, seriously.

My mind doesn’t like that the team dimension isn’t in a single column.  It doesn’t like the restarting time either.  It really doesn’t like the way the score is done.  These aren’t numbers I can aggregate together; they are raw outputs in a string format.

Nonetheless, my goal for the Workout was to take what I had in that structure and see if I could make the viz.  What I don’t know is this: did Andy do it the same way?

My approach:

First I needed to get the X-axis working.  I’ve done a good bit of work with time, so I knew a few things needed to happen.  The first part was to convert what was in MM:SS to seconds.  In my mind, this would change the data to a continuous axis that I could then format back into MM:SS.  Here’s the calculation:

I cheated and didn’t write my calculated field for longevity.  I saw that there was a dropped digit in the data and compensated by breaking it up into two parts.  Probably a more holistic way to do this would be: if the string is of length 4, append a 0 to it and then go about the same process.  Here are the described results showing the domain:
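
That more holistic version would look something like this (assuming the raw field is called [Time]):

    // Pad M:SS values back to MM:SS, then convert to total seconds remaining
    INT(LEFT(IF LEN([Time]) = 4 THEN '0' + [Time] ELSE [Time] END, 2)) * 60
    + INT(RIGHT([Time], 2))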

Validation check: the time goes from 0 to 20 minutes (0 to 20*60 seconds aka 1200 seconds).  We’re good.

Next I needed to format that time into a continuous MM:SS format.  I took that calculation from Jonathan Drummey.  I’ve used it more than once, so my google search is appropriately ‘Jonathan Drummey time formatting.’  The resulting time ‘measure’ was almost there, but I wasn’t taking into account the +20 minutes for the first half or that the time axis spans the full game duration.  So here are the two calculations that I made (first is the +20 mins, then the formatting):
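
The +20 minutes piece is the simpler of the two – roughly this, with [Seconds Remaining] being the conversion above and [Half] assumed to be a numeric field in the data:

    // Add a full half (20 minutes = 1200 seconds) of remaining time
    // to first-half events so the axis spans the whole game
    IF [Half] = 1 THEN [Seconds Remaining] + 1200 ELSE [Seconds Remaining] END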

At this point I felt like I was kind of getting somewhere – almost to the point of making the line chart, but I needed to break apart the teams.  For that bit I leveraged the fact that the individual team fields only have details in them when that team scores.  Here’s the calc:
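
The calc leans on those one-sided blanks – something like the sketch below, where the two team column names are assumptions about the data set:

    // Whichever team column has detail on a row is the team that scored
    // (if the blanks come through as empty strings, swap ISNULL for LEN() = 0)
    IF NOT ISNULL([UNC]) THEN 'UNC'
    ELSEIF NOT ISNULL([Gonzaga]) THEN 'Gonzaga'
    END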

I still don’t have a lot going on – at best I have a dot plot where I can draw out the event ID and start plotting the individual points.

Getting the score was relatively easy.  I also did this in a way that’s custom to the data set, with 3 calculations – find the left score, find the right score, then tag the scores to the teams.
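
Assuming the consolidated score column reads something like '65-63', the three calcs look roughly like this (which side belongs to which team depends on the data):

    // Calc 1 - LeftScore
    INT(LEFT([Score], FIND([Score], '-') - 1))

    // Calc 2 - RightScore
    INT(MID([Score], FIND([Score], '-') + 1))

    // Calc 3 - Team Score: tag each side of the score to its team
    IF [Team] = 'UNC' THEN [LeftScore] ELSE [RightScore] END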

Throwing that on rows, here’s the viz:

All the events are out of order and this is really difficult to understand.  To get closer to the view I did a few things all at once:

  • Reverse the time axis
  • Add Sum of the Team Score to the path
  • Put a combined half + event field on detail (since event restarts per half)

Also – I tried Event & Half separately and my lines weren’t connected (broken at half time; so creating a derived combined field proved useful at connecting the line for me)
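
That combined field is just a string concatenation – something like:

    // Half + Event ID, so the path stays connected across halves
    STR([Half]) + '-' + STR([Event ID])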

Here’s that viz:

It’s looking really good.  Next steps are to get the dots to represent the ball sizes.

One of my last calculations:

That got dropped on Size on a duplicated and synchronized “Team Score” axis.  Keeping the pesky null from displaying in the legend was a simple right click and ‘Hide.’  I also had to sort the Ball Size dimension to align with the perceived sizing.  The line size was also made super skinny.
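
One way to derive that ball size – an approximation working off the flavor text rather than necessarily the calc in the workbook ([Detail] stands in for whichever column carries the play-by-play text):

    // Approximate point value per scoring event from the play-by-play text
    IF CONTAINS([Detail], 'Three Point') THEN '3 points'
    ELSEIF CONTAINS([Detail], 'Free Throw') THEN '1 point'
    ELSEIF NOT ISNULL([Detail]) THEN '2 points'
    END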

Now some cool things happened because of how I did this:  I could leverage the right and left scores for tooltips.  I could also leverage them in the titling of the overall scores, e.g. UNC = {MAX([LeftScore])}.

Probably the last component was counting the number of baskets (within the scope of making it a single returned value in a title per the specs of the ask).  Those were repeated LODs:
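
Those LODs follow a repeated pattern – count one team’s baskets of one size and return it as a single value that can sit in a title (field names here match the sketches above, not necessarily the workbook):

    // e.g. number of UNC three-pointers across the whole game
    { FIXED : SUM(IF [Team] = 'UNC' AND [Ball Size] = '3 points' THEN 1 ELSE 0 END) }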

And thankfully the final component – the oversized scores on the last marks – could be accomplished with the ‘Always Show’ option.

Now I profess this may not be the most efficient way to develop the result, heck here’s what my final sheet looks like:

All that being said: I definitely accomplished the task.

In Part 2 of this series, I’ll be dissecting how Andy approached it.  We obviously did something different because it seems like he may have used the Attribute function (saw some * in tooltips).  My final viz has all data points and no asterisks ex: 22:03 remaining UNC.  Looking at that part, mine has each individual point and the score at each instantaneous spot, his drops the score.  Could it be that he tiptoed around the data structure in a very different way?

I encourage you to download the workbook and review what I did via Tableau Public.

#MakeoverMonday Week 21 – Are Britons Drinking Less?

After some botched attempts at reestablishing routine, #MakeoverMonday week 21 got made within the time-boxed week!  I have one pending makeover and an in-progress blog post about Viz Club and the 4 vizzes developed during that special time.  But for now, a quick recap of the how and why behind this week’s viz.

This week’s data set was straightforward – aggregated measures sliced by a few dimensions.  And in what I believe is now becoming an obvious trend in how data is published, it included both aggregated and lower-level dimensions within the same field (read this as “men,” “women,” “all people”).  The structured side of me doesn’t like it and screams to exclude the aggregates from any visualizations, but this week I figured I’d take a different approach.

The key questions asked related to alcohol consumption frequency by different age and gender combinations (plus those aggregates) – so there was lots of opportunity to compare within those dimensions.  More than that, the original question and how the data was presented begged to be rephrased into what became the more direct title (Are Britons Drinking Less?).

The question really informed the visualizations – and more to that point, the phrasing of the original article seemed to dictate to me that this was a “falling measure.”  Meaning it has been declining for years or year-to-year, or now compared to then – you get the idea.

With it being a falling measure and already in percentages, a “difference from first” table calculation was a natural progression.  With that calculation, the first year of the measure is anchored at zero and subsequent years are compared to it – essentially asking and answering, for every year, “was it more or less than the first year we asked?”  Here’s the beautiful small multiple:

Here the demographics are set to color, lightest blue being youngest to darkest blue being oldest; red is the ‘all.’  I actually really enjoyed being able to toss the red on there for a comparison and it is really nice to see the natural over/under of the age groups (which mathematically follows if they’re aggregates of the different groups).

One thing I did to add further emphasis was to put positive deltas on size – that is, to over-emphasize (in a very subdued way whose humor probably only Ann appreciates) the points that run against the trend.  Or more directly stated: draw the reader’s attention to the points where the percentage response has increased.

Here’s the resultant:

So older demographics are drinking more than they used to, and that’s fueled by women.  This becomes even more aligned with the point of the original article when looking at the Teetotal groups and seeing many more fat lines.

Here’s the calculation to create the line sizing:
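
In spirit it’s the difference-from-first comparison reused as a size – a sketch with [Value] standing in for the survey percentage:

    // Size up the line only where the response sits above its first-year value
    IF ZN(SUM([Value])) - LOOKUP(ZN(SUM([Value])), FIRST()) > 0 THEN 2 ELSE 1 END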

Last up was to make one more view to help sell the message.  I figured a dot plot would mimic champagne bubbles in a very abstract way.  And I also thought open/closed circles in combination with the color encoding would be pleasant for the readers.  Last custom change there was to flip the vertical axis of time to be in reverse.  Time is read top down and you can see it start to push down to the left in some of the different groupings.

If you go the full distance and interact with the dashboard, the last thing I hope you’ll notice and appreciate is the color legend/filter bar at the top.  I hate color legends because they lack utility.  Adding in a treemap version of a legend that does double duty as highlight buttons is my happy medium (and only when I feel like color encoding is not actively communicated enough).

The value of Viz Club

I’ve been very interested in pursuing a concept I originally saw in my twitter feed: a picture of Chris Love at a pub with a blurb about ‘Viz Club.’  Following it further, there were some additional details about the logistics, but the concept was simple: a few people get together and collaborate on data viz.

Fast forward through the past 8 months and this concept has been racking my brain.  The Phoenix TUG has space once a month on a Saturday.  This had originally started as a “workshop” time slot, but as things evolved it didn’t feel quite right.  There was an overwhelming expectation to have a curriculum and a survey of how effective the instruction was.  Now let me say this: there is absolutely a time and place for learning Tableau, and I am all about empowering individuals with the skills to use it.  However, a voluntary community workshop should not be a surrogate for corporate-sponsored training, and it isn’t conducive to building out a passion- and enthusiasm-filled community.

Anyway – the workshops weren’t right. They weren’t attracting the right types of people and more importantly it was isolating folks who were more advanced in their journey.  Essentially it needed a format change, one that could be accepting of all stages of the journey and one that attracted the passion.

Thus the idea kept echoing in my mind: Viz Club.

So coming out of April, the leap was made. I’ve been facilitating a series called “Get Tableau Fit” where I reconstruct one or two Workout Wednesday challenges during the first half of the TUG.  People were starting to see the value in the exercises (crazy that it took them so long!) and were getting more and more interested in trying them.  Interested, but still with a small amount of trepidation to participate. The final barrier for folks to participate?  Perhaps it was finding structured time.  Being officially assigned a task and a time to do the task.

Hence Viz Club.

So we held our first club session on Saturday, and I have to say it was very successful.  I was very curious about the types of people who would show up, what their problems would be, and how the flow would play out.  I couldn’t have been happier with the results.

From a personal productivity standpoint, it was well worth the time investment.  There’s something about being in a collaborative and open environment that eliminates potential mental blocks.  Within the first hour of Viz Club I had done one of my catch-up #MakeoverMonday vizzes.

More important than my productivity was the productivity in the room.  Throughout the day questions were being asked and answered.  One specific community member took a concept all the way through to a dashboard – significant breakthroughs within the time box of our 5 hours together.  I know everyone involved felt the power of the time we spent together and it’s something we’re going to continue.