Workout Wednesday 14 | Guest Post | Frequency Matrix

Earlier in the month Luke Stanke asked if I would write a guest post and workout.  As someone who completed all 52 workouts in 2017, the answer was obviously YES!

This week I thought I’d take heavy influence from a neat little chart made to accompany Makeover Monday (w36y2017) – the Frequency Matrix.

I call it a Frequency Matrix; you can call it what you will – the intention is this: use color to represent the frequency (intensity) of two things.  So for this week you’ll be creating a Frequency Matrix showing the number of orders that contain each pair of sub-categories.

click to view on Tableau Public

Primary question of the visualization: Which sub-categories are often ordered together?
Secondary question of the visualization: How much, on average, is spent per order for each sub-category?
Tertiary question: Which sub-category combination drives the highest average spend per order?

Requirements
  • Use sub-categories
  • Dashboard size is 1000 x 900; tiled; 1 sheet
  • Distinctly count the number of orders that have purchases from both sub-categories
  • Sort the categories from highest to lowest frequency
  • White out when the sub-category matches and include the number of orders
  • Calculate the average sales per order for each sub-category
  • Identify in the tooltip the highest average spend per sub-category (see Phones & Tables)
  • If it’s the highest average spend for both sub-categories, identify with a dot in the square
  • Match formatting & tooltips – special emphasis on tooltip verbiage

This week uses the superstore dataset.  You can get it here at data.world

After you finish your workout, share on Twitter using the hashtag #WorkoutWednesday and tag @AnnUJackson, @LukeStanke, and @RodyZakovich.  (Tag @VizWizBI too – he would REALLY love to see your work!)

Also, don’t forget to track your progress using this Workout Wednesday form.

Hints & Detail
  • You may not want to use the WDC
  • Purple is from hue circle
  • You’ll be using both LODs and Table Calculations
  • I won’t be offended if you change the order of the sub-category labels in the tooltips
  • Dot is ●
  • Have fun!

Who Gets an Olympic Medal | #MakeoverMonday Week 7

At the time of writing the 2018 Winter Olympic Games are in full force, so it seems only natural that the #MakeoverMonday topic for Week 7 of this year is record level results of Winter Games medal wins.

I have to say that I was particularly excited to dive into this data set.  Here’s what a few rows of data look like:

I always find with this level of data there are so many interesting things that can be done that it gets really hard to focus.  The trouble is that all of the rows are interesting, so as a creator I’m immediately drawn to organizing “all the data” and want to put it ALL on display.  And that’s where the first 20 minutes of my development were headed.

I’d started with a concept of showing all the medals and more specifically showing the addition of new sports over time.   As I was building, the result was quite clearly going to be a giant poster form viz.  Not what I was going for.

To move past that my mind shifted to female sports at the Winter Olympics.  And if you look through the data set you’ll see there are some interesting points.  Specifically that it took about 60 years for women to get to a similar number of events/medals as men.  (yellow = men, purple = women, gray = mixed)

I spent some time stuck on this – thinking through how I could segment by different sports, extract some of the noise of the different years, and come up with a slope chart.  Ultimately I found myself disappointed with all of these pursuits – so my thoughts shifted.

So I switched gears and stumbled on this chart:

Which as you look through it is REALLY interesting.  I had just watched the Opening Ceremonies and knew there were 91 delegations (countries) represented in 2018.  To know that in 2014 the number was probably similar, yet only 26 reached a podium seemed to be a sticking point in my mind.

So – that led to a quick adventure over to http://www.olympic.org to add context to the number of countries represented at the games over the years.  They actually have really nice summary pages for each set of games that made gathering data simple.  Here’s a snapshot of 1980 – Lake Placid:

Using the ribbon of information at the bottom I went about collecting and enriching the data set.  Because what was missing from our original #MakeoverMonday data was the NULLs.

Sufficiently enriched I was able to come up with a calculation for the percentage of delegations medalling at each set of games.  Of course I suspected that this would not be close to 100%, only by virtue of knowing that we’ve got 91 delegations in 2018.  Here’s the chart:
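The percentage calculation itself was probably something along these lines, assuming the enrichment added a [Delegations] field holding the total number of delegations at each Games (field names are illustrative):

    // % of delegations medalling (sketch)
    COUNTD([Country]) / MIN([Delegations])
    // COUNTD counts the countries appearing in the medal data for that Games;
    // MIN just returns the single enriched delegation count for the Games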

So – now the story is unfolding, but I wanted to take it a few steps further.  My main beef: I want to also see how many additional delegations are bringing athletes to the games.  Specifically at the first data point I’d think that it was a small number of countries because the games were new.  Essentially the opportunity for medalling would perhaps be greater.  Hence settling on what ended up being my final submission for the week:

click to view on Tableau Public

What are you looking at?  Medals are clearly parsed out into Gold, Silver, and Bronze.  Each bar represents a Winter Games.  The width of the bar is the # of countries/delegations, the height of the bar is the % of countries who medalled in that respective color.  I concede in this that eliminating the dimensionality of medals may have made for a more consolidated view, but I selfishly wanted to use the different colors.

Here’s the non-medalled version:

Less abstracted, more analytically presented:

Ultimately for the sake of the exercise I went with continuous bar sizing representing the number of delegations at each Winter Games.  And my “why”: this isn’t often seen, and within the confines of this visualization it works well.  Explaining it aloud should facilitate easy cognition.  Wider bars mean more countries participating (reinforced by our general knowledge of the games).  And then the height of the bars can cleanly represent the percentage of those getting medals.  Plus – per usual – the tooltip divulges all this in well articulated detail.  (++ bars allow for chronology of time)

I’m quite pleased with this one.  Maybe because I am the designer, but I was delighted with the final representation both from a visual perspective and an analytical presentation perspective.  There is a certain amount of salience in having the bars get larger over time (and repeating that 3 times) and the colors of the medals being represented within a single worksheet.

#WorkoutWednesday – Fiscal Years + Running Sums

As a big advocate of #WorkoutWednesday I am excited to see that it is continuing on in 2018.  I champion the initiative because it offers people a constructive way to problem solve, learn, and grow using Tableau.

I was listening to this lecture yesterday and there was a great snippet “context is required to spark curiosity.”  We see this over and over again in our domain – everyone wants to solve problems, but unless there is a presented problem it can be hard to channel energy and constructively approach something.

One last paragraph of context before getting into the weeds of the build.  I enjoy the exercise of explaining how something works.  It helps cement in my mind the concepts and techniques used during the build.  Being able to explain the “why” and the “how” is crucial.

Let’s get started.

High level problem statement or requirement: build out a dashboard that shows the running total of sales and allows the user to dynamically change the start of a fiscal year.  The date axis should begin at the first month of the fiscal year.  Here’s the embedded tweet for the challenge:

 

Okay – so the challenge is set.  Now on to the build.  I like to start by knocking out as many of the small problems as I already know the answers to.  Once I get a framework of things to build from, I can then work through making adjustments to get to the end goal.

  • numeric parameter 1 to 12 for the fiscal year start
  • calculation to push a date to the right fiscal year
  • dimension for the fiscal year
  • transformation of date to the correct year
  • running sum of sales

I’ll spare you the parameter build and go directly into the calculations that deal with the fiscal year components.

Logic as follows – if the month of the order date is less than the fiscal year start, then subtract one from the year, otherwise it’s the year.  I can immediately use this as a dimension on color to break apart the data.
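Expressed as a calculated field, that logic might look something like this, assuming an integer parameter named [FY Start Month] with values 1 through 12:

    // Fiscal Year (sketch) – the year an order belongs to once the FY start is applied
    IF MONTH([Order Date]) < [FY Start Month]
    THEN YEAR([Order Date]) - 1
    ELSE YEAR([Order Date])
    END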

The next step would be to use that newly defined year with other elements of the original date to complete the “fiscal year” transformation.  Based on the tooltips – the year for each order date should be the fiscal year.
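One way to sketch that transformation is to re-stamp each order date with its fiscal year via DATEADD (again, field names are illustrative):

    // Order Date (Fiscal) (sketch) – the order date shifted so its year is the fiscal year
    DATEADD('year', [Fiscal Year] - YEAR([Order Date]), [Order Date])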

Now that the foundation is in place, the harder part is building a continuous axis, and particularly a continuous axis of time.  Dates span from 2013 to 2017 (depending on how you’ve got your FY set up), so if we plotted all the native dates we’d expect it to go from 2013 to 2017.  But that’s not really what we want.  We want a timeline that spans a single year (or more appropriately 365 days).

So my first step was to build out a dummy date that had the SAME year for all the dates.  The dates are already broken out with the FY, so as long as the year isn’t shown on the continuous axis, it will display correctly.  Here’s my first pass at the date calculation:
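In sketch form, it was essentially:

    // Plot Date, first pass (sketch) – pin every order date to the same dummy year (2000)
    // so all fiscal years share one continuous axis
    DATEADD('year', 2000 - YEAR([Order Date]), [Order Date])
    // MAKEDATE(2000, MONTH([Order Date]), DAY([Order Date])) reads more literally,
    // if your connection supports MAKEDATE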

That gets me the ability to produce this chart – which is SO CLOSE!

The part that isn’t working: my continuous axis of time always starts at 1/1.  And that makes sense, because all the dates are for the same year and there’s always 1/1 data in there.  The takeaway: ordering the dates is what I need to figure out.

The workaround?  Well the time needs to span more than one year, and specifically the start of the axis should be “earlier” in time.  To achieve this – instead of hard coding a single year (2000 in my case), I changed the dummy year to be dependent on the parameter.  Here’s the result:
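In sketch form, the parameter-driven dummy date becomes:

    // Plot Date, second pass (sketch) – months before the fiscal year start get pushed
    // into the following dummy year, so the axis begins at the FY start month
    DATEADD('year',
        IIF(MONTH([Order Date]) < [FY Start Month], 2001, 2000) - YEAR([Order Date]),
        [Order Date])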

Basically offset everything that’s less than the chosen FY start to the next year (put it at the end).  (Remember that we’re already slicing apart the data to the correct FY using the previous calculations.)  The combination of these two calculations then changes the chart to this guy which is now mere formatting steps away from completion.

The full workbook is available to download here.

The Remaining 25 Weeks of #WorkoutWednesday

Back in July I wrote the first half of this blog post – it was about the first 27 weeks of #WorkoutWednesday.  The important parts to remember (if the read is too long) are that I made a commitment to follow through and complete every #MakeoverMonday and #WorkoutWednesday in 2017.  The reason was pretty straightforward – I wanted a constructive way to challenge myself and tangible, realistic goals.

Now that we’re 3 days into 2018 – it’s the perfect opportunity to go through the same process of sharing the impact each workout (the remaining 25) has had on me.

Week 28 | Insights & Annotations
The focus on this workout was adding context/insights/summarization to existing visualizations.  Something that is often asked of those creating dashboards or presenting data.  I enjoyed this workout tremendously because it was a great example of using a feature within Tableau in a way I hadn’t thought about.  The premise is pretty simple – allow users the ability to input insights/findings and customize the summary output.  Very clever use that is a great way to provide a dashboard to someone who has little time or is skeptical of self-service analytics.

Week 29 | Who sits where at the Data School?
I hated this workout – because of donut charts.  Donut charts in Tableau are the level one “look how cool I am” or “I think I’m pretty awesome with the tool” things that people make.  Yes – they are cuter than pie charts (which I don’t mind) – but I strongly hate how these are implemented in Tableau.  Pushing aside my dissatisfaction for donuts – I veered off requirements for this dashboard.  My results differed from the original build because of how seat favorites were computed – more specifically, I ended up with more “No Favorite” than the original.  The great point about this workout: the sophistication required to calculate several of the numbers shown.

Week 30 | Loads of LODs
As described this workout had a few LODs.  The most distinct thing I remember about this build relates to the region filter.  You need to decide early on how you’re going to implement this.  I believe Emma used a parameter where I used a filter.  The choice made here will have consequences on the visualizations and is a great way to start understanding the order of operations within Tableau.

Week 31 | The Timing of Baby Making
Ah – a favorite visualization of mine, the step chart/plot.  The two major gotcha moments here are: implementing the step chart and the dropdowns/filters to set highlighting.  This one is solvable in multiple ways and I went the Joe Mako route of unioning my data set.  During the build this seemed like the easier solution to me, but it does have implications later on.  I believe it’s worth the effort to go this route – you will learn a lot about how unioning your data on itself can be useful.

Week 32 | Continuous Dates are Tricky
This is classic Emma style – particularly down to the requirement of a dynamic title that would update based on newer data availability.  The premise itself is pretty straightforward – how can you plot a continuous month line, but have years broken out.  By definition that concept should break your brain a bit, because continuous means it’s the entire date, so how are you plotting different years on a continuous axis?  And hence the challenge of the workout!

Week 33 | How Have Home Prices Changed?
Andy mentioned that this visualization was inspired by Curtis Harris.  And as I write this – it is probably my top visualization in terms of design from the 2017 #WorkoutWednesday collection.  Something about the strategic use of capitalization in conjunction with the color choices resonated with me and has left a lasting impact on my viz style.  This is a stunningly beautiful example of making Tableau a very self-service analytics tool, having dense data, but still being deceptively simple and clean from a design perspective.  Plus you’re practicing dynamic titles again – which I find to be a requirement for most serious builds.

Week 34 | Disney’s Domination
This workout was my first waffle chart.  I’d successfully avoided the waffle (as a general rule I don’t like high carbohydrate visualizations) up to this point.  More than the waffle, though, was the requirement to data blend.  I’m not a big data blending fan because it is very easy for things to go sideways.  However – the icky feeling you get from data blending is exactly why this is a great exercise to work through.  Also – I believe I did this entire visualization (up to a point) using LODs and then had to switch to table calculations.  I learned how to turn an LOD into a table calculation (probably the reverse practice for more tenured Tableau folks).

Week 35 | Average Latitude of Solar Eclipses by Century
This is another visualization with design that I find very pleasing – particularly the use of reference lines.  I strongly remember learning so much about the Path shelf this week.  Specifically how to use path to your advantage.  You don’t often see people create something that would otherwise be a line chart, but instead has vertical bars/gantts/lines to focus the eye vertically and then across.  A great exercise and thought-starter on additional visualizations to make.

Week 36 | Which UK Airport Should You Fly From?
This workout is the perfect hands-on exercise for continuous sizing on bar charts, a feature offered up in the past year.  Beyond knowing that you CAN do something, knowing HOW to build that something is the key (at least for me) to being able to iterate and ideate.  This one is more complex than it seems at first glance.

Week 37 | Killings of Blacks by Whites Are Far More Likely to Be Ruled ‘Justifiable’
A viz I don’t want to remember – this one is 100% about formatting.  It took me a considerable chunk of time to complete.  And probably more maddening – this viz needs to be built in one session, otherwise you’ll forget all the intricate details required to make it look “just so.”  I was cursing my PC the entire build – and worse than that I think I restarted it at a certain point because things weren’t lining up how I wanted.  Only complete this one if you enjoy being tormented.  The upside?  Going through this workout will make you intimately aware of all the gaps and limitations Tableau has as it relates to design.  Also – this was done before changing padding was a feature.  Thanks guys.

Week 38 | (It Takes) All Sorts
This is another “looks simple” but has “some tricks” workout.  I remember someone at our user group asking about this over the summer and if I knew how to build it.  I didn’t have an answer readily available within 30 seconds, so I knew there was more going on.  I highly encourage this build because it demonstrates how sorting works and how multiple sorts interact with each other.  Also – I think whatever sorting I ended up with was some sort of mathematical manipulation on my part.

Week 39 | Are the contributions of top sellers increasing throughout the year?
Another trellis chart!   More than the trellis – check out what is being analyzed.  This isn’t superficial or first-pass reading of data.  This is second and third level thought on finding deeper insights and answers to questions within a data set.  So naturally it requires more layers of calculations to resolve.  And of course – the “just so” placement of the far right label.  This is a perfect example of taking a question and turning it into a visualization that shows the answer.

Week 40 | All Sorts Part 2
As advertised and named – the second half of Emma’s sorting workout.  This may actually have been the dashboard where I did some mathematical magic to require the first position to be first and retain any additional sorting.  Also – the devil is in the details.  When you change sort order, notice that the bottom visualization always changes to be the subcategory chosen.  Sounds easy, but takes some thought to implement.

Week 41 | State to City Drill Down
As I look back on my tracker – I realize that I did 38 through 41 in the same day.  And naturally I approach backlog in the fashion of oldest gets built first.  So this was the 4th on a particular day – but I championed this guy hardcore.  I will say it again – YOU NEED TO DO THIS WORKOUT.  The concepts on execution here are next level.  I know it sounds kind of trivial – but it will help unlock your mind to the possibilities of using filters and the art of the possible.  Plus this is a question that ALWAYS gets asked.  “Can I click a state and have it automatically change the view to cities.”  This does that.  Also – this build took me 30 minutes tops.

Week 42 | Market Basket Analysis
I won’t forget this workout anytime soon because it required the legacy JET connector and thinking about how data gets joined back on itself.  This type of analysis is something people often want to have done – so knowing the steps on creation using an Excel data source (or other sources for that matter) makes this guy worth the build.  Follow Emma’s advice closely.

Week 43 | The Seasonality of Superstore
A great viz once again demonstrating how powerful parameters can be – how you can use them in multiple places – and also things you can do to make visualizations more user/reader friendly.  You’re definitely using a table calculation somewhere in here – and you definitely will get angry when trying to recreate the smoothing (particularly dealing with endpoints of the chosen time range).

Week 44 | Customer Cohorts
When dealing with cohort analysis you’re very likely to encounter LODs – that’s par for the course for this workout.  But again – Emma is so clever at taking something that seems straightforward and challenging you to implement it.  If you look closely you’ll have to dynamically change the bottom visualization based on where a user clicks.  I remember spending the majority of my time on the dynamic title.

 

Week 45 | Stock Portfolio
This is one sheet.  Just remember that – everything is on one sheet.  And more than that – think about how this is implemented from a numerical perspective – there’s some serious normalization going on to make things show up in context to one another.  If you’re not a math lover – this will be a great way to play with numbers and have them bend to your advantage.  Also – I remember being annoyed because one of the stocks had a maximum value greater than the recorded max (which is its own measure) – and that irritated me.

Week 46 | Top N Customers
Think of this as a different way of implementing sets.  It has a lot of similar functionality between IN/OUT and showing members of a set.  And also there are some key takeaways in terms of aggregating dimensions.  Not super flashy on design, but very useful in terms of implementation.

Week 47 | Fun with Formatting
Another visualization where you’re required to do everything in a single sheet.  This will put all that table calculation sweat to action.  I really enjoyed this one.  There is something very satisfying about ranking/indexing things multiple ways in one view.  Also – it uses the Caption.

Week 48 | Treemap Drilldown
Same concept as week 41, but executed as a treemap.  I think I even opened up week 41 to use as influence on where to go.  Same concepts are repeated, but in a different format.  The automagic of this one doesn’t get old – also carefully look at how things are sorted.

Week 49 | Position of Letter Occurrences in Baby Names
When you say out loud what you’re trying to do – particularly “find the nth occurrence” of a specific letter (we can generalize as substring) in a specific string – it sounds really really hard.  But guess what – there’s a built in function!  The fact that it’s built in made this visualization super straightforward to complete.  You should build this to introduce yourself to a function you’ve probably never used before.
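In Tableau that built-in function is FINDNTH, which returns the starting position of the nth occurrence of a substring.  A one-line sketch, with hypothetical field and parameter names:

    // Position of the nth occurrence of a chosen letter within a baby name
    FINDNTH([Name], [Letter], [Occurrence])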

Week 50 | Rocket ship Chart
I very much enjoy this type of chart from an analytical perspective.  It’s a great way to normalize things that are bound to time.  You see immediate inferred rank and results.  Emma put in some requirements to ensure that as data changed this chart would stay accurate.

Week 51 | State by State Profit Ratio
If you want several of the lessons Andy built into multiple workouts all in one place – this workout is for you.  It’s got so many classic Kriebel “gotcha” moments in it.  As I was building this I really felt like it was a specially designed final to test what I’ve learned.  Also this is probably my first tilemap (unless we made one in another workout).  I don’t use them often – so it’s a great refresher on how to implement.  And also – you get to use a join calculation.

Week 52 | UK’s Favourite Christmas Chocolates
When I was building this one someone asked me why I was making it – specifically where was the challenge.  I explained that it was all in one sheet as opposed to 4 different sheets.  The natural next question was “why would you want to do it in one sheet?”  I thought that was a very interesting question and one that I answered by saying that for me personally, knowing multiple ways to do things is important.  And more specifically, as I know to be true of these types of builds – if you can do it in one sheet, it shows a level of mastery in making Tableau do exactly what you want (which is LOTS of things).

And that wraps up 2017 beautifully.  Comparing the retrospective of this half of the year vs. the first half – there are stark differences from my perspective.  I can honestly say that each build got easier as time went on.  Once I got to the last few challenges – I was timing completion to be about 20 minutes.  Contrast that with the first few weeks where I spent hours (over multiple sessions) making my way through each build.

Beyond building out my portfolio, having concrete examples of specific types of analysis, and fulfilling my own goals – #WorkoutWednesday has given me such depth of knowledge in Tableau that it’s ridiculous.  You know how people say things like “that’s just funny Tableau behavior” – well I can (for the most part) now verbally articulate what that funny behavior is and why it happens.  And more than that – I know how to take that behavior – which is really Tableau working as designed – and use it to my advantage.

The last part of this blog series is going to be a ranking of each workout – aimed at helping those who are interested in completing the challenges approach them without getting too discouraged or burnt out on some of the builds that are still (to this day) hard.  Be on the lookout.

Exploring Python + Tableau

This month I’ve been taking a night class at Galvanize aimed at being an introduction to Python and Data Science.  It’s a 4 night/week boot camp with a mix of lecture and hands-on assignments.  My primary interest with the course was the data science component, but I also felt extremely strongly about the necessity of having a programming language in my toolkit.

To add context – I’ve taken formal classes in C++ and Java, know enough HTML/Javascript to be dangerous, and can leverage SQL when forced.  I’ve always had very positive retrospective moments about learning C++ and particularly think the act of writing pseudo-code has been a skill I’m lucky to have.

When learning anything new I think it’s required to put it in action.  So I have some small immediate goals related to Python that focus on integration or utility to Tableau.  In fact earlier this year I dabbled in the topic and spun up a Linux box to go through the published exercise of using Python+NLP.  Specific to that exercise I actually did the NLP processing within Python and exported the results.

This leads me to Wednesday, where I was presented with a new opportunity to combine skill sets.  The premise was pretty straightforward – take the concepts of the Monte Carlo method used in corporate finance and use them to develop visualizations.  The key to this is that a historic input or perhaps expected input (likely derived through some analysis) and its corresponding standard deviation are used along with random numbers mapped onto a normal distribution.  A fancy way of saying (at least for this example) – generate numbers based on X, its standard deviation, and random numbers.  Specifically of the format Result = X + (SD * RandomNumber).  And several results are computed.  It’s like rolling dice over and over again and seeing what happens.

So this was initially conceived and in working order in Excel – and the kicker of why it works in Excel is a function that handles the bit I mentioned above: “random numbers mapped onto a normal distribution.”  The functions used to generate that aren’t available in Tableau.  And this is where Python came to the rescue.

Because Python has depth in mathematical libraries it seemed like a natural choice to outsource the computation of these random numbers to it.

To get started I read through this community article: Tableau Integration with Python – Step by Step.  It provides the pertinent components of how to get Tableau and Python to talk – most specifically how to use the SCRIPT_() functions in Tableau to pass through Python code to TabPy.

Once that was done I needed to develop a data set to use under the hood of the workbook (remember pretty much everything is parameterized – so there’s no data to read in).  I ended up making an Excel spreadsheet with 1,000 rows – each with a row number (1 to 1,000).

The simulation results are this calculation:

So you can see – 2 parameters (purple) and then one calculated field.  The calculated field is where Python jumps on the scene.

Breaking this down a bit further:

  • SCRIPT_REAL() – takes in a block of code within quotation marks followed by arguments (which must be aggregated).  It outputs a real number (as opposed to an integer, string, or boolean)
  • The Python script is pretty basic: import 2 different libraries
    • norm from scipy.stats – this gives me the function norm.ppf(), which converts a uniform random number into the corresponding point on a normal distribution
    • random.random() – just a uniform random number function
    • Name n as a list and set it equal to the output of my functions
    • Return the value n (the full calculated field is sketched below)
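Pieced together, the calculated field probably looks something like this sketch.  The parameter names [Expected Value] and [Standard Deviation] are illustrative, and the arithmetic happens outside the script, so the Python only has to hand back the normally distributed random number:

    // Simulation Result (sketch) – Result = X + (SD * RandomNumber)
    [Expected Value] + [Standard Deviation] *
    SCRIPT_REAL("
    from scipy.stats import norm   # norm.ppf maps a uniform random number onto the normal curve
    import random                  # uniform random number between 0 and 1
    n = [norm.ppf(random.random())]
    return n
    ",
    SUM(1))   // dummy aggregate argument; the calc is addressed per row, so it is never used

Because the table calculation is addressed so that each row is its own partition, the one-element list n lines up with the data.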

I noticed when I did this I had to supply an argument (essentially a comma with SOMETHING after it to avoid errors in my calculated field).  So I added on a simple SUM(1).  This calculation is being called at a row level, so it’s just 1 and not used anywhere.  If I were going to use it, I’d need to include _arg1 (or _arg2, _argN) and follow up with those input arguments separated by commas.

SCRIPT_*() calculations always come back as Table Calculations – so you’ll see the icon and have all the nuance of Table Calculations.  Namely that they’re computed in the view and you can’t make something like bins with them or use them for sorting.

Here’s the final dashboard (in its initial state):

You’ll see the different inputs and the ability to set the number of simulations.  This is simply a parameter that limits the row number of the underlying data.
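That filter is likely nothing more exotic than a boolean against the parameter (names here are illustrative):

    // Simulation filter (sketch) – keep only the first N rows of the 1,000-row scaffold
    [Row Number] <= [Number of Simulations]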

As I was playing around with this – one of the keys to this analysis is to show the spread of the different results (ideally maybe a histogram).  So I stuck with confetti for now, but added on reference bands and lines.  Additionally I colored the simulation results using a custom binning technique where I find the range between Min & Max, divide that by 10, and then create bins at Min + multiples of range/10.  Here are the calculated fields:
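The binning might be sketched like this, leaning on window calculations since the simulation result is itself a table calculation:

    // Result Bin (sketch) – Min + multiples of range/10, computed across the marks in view
    INT(
      ([Simulation Result] - WINDOW_MIN([Simulation Result]))
      / ((WINDOW_MAX([Simulation Result]) - WINDOW_MIN([Simulation Result])) / 10)
    )
    // note: the single maximum value evaluates to 10; group it with the top bin when assigning colors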

This allows the end user to click on the color legend to see the number of simulations within each “bin.”  And to see through random chance how the simulations really distributed.

I also made this sheet, which has some advantages of expressing that concept a bit more clearly.  Through 110 simulations – here’s where the results fell out.

In terms of the visual outputs – I’d still say I’m at an exploration stage of finding what works best.  However, from a Python/TabPy angle – I was pretty enthused to put it all in action.  And most specifically one thing I appreciate about this is that when the simulation number is changed, it’s the act of running a “new simulation.”  Same with all of the parameters – but something very satisfying about the act of data changing on the fly.

Next week we’re finally tackling NumPy and Pandas – so I expect some inspiration to come.

 

The Importance of Certification

I’ve long been a huge advocate of certification in technical fields.  I think it is a great way to actively communicate and demonstrate the skill level you have in a particular area.  Even further in my mind it represents the ability to set a foundation to build off of with others.

I personally started my Tableau certification journey last year (more like 18 to 20 months ago).  I was becoming much more heavily involved in my local Tableau user group and felt that I needed a way to benchmark or assess my skills.  I knew I had much more to contribute to my local community and I thought that going through the certification journey and sharing that with the TUG would be beneficial.

So I started by getting ready for the Desktop Qualified Exam.  Independently I went through all the existing documentation and searched for knowledge nuggets that would set me on the right path.  I took the stance of turning all of my own knowledge gathering into a format that could be digested by a larger audience.  I walked through all of the practice exam questions and did an analysis of the different certification levels for the user group at least 3 times.

I passed the Desktop Qualified Associate around April or May of 2016.  It was a great achievement – and I was able to add definition to what that exam means.  Having the Desktop Qualified Associate certification means that you are technically proficient in the features and functions of Tableau Desktop.  It means that you can answer thoughtful questions using built-in features and that you have a depth of understanding of best practices and efficient ways to get to the results.  If I were to equate it to a different skill – I would say that it means you know how and when to use different tools in a toolbox.  What it doesn’t mean: that you are a masterful architect or that you can build a stunningly beautiful home.

To get to the next level of mastery and understanding, that means you’ll need the Certified Professional.  If you take a look at the specific components that are tested you’ll quickly realize that advanced technical skill is weighted less than storytelling or analysis.  The purpose of the Desktop Certified Professional is to demonstrate that you have a deep understanding of data visualization, using data visualization to tell a story, and how level two (or three or four) analytics or analysis are necessary to start answering deeper and more important (read as: higher impact) questions.

For me to work on preparation here – the exam prep guide was only the beginning.  It assists in the structural components of 1) knowing how many questions there will be 2) estimating time available to spend on each question 3) examples of analytical/presentation depth required to demonstrate proficiency 4) variety of question types.

Probably the most intriguing questions for me are those where you have to assess a particular visualization, give and justify a critique (and specifically how it relates to a described objective) and then provide an alternative solution (also justifying verbally the importance).  This skill is much different than knowing how to hammer a nail into a post.  It is defending why you chose to put a porch on the northeast corner of a home.  It’s a lot about feel.

I had such an awesome time taking the exam.  There are a lot of real world constraints that required me to distill down the most important components of each question.  It’s interesting because for most items there isn’t a single right answer.  There are definitely lots of wrong answers, but right is a spectrum that is somewhat dependent on your ability to communicate out the completeness of your point of view.

I’ve had the title of Tableau Desktop Certified Professional for just over a year now – so I can tell you with a decent amount of retrospective thought what it has done for me.  Just as I am able to describe the test and the purpose it served in this blog post – I can do the same thing in all of my interactions.  It keeps me humble in knowing that the PURPOSE behind a visual display is more important than fancy widgets or cool tricks.  That to a large extent my role is to work through the semantics of a situation and get to the root of it.  The root of the question or questions, the heart of concern, the why behind the visualization.  And also the artistry (yes I use this word) behind what it takes to get there.  We have all felt the difference between a perfectly acceptable visualization and the right visualization.  The end user experiences something different.  I firmly believe that deeper understanding can be achieved by applying that extra thoughtfulness and an iterative approach.

So let’s now fast forward to the other certification path – the more recent one: Tableau Server.  What’s interesting is that because my strengths have been built out on the visualization end, I haven’t planted myself in an opportunity to understand the deeper technical components of Tableau Server.  I have always understood and had great depth of knowledge in Site Administration.  That is to say acknowledging and abiding by best practices for sharing, permissions, and managing content.  But – the part that I had not spent time on is creating a sustainable platform to have the vision continuously executed.

So to overcome that minor blind spot – I went on a mission to learn more, to shine light on the unknown.  You’ve seen that play out here on my blog – going on a self directed adventure to deploy a Server on Azure.  Nobody told me to do that – I was internally compelled.  (I should also mention I was honored to have a friend go on the journey with me!)

I’m probably rehashing at this point – but anytime you grow knowledge in a particular area (more specifically technical) it gives you such breadth and depth of vocabulary to be able to connect to other individuals.  You find that communication barriers that were preventing the success of a project are diminished because you now speak the same language.  As I write this I can hear Seth Godin saying that the more knowledge someone has in a particular subject area the more ABSTRACT their language is around it.  Which means that it is extremely difficult to communicate with that SME unless significant effort is taken on the part of both parties to bridge the gap.

So that’s what the Tableau Server qualification has done for me.  It’s the first step on a journey; I imagine the next level, Server Certified Professional, is the act of execution.  It’s less knowledge and verbiage and more tactical.  Also there’s likely more ambiguity – not a right answer, rather a spectrum of right where you communicate your why.

As I wind down this post – I must shout to you “go get certified.”  Ignore the naysayers.  It’s easy to not do something, but you know what is hard?  Doing something.  Being tested.  And why is that?  Because you can fail.  Get over failure – push through that mindset.  The alternative is much more pleasant and unlocks all the potential the universe has to offer.

 

Star Trek The Next Generation: Every Episode (#IronViz 3)

It’s that time again – Iron Viz feeder contest!  The third and final round for a chance to battle at conference in a chef coat is upon us.  This round the focus was on anything ‘Silver Screen.’

With a limitless topic I was certain that I would find myself in a creative rut that would likely result in submitting something at the end of the submission time period (August 13th).  So I am as shocked as anyone else that I have a fully formed submission way before deadline.

So what’s the topic and what got me unstuck?  Star Trek of course!  The backstory here is amazing – I went to a belated wedding shower for a few friends and they mentioned to me that they were going to the annual Star Trek convention.  And more specifically there was a special celebration occurring – the 30th anniversary of Star Trek: The Next Generation.  Not even up for debate – it just IS the best incarnation of the Star Trek universe.

So I decided to take a moment to do some research on finding TNG data.  It didn’t take me long to unearth this fantastic data set on GitHub that includes each episode’s script parsed out by character.

Really inspired by the thought of seeing each word of each episode visualized – I set forth on my mission.  As I got started there was one component that was mission critical: the bold, moody colors present throughout the world of Star Trek.  They are fantastic – especially paired with a black background.  And working with individual scripts meant that I could use color to accentuate different characters – much like their uniforms do in the episodes.

The next component that I wanted to evoke (again – design focused here) was the electronics and computer interfaces.  I particularly like the rounded edges and strong geometric shapes that are on the computer screens across all iterations of Star Trek.  So that describes most of the design – the choice of colors and how some of the visualizations were set up.

Now on to the next important component here: analysis.  When you see this visualization you may find yourself realizing that I don’t draw any conclusions.  For this collection of visualizations I am playing the role of curator.  I am developing a visual world for you to interact with, to go deep and wide in your understanding.  I am not attempting to summarize data for you or force conclusions upon you.  I am inviting you to come into the world of Star Trek, unearth who speaks during each episode, find out what that character is saying.  I want there to be an unending number of takeaways and perceptions generated from this.

And the last part you need to understand is the storytelling.  This entire visualization has an untold number of stories in it by virtue of it being a visualization of the entire series.  If you want a meta-story to tell it’s simply this: Star Trek The Next Generation is such a deep and rich world that you should go get lost.  And while you’re on the path of getting lost do me a favor: retain some leadership tidbits from Picard and sprinkle in some logical takeaways from Data.

 

#WorkoutWednesday Week 24 – Math Musings

The Workout Wednesday for week 24 is a great way to represent where a result for a particular value falls with respect to a broader collection.  I’ve used a spine chart recently on a project where most data was centered around certain points and I wanted to show the range.  Propagating maximums, minimums, averages, quartiles, and (when appropriate) medians can help to profile data very effectively.

So I started off really enjoying where this visualization was going.  Also because the spine chart I made on a recent project was before I even knew the thing I developed had already been named.  (Sad on my part, I should read more!)

My enjoyment turned into caution really quickly once I saw the data set.  There are several ratios in the data set and very few counts/sums of things.  My math brain screams trap!  Especially when we start tiptoeing into the world of what we semantically call “average of all” or “overall average” or something that somehow represents a larger collective (“everybody”).  There is a lot of open-ended interpretation that goes into this particular calculation and when you’re working with pre-computed ratios it gets really tricky really quickly.

Here’s a picture of the underlying data set:

 

Some things to notice right away – the ratios for each response are pre-computed.  The number of responses is different for each institution.  (To simplify this view, I’m on one year and one question).

So the heart of the initial question is this: if I want to compare my results to the overall results, how would I do that?  Now there are probably 2 distinct camps here.  1: take the average of one of the columns and use that to represent the “overall average”.  Let’s be clear on what that is: it is the average pre-computed ratio of a survey.  It is NOT the observed percentage of all individuals surveyed.  That would be option 2: the weighted average.  For the weighted average or to calculate a representation of all respondents we could add up all the qualifying respondents answering ‘agree’ and divide it by the total respondents.

Now we all know this concept of average of an average vs. weighted average can cause issues.  Specifically we’d feel the friction immediately if there were several entities with very few responses commingled with several entities capturing far more responses.  EX: Place A: 2 people out of 2 answered ‘yes’ (100%) and Place B: 5 out of 100 answered ‘yes’ (5%).  If we average 100% and 5% we’ll get 52.5%.  But if we take 7 out of 102, that’s 6.86% – a way different number.  (Intentionally extreme example.)

So my math brain was convinced that the “overall average” or “ratio for all” should be inclusive of the weights of each Institution.  That was fairly easy to compensate for: take each ratio and multiply it by the number of respondents to get raw counts and then add those all back up together.
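As a calculated field, the weighted version might look something like this, assuming fields named [Agree Ratio] and [Respondents]:

    // Weighted overall ratio (sketch) – convert back to raw counts, then divide
    SUM([Agree Ratio] * [Respondents]) / SUM([Respondents])
    // compare with the simple average of the pre-computed ratios: AVG([Agree Ratio])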

The next sort of messy thing to deal with was finding the minimums and maximums of these values.  It seems straightforward, but when reviewing the data set and the specifications of what is being displayed there’s caution to throw with regard to level of aggregation and how the data is filtered.  As an example, depending on how the ratios are leveraged, you could end up finding the minimum of 3 differently weighted subjects to a subject group.  You could also probably find the minimum Institution + subject result at the subject level of all the subjects within a group.  Again I think the best bet here is to tread cautiously over the ratios and get into raw counts as quickly as possible.

So what does this all mean?  To me it means tread carefully and ask clear questions about what people are trying to measure.  This is also where I will go the distance and include calculations in tool tips to help demonstrate what the values I am calculating represent.  Ratios are tricky and averaging them is even trickier.  There likely isn’t a perfect way to deal with them and it’s something we all witness consistently throughout our professional lives (how many of us have averaged a pre-computed average handle time?).

Beyond the math tangent – I want to reiterate how great a visualization I think this is.  I also want to highlight that because I went deep-end on the math, I decided to go a different direction on the development.

The main difference from the development perspective?  Instead of using reference bands, I used a Gantt bar as the IQR.  I really like using the bar because it gives users an easier target to hover over.  It also reduces some of the noise of the default labeling that occurs with reference lines.  To create the Gantt bar – simply compute the IQR as a calculated field and use it as the size.  You can select one of the percentile points to be the start of the mark.
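A sketch of the two pieces, assuming the measure being profiled is a field called [Value]:

    // IQR (sketch) – goes on Size for the Gantt mark
    PERCENTILE([Value], 0.75) - PERCENTILE([Value], 0.25)

    // Gantt start (sketch) – goes on the axis so the bar begins at the 25th percentile
    PERCENTILE([Value], 0.25)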

#WorkoutWednesday Week 23 – American National Parks

I’m now back in full force from an amazing analytics experience at the Alteryx Inspire conference in Las Vegas.  The week was packed with learning, inspiration, and community – things I adore and am honored to be a part of.  Despite the awesome nature of the event, I have to admit I’m happy to be home and keeping up with my workout routine.

So here goes the “how” of this week’s Workout Wednesday week 23.  Specifications and backstory can be found on Andy’s blog here.

Here’s a picture of my final product and my general assessment of what would be required for approach:

Things you can see from the static image that will be required –

  • Y axis grid lines are on specific demarcations with ordinal indicators
  • X-axis also has specific years marked
  • Colors are for specific parks
  • Bump chart of parks is fairly straightforward, will require an INDEX() calculation
  • Labels are only on colored lines – tricky

Now here’s the animated version showing how interactivity works

  • Highlight box has specific actions
    • When ‘none’ is selected, defaults to static image
    • When park of specific color is selected, only that park has different coloration and it is labeled
    • When park of unspecified color is selected, only that park has different coloration (black) and it is labeled

Getting started is the easy part here – building the bump chart.  Based on the data set and instructions it’s important to recognize that this is limited to parks of type ‘National Historical Park’ and ‘National Park.’  Here’s the basic bump chart setup:

and the custom sort for the table calculation:

Describing this is pretty straightforward – index (rank) each park by the descending sum of recreation visitors every year.  Once you’ve got that setup, flipping the Y-axis to reversed will get you to the basic layout you’re trying to achieve.
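In calculation terms the bump chart is just INDEX() with the right compute settings; the field names below are illustrative:

    // Rank (sketch) – placed on Rows with the axis reversed
    INDEX()
    // Compute Using: Park Name, restarting every Year
    // Custom sort: Sum of Recreation Visitors, descending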

Now – the grid lines and the y-axis header.  Perhaps I’ve been at this game too long, but anytime I notice custom grid lines I immediately think of reference lines.  Adding constant reference lines gives ultimate flexibility in what they’re labelled with and how they’re displayed.  So each of the rank grid lines are reference lines.  You can add the ‘Rank’ header to the axis by creating an ad-hoc calculation of a text string called ‘Rank.’  A quick note on this: if you add dimensions and measures to your sheet be prepared to double check and modify your table calculations.  Sometimes dimensions get incorporated when it wasn’t intended.

Now on to the most challenging part of this visualization: the coloration and labels.  I’ll start by saying there are probably several ways to complete this task and this represents my approach (not necessarily the most efficient one):

First up: making colors for specific parks called out:

(probably should have just used the Grouping functionality, but I’m a fast typer)

Then making a parameter to allow for highlighting:

(you’ll notice here that I had the right subset of parks; this is because I made the Park Type a data source filter and later an extract filter – thus removing the other park types from the domain)

Once the parameter is made, build in functionality for that:

And then I set a calculation to dynamically flip between the two calculations depending on what the parameter was set to.

Looking back on this: I didn’t need the third calculation, it’s exactly the same functionality as the second one.  In fact as I write this, I tested it using the second calculation only and it functions just fine.  I think the over-build speaks to my thought process (a rough sketch of the combined logic follows the list below):

  1. First let’s isolate and color the specific parks
  2. Let’s make all the others a certain color
  3. Adding in the parameter functionality, I need the colors to be there if it is set to ‘(None)’
  4. Otherwise I need it to be black
  5. And just for kicks, let’s ensure that when the parameter is set to ‘(None)’ that I really want it to be the colors I’ve specified in the first calc
  6. Otherwise I want the functionality to follow calc 2
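Here is one way that combined logic might be sketched.  It assumes the parameter is named [Highlight Park] (with ‘(None)’ as an option) and a field or group called [Featured Park] that returns the park name for the handful of colored parks and ‘Other’ for everything else:

    // Park Color (sketch) – drives the color legend
    IF [Highlight Park] = '(None)' THEN [Featured Park]
    ELSEIF [Park Name] = [Highlight Park] THEN
        // a selected featured park keeps its color; any other selected park becomes a 'Highlighted' member mapped to black
        IIF([Featured Park] = 'Other', 'Highlighted', [Featured Park])
    ELSE 'Other'
    END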

Here’s the last bit of logic to get the labels on the lines.  Essentially I know we’re going to want to label the end point, and because of how labeling works I’m going to have to allow all labels to be visible and determine which marks actually have values for the label.  PS: I’m really happy to use that match-color functionality on this viz.
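The label field can lean on the color field above: every mark carries the label field, but only the colored or highlighted park returns a value (a sketch):

    // Park Label (sketch) – blank for everything in the 'Other' bucket
    IF [Park Color] <> 'Other' THEN [Park Name] END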

And the label setting:

That wraps up the build for this week’s workout, with the last steps being to add additional detail to the tooltip and to stylize.  A great workout that demonstrates the compelling nature of interactive visualization and the ever-popular bump chart.

Interact with the full visualization here on my Tableau Public.

#MakeoverMonday Week 22 – Internet Usage by Country

This week’s data set demonstrates the number of internet users per 100 people by country, spanning several years.  The original data set and accompanying visualization starts as an interactive map with the ability to animate through the changing values year by year.  Additionally, the user can click into a country to see percentage changes or the comparative changes with multiple countries.

Channeling my inner Hans Rosling – I was drawn to play through the animation of the change by year, starting with 1960.  What sort of narrative could I see play out?

Perhaps it was the developer inside of me, but I couldn’t get over the color legend.  For the first 30 years (1960 to 1989) there are only a few data points, all signifying zero.  Why?  Does this mean that those few countries actually measured this value in those years, or is it just bad data?  Moving past the first 30 years, my mind was starting to try and resolve the rest of the usage changes.  However – here again my mind was hurt by the coloration.  The color legend shifts from year to year.  There’s always a green, greenish yellow, yellow, orange, and red.  How am I to ascertain growth or change when I must constantly refer to the legend?  Sure there’s something to say about comparing country to country, but it loses alignment once you start paginating through the years.

Moving past my general take on the visualization – there were certain things I picked up on and wanted to carry forward in my makeover.  The first was the value out of 100 people.  Because I noticed that the color legend was increasing year to year, this meant that the overall number of users was increasing.  Similarly, when thinking about comparing the countries, coloration changed, meaning ranks were changing.

I’ll tell you – my mind was originally drawn to the idea of 3 slope charts sitting next to each other.  One representing the first 5 years, then the next 5 years, and so on.  Each country as a line.  Well that wasn’t really possible because the data has 1990 to 2000 as the first set of years – so I went down the path of the first 10 years.  It doesn’t tell me much other than something somewhat obvious: internet usage exploded from 1990 to 2000.

Here’s how the full set would have maybe played out:

This is perhaps a bit more interesting, but my mind doesn’t like the 10 year gap between 1990 and 2000, five year gaps from 2000 to 2010, and then annual measurements from 2010 to 2015 (that I didn’t include on this chart).  More to the point, it seems to me that 2000 may be a better starting measurement point.  And it created the inflection point of my narrative.

Looking at this chart – I went ahead and decided my narrative would be to understand not only how much more internet usage there is per country, but to also demonstrate how certain countries have grown throughout the time periods.  I limited the data set to the top 50 in 2015 to eliminate some of the data noise (there were 196 members in the country domain, when I cut it to 100 there were still some 0s in 2000).

To help demonstrate that usage was just overall more prolific, I developed a consistent dimension to bucket the number of users.  So as you read it – the color goes from light gray to blue depending on the value.  The point being that as we get nearer in time, there’s more dark blue and no light gray.
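A sketch of that kind of consistent bucket dimension; the cut points here are purely illustrative:

    // Usage Bucket (sketch) – same cut points in every year so color is comparable across time
    IF [Users per 100] < 20 THEN 'a. Under 20'
    ELSEIF [Users per 100] < 40 THEN 'b. 20 to 40'
    ELSEIF [Users per 100] < 60 THEN 'c. 40 to 60'
    ELSEIF [Users per 100] < 80 THEN 'd. 60 to 80'
    ELSE 'e. 80 and up'
    END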

And then I went the route of a bump chart to show how the ranks have changed.  Norway had been at the top of the charts, now it’s Iceland.  When you hover over the lines you can see what happened.  And in some cases it makes sense, a country is already dominating usage, increasing can only go so far.

But there are some amazing stories that can unfold in this data set: check out Andorra.  It went from #33 all the way up to #3.

You can take this visualization and step back into different years and benchmark each country on how prolific internet usage was during the time.  And do direct peer comparatives to boot.

This one deserves time spent focused on the interactivity of the visualization.  That’s part of the reason why it is so dense at first glance.  I’m intentionally trying to get the end user to see 3 things up front: overall internet usage in 2000 (by size and color encoding) and the starting rank of countries, the overall global increase in internet usage (demonstrated by coloration change over the spans), and then who the current usage leader is.

Take some time to play with the visualization here.