Tag: statistics

  • Dynamic Quantile Map Coloring in Tableau Desktop

    Dynamic Quantile Map Coloring in Tableau Desktop

    Last week at Tableau’s customer conference (TC18) in New Orleans I had the pleasure of speaking in three different sessions, all extremely hands-on in Tableau Desktop.  Two of the sessions were focused exclusively on tips and tricks (to make you smarter and faster), so I wanted to take the time to slow down and share with you the how of my favorite mapping tip.  And that tip just so happens to be: how to create dynamic coloring based on quantiles for maps.

    First, a refresher on what quantiles are.  Quantiles are cut points that divide a data distribution into intervals, each holding an equal share of the observations.  The most popular quantile is the quartile, which partitions data into four groups: 0 to 25%, 25 to 50%, 50 to 75%, and 75 to 100%.  We see quartiles all the time in box plots, so they’re something we’re quite comfortable with.  The reason the quantile is valuable is that it lines up all the measurements from smallest to largest and buckets them into groups – so when you use something like color, it no longer represents the actual value of a measurement, but instead the bucket (quantile) that the measurement falls into.  Quantiles are particularly useful when measurements are either widely dispersed or very tightly packed.

    Here’s my starting point example – this is a map showing the number of businesses per US county circa 2016.

    The range of the number of businesses per county is quite large, going from 1 all the way to about 270k.  And with such a wide spread in my data set, it’s hard to see more nuanced trends or truly answer the question “which areas in the US have more businesses?”

    A good first step would be to normalize by the population to create a per capita measurement.  Here’s the updated visualization – notice that while it’s improved, now I’m running into a new issue – all my color is concentrated around the middle.
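
    For reference, the per capita measurement itself is a single calculated field.  Here’s a minimal sketch – the source field names [Businesses] and [Population] are placeholders for whatever your data set uses:

        // Businesses per Capita: normalize the business count by population
        SUM([Businesses]) / SUM([Population])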

    The data story has changed – my eyes are now drawn toward the dark blue in Colorado and Wyoming – but I’m still having a hard time drawing distinctions and answering my question of “which areas in the US have the most businesses?”

    So as a final step I can convert my measurements to percentiles and bucket them into quantiles.  Here’s the same normalized data set, now turned into quartiles.

    I now have 4 distinct color buckets and a much richer data display to answer my question.  Furthermore, I can make the legend dynamic (leading back to the title of this blog post) by using a parameter.  The process to make the quantiles dynamic involves 3 steps, with sketches of the calculations after the list:

    1. Turn your original metric (the normalized per capita in my example) into a percentile by creating a “Percentile” Quick Table Calculation.  Save the percentile calculation for later use.

    2. Decide how many quantiles you will allow (I chose between 4 and 10).  Create an integer parameter that matches your specification.

    3. Create a calculated field that will bucket your data into the desired quantile based on the parameter.
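
    Here’s a rough sketch of steps 1 and 2 in Tableau’s formula language – the field and parameter names come from my example, so substitute your own:

        // Step 1 – Percentile: what the “Percentile” Quick Table Calculation
        // generates; save it as its own calculated field
        RANK_PERCENTILE(SUM([Businesses]) / SUM([Population]))

        // Step 2 – [Number of Quantiles]: an integer parameter with a
        // range of allowable values from 4 to 10, step size 1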

    You’ll notice that the Quantile Color calculation depends on the number of quantiles in your parameter and will need to be adjusted if you go above 10.
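
    For illustration, here’s one possible shape for that calculation – an explicit branch per bucket, using the [Percentile] field from step 1 – which would explain why it needs adjusting above 10:

        // Quantile Color: assign each percentile to a bucket from 1 to
        // [Number of Quantiles]; branches past the parameter value never fire
        IF [Percentile] <= 1 / [Number of Quantiles] THEN 1
        ELSEIF [Percentile] <= 2 / [Number of Quantiles] THEN 2
        ELSEIF [Percentile] <= 3 / [Number of Quantiles] THEN 3
        ELSEIF [Percentile] <= 4 / [Number of Quantiles] THEN 4
        ELSEIF [Percentile] <= 5 / [Number of Quantiles] THEN 5
        ELSEIF [Percentile] <= 6 / [Number of Quantiles] THEN 6
        ELSEIF [Percentile] <= 7 / [Number of Quantiles] THEN 7
        ELSEIF [Percentile] <= 8 / [Number of Quantiles] THEN 8
        ELSEIF [Percentile] <= 9 / [Number of Quantiles] THEN 9
        ELSE 10
        END

    Put this field on Color as a discrete dimension and the buckets re-draw whenever the parameter changes.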

    Now you have all the pieces in place to make your dynamic quantile color legend.  Here’s a quick animation showing the progression from quartiles to deciles.

    The next time you have data where you’re using color to represent a measure (particularly on a map) and you’re not finding much value in the visual, consider creating static or dynamic quantiles.  You’ll be able to unearth hidden insights and help segment your data to make it easier to focus on the interesting parts.

    And if you’re interested in downloading the workbook you can find it here on my Tableau Public.


  • Statistical Process Control Charts

    I’ve had this idea for a while now – create a blog post and video tutorial discussing what Statistical Process Control is and how to use different Control Chart “tests” in Tableau.

    I’ve spent a significant portion of my professional career in business process improvement, and I always like it when I can take techniques from a discipline rooted in industrial engineering and apply them in a broader sense.

    It also gives me a great chance to brush up on my knowledge and learn how to order my thoughts for presenting to a wide audience.  And let’s not forget: an opportunity to showcase data visualization and Tableau as the delivery mechanism for these insights to my end users.

    So why Statistical Process Control?  Well, it’s a great way to use the data you already have and apply different tests for early detection of problems.  Several of the rules out there are aimed at finding “out-of-control,” non-normal, or repetitive patterns within a stream of data, and the different rules have been developed based on the kinds of patterns they’re designed to catch.

    The video tutorial above goes through the first 3 Western Electric rules.  Full details on the Western Electric rules are available via Wikipedia: here.

    Rule 1: Very basic – it uses the principle of a bell curve to put a spotlight on points above the Upper Control Limit (UCL) or below the Lower Control Limit (LCL), also known as +/- 3 standard deviations from the mean.  These are essentially outlier data points that fall outside the span covering the typical 99.7% of observations.
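
    As a sketch, Rule 1 can be written as a single boolean table calculation in Tableau – I’m assuming a daily [Calories] measure like the one in my data:

        // Rule 1 – flag any point beyond 3 standard deviations of the mean
        ABS(SUM([Calories]) - WINDOW_AVG(SUM([Calories])))
            > 3 * WINDOW_STDEV(SUM([Calories]))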

    Rule 2: Takes surrounding observations into consideration.  Looking at 3 consecutive observations, are 2 out of 3 beyond 2 standard deviations (SD) from the average?  In this rule the observations beyond 2 SD must be on the same side of the average line.  Since 2 SD covers about 95% of the data, having 2 out of 3 points in a set beyond that range could signal an issue.
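
    One way to sketch Rule 2 is with a pair of helper flags plus a trailing window – again assuming the [Calories] measure:

        // Above +2SD – helper flag: 1 when a point sits above mean + 2 SD
        IIF(SUM([Calories]) > WINDOW_AVG(SUM([Calories]))
                + 2 * WINDOW_STDEV(SUM([Calories])), 1, 0)

        // Below -2SD – the mirror-image flag for mean - 2 SD
        IIF(SUM([Calories]) < WINDOW_AVG(SUM([Calories]))
                - 2 * WINDOW_STDEV(SUM([Calories])), 1, 0)

        // Rule 2 – 2 of the last 3 points beyond 2 SD on the same side
        WINDOW_SUM([Above +2SD], -2, 0) >= 2
            OR WINDOW_SUM([Below -2SD], -2, 0) >= 2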

    Rule 3: Considers even more data points within a collection of observations.  In this scenario we’re now looking for 4 out of 5 observations beyond 1 SD from the average, again retaining the positioning above/below the average line throughout the 5 points.  This one really shows the emergence of a trend.
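
    Rule 3 follows the same pattern as the Rule 2 sketch, just with 1 SD flags and a 5-point window:

        // Rule 3 – 4 of the last 5 points beyond 1 SD on the same side,
        // where [Above +1SD] and [Below -1SD] are built like the 2 SD
        // flags above but with a 1 standard deviation threshold
        WINDOW_SUM([Above +1SD], -4, 0) >= 4
            OR WINDOW_SUM([Below -1SD], -4, 0) >= 4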

    I applied the first 3 rules to my own calorie data to detect any potential issues, and it’s very interesting to see the results.  For my particular data set, Rule 3 was of significant value.  Having it in place as new daily data funnels in could prevent me from going on a “streak” of either over- or under-consuming.


    Interact with the full version on my Tableau Public profile here.

  • Funnel Plots

    As I continue to read through Stephen Few’s “Signal: Understanding What Matters in a World of Noise,” I’ve come across some new charts and techniques.

    In an attempt to understand their purpose on a deeper level (and implement them in my professional life), I’m on a mission to recreate them in Tableau.

    First up is the funnel plot. Stephen explains that funnel plots are useful when we need to adjust for something – typically sample size – before an accurate comparison can be made. In the example video, I adjust how we’re looking at the average profit per item on a given order compared to all of the orders.

    What’s interesting is that in tandem with this exercise, I’m working on a quantitative analysis class for my MBA, so there was quite a bit of intersection. I actually pulled the confidence interval calculation (in particular the standard error equation) straight from the coursework.
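
    For reference, the funnel’s limits come straight from that equation: the standard error of a mean is SE = s / √n, so the limits narrow as order size n grows. Here’s a hedged sketch of the calculated fields, assuming item-level [Profit] and Tableau’s built-in [Number of Records] as the item count per order:

        // Center Line – overall average profit per item across all orders
        TOTAL(AVG([Profit]))

        // Standard Error – s / sqrt(n), where n is the item count per order
        TOTAL(STDEV([Profit])) / SQRT(SUM([Number of Records]))

        // Upper / Lower 95% funnel limits around the center line
        [Center Line] + 1.96 * [Standard Error]
        [Center Line] - 1.96 * [Standard Error]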

    I find that statistical jargon is generally sub-par at explaining what is going on, and all the resources I used left me oscillating between “oh, I totally get this” and “I have no idea what this means.” To that end, I’m open to any comments or feedback on the verbiage used in the video, or any expert knowledge you’d like to share.

    Link to full workbook on Tableau public for calculated fields: https://public.tableau.com/views/FunnelPlot10_2_16/Results?:embed=y&:display_count=yes