#WorkoutWednesday – Fiscal Years + Running Sums

As a big advocate of #WorkoutWednesday I am excited to see that it is continuing on in 2018.  I champion the initiative because it offers people a constructive way to problem solve, learn, and grow using Tableau.

I was listening to this lecture yesterday and there was a great snippet: “context is required to spark curiosity.”  We see this over and over again in our domain – everyone wants to solve problems, but unless a problem is presented, it can be hard to channel energy and approach something constructively.

This is the last pre-context paragraph before getting into the weeds of the build.  I enjoy the exercise of explaining how something works.  It helps cement in my mind the concepts and techniques used during the build.  Being able to explain the “why” and “how” is crucial.

Let’s get started.

High level problem statement or requirement: build out a dashboard that shows the running total of sales and allows for the user to dynamically change the start of a fiscal year.  The date axis should begin at the first month of the fiscal year. Here’s the embedded tweet for the challenge:


Okay – so the challenge is set. Now on to the build. I like to start by tackling as many of the small problems as I can – the ones I immediately know the answer to. Once I get a framework of things to build from, I can then work through adjustments to get to the end goal. Here’s my initial list:

  • numeric parameter 1 to 12 for the fiscal year start
  • calculation to push a date to the right fiscal year
  • dimension for the fiscal year
  • transformation of date to the correct year
  • running sum of sales

I’ll spare you the parameter build and go directly into the calculations that deal with the fiscal year components.

The logic is as follows – if the month of the order date is earlier than the fiscal year start month, subtract one from the year; otherwise keep the year as-is.  I can immediately use this as a dimension on color to break apart the data.
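
In Tableau’s calculation language, a minimal sketch of that logic (assuming the parameter is named [Fiscal Year Start] – your field and parameter names may differ) might look like:

```
// Fiscal Year (sketch)
IF MONTH([Order Date]) < [Fiscal Year Start]
THEN YEAR([Order Date]) - 1
ELSE YEAR([Order Date])
END
```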

The next step would be to use that newly defined year with other elements of the original date to complete the “fiscal year” transformation.  Based on the tooltips – the year for each order date should be the fiscal year.
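
A hedged sketch of that transformation, reusing the [Fiscal Year] field sketched above (MAKEDATE availability depends on your data source):

```
// Fiscal Date (sketch) – original month and day, fiscal year swapped in as the year
MAKEDATE([Fiscal Year], MONTH([Order Date]), DAY([Order Date]))
```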

Now that the foundation is in place, the harder part is building a continuous axis – particularly a continuous axis of time.  Dates span from 2013 to 2017 (depending on how you’ve got your FY set up), so if we plotted all the native dates we’d expect the axis to run from 2013 to 2017.  But that’s not really what we want.  We want a timeline that spans a single year (or, more appropriately, 365 days).

So my first step was to build out a dummy date that had the SAME year for all the dates.  The dates are already broken out with the FY, so as long as the year isn’t shown on the continuous axis, it will display correctly.  Here’s my first pass at the date calculation:
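
The screenshot isn’t reproduced here, but as a sketch that first pass could look something like this (2000 being the hard-coded year mentioned below):

```
// Axis Date – first pass (sketch): force every date into one dummy year
MAKEDATE(2000, MONTH([Order Date]), DAY([Order Date]))
```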

That gets me the ability to produce this chart – which is SO CLOSE!

The part that isn’t working is my continuous axis of time is always starting at 1/1.  And that makes sense because all the dates are for the same year and there’s always 1/1 data in there.  The takeaway: ordering the dates is what I need to figure out.

The workaround?  Well, the timeline needs to span more than one year, and specifically the start of the axis should be “earlier” in time.  To achieve this – instead of hard coding a single year (2000 in my case) – I changed the dummy year to be dependent on the parameter.  Here’s the result:

Basically offset everything that’s less than the chosen FY start to the next year (put it at the end).  (Remember that we’re already slicing apart the data to the correct FY using the previous calculations.)  The combination of these two calculations then changes the chart to this guy which is now mere formatting steps away from completion.
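
A sketch of that adjusted calculation – any month before the chosen fiscal year start gets bumped into the following dummy year so it lands at the end of the axis:

```
// Axis Date – final (sketch)
IF MONTH([Order Date]) < [Fiscal Year Start]
THEN MAKEDATE(2001, MONTH([Order Date]), DAY([Order Date]))
ELSE MAKEDATE(2000, MONTH([Order Date]), DAY([Order Date]))
END
```

With the fiscal year dimension on color and this field on the continuous axis, the timeline starts at whichever month the parameter selects.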

The full workbook is available to download here.

Tableau + Python + IoT

You’ve asked and it’s time to answer.  About a week ago I posted a teaser video on Twitter that showed how I integrated my home automation system with a Tableau dashboard.  The outcome of the video was dynamically changing three different lights in conjunction with KPI results.

So beyond the “hey this is really cool” factor of connecting lighting fixtures (or anything else for that matter) to a Tableau dashboard – there are some great practical reasons to do this.  A few great replies to the tweet echoed the thoughts already coursing through my mind.  Use something like a light to easily indicate a status – it’s that step back from the KPI shape or the BAN (big ass number).  It’s something visible that you can glance at to immediately understand performance.

Next comes the technical component of how this gets put into place.  To achieve integration between my home automation system and Tableau I used Python.  It serves as a great integrator and took the idea to proof of concept quickly.  Python benefits this situation because several people have developed libraries and published them on GitHub to share.  The other reason Python was a natural choice is TabPy.  TabPy allows you to connect to a Python server and run native Python code through calculated fields.  These calculated fields are served up on a Tableau sheet as table calculations.  And in general, when you’re using Python with Tableau it’s to get a number/string/boolean result out.

So just because you are supposed to use TabPy for one thing doesn’t mean you can’t extend its capabilities and use it in a completely different manner.  And that’s exactly what this integration project was about.

This project is also about taking a continuous project I have around home automation and folding in something new I learned (Python) with something I already love (Tableau).

My final caveat is this: if you’re looking to emulate what I did you’re going to need some hardware aka Philips Hue lights and a bridge.  In theory you can change this out with the home automation pieces of your choice.  If you’re unfamiliar with what home automation is – it basically works by having a controller (often called a bridge) and then different connected elements.  The controller is responsible for issuing commands to things like lightbulbs, locks, and your thermostat.  It is also the vehicle that takes how you interface with your home automation system (which is typically a mobile app) and translates those commands, routines, or programs to the elements in the system.

And specific to this project, I’m interfacing with my bridge (controller) using a Python library.  The library I chose to use was Phue, a Python library designed to fully interact with the Hue API.

First I downloaded and installed Phue into my Tableau-Python-Server environment (aka the one used for TabPy).  That allows me to call the library in my Python scripts that are passed through Tableau.  During the install process I also ran a simple script to connect the library to my bridge – a necessary dependency to get this all to work.  It makes sense that you’d have to be able to establish a link between the communication component and the lights you want to change.
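
For reference, a minimal sketch of that one-time registration script (the IP address is a placeholder for your bridge’s address on your internal network; press the bridge’s link button right before running it):

```
from phue import Bridge

# Placeholder IP – substitute your bridge's address on your internal network
b = Bridge('192.168.1.xxx')

# Registers this machine with the bridge the first time it runs
# (press the physical link button on the bridge just before this call)
b.connect()

# Handy for finding the light numbers you'll reference later
print(b.get_light_objects('id'))
```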

The next parts were all reading through documentation, trial and error with Python, and fun with Tableau.  Here are the final calculated fields that are driving the results:

Profit Ratio KPI – this is something we’re all used to.  I’ve taken the basic Profit Ratio calculation and had it evaluate against a parameter.  Based on the evaluation the result will be either 0 or 25000.  The numbers 0 and 25000 correspond to the color hue input required for the lights.  0 is red, 25000 is green.
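
A sketch of what that KPI field could look like – [Profit Ratio Goal] is a hypothetical parameter name I’m using for illustration:

```
// Profit Ratio KPI (sketch) – [Profit Ratio Goal] is a placeholder parameter name
IF SUM([Profit]) / SUM([Sales]) >= [Profit Ratio Goal]
THEN 25000   // green
ELSE 0       // red
END
```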

And here’s the calculated field that is pure Python.  In this scenario I’m not actually using Python to crunch anything and return a result, so you’ll notice that the return argument is a dummy number 1.

You will see it in the calculated field description in the editor within Tableau – to use SCRIPT_INT (or any of the SCRIPT functions) the format is: put your code in quotes, and any inputs or arguments being used are referenced as _arg1, _arg2, _argN.  Those arguments follow the comma after the script.  In my example I have one argument, [Profit Ratio KPI], and it is being assigned to a in the code.

The basic outline of the code is as follows:
  • set [Profit Ratio KPI] to be a
  • import phue library so we can use it
  • use my bridge (that’s the bridge’s IP address on my internal network)
  • set the saturation of the lights to 254 – this is extra code, but ensures that the light has full saturation
  • set the light hue to the value of a
  • return 1 – just for fun!

If you’re extra interested, the functions from Phue require you to select which light you’re going to be controlling.  The Profit Ratio KPI light was number 25.

A trial and error moment here.  As I was testing this code I had a Jupyter notebook up and was entering the same code snippets to test it outside of Tableau.  Under normal circumstances I was able to assign a number to a without any issue, execute the code, and have the lights change colors.  Inside of the Tableau space I had to call the first index “[0]” of the argument.  I haven’t looked through the TabPy documentation in enough detail, but this leads me to believe that arguments come in as lists instead of single variables.  So to isolate the one result I wanted, I had to specify which element I wanted in my list of one variable.  So my KPI result is really like [25000] in Python notation and has to be called as _arg1[0] (non-Python folk: Python indexes start at 0).
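
Putting those pieces together, here’s a hedged sketch of what the SCRIPT_INT calculated field could look like – the bridge IP is a placeholder, and light 25 is the Profit Ratio KPI light mentioned above:

```
SCRIPT_INT("
from phue import Bridge          # library installed in the TabPy environment
a = _arg1[0]                     # the KPI value arrives as a one-item list, so grab element 0
b = Bridge('192.168.1.xxx')      # placeholder: the bridge's IP on the internal network
b.set_light(25, 'sat', 254)      # extra step to guarantee full saturation
b.set_light(25, 'hue', a)        # 0 = red, 25000 = green
return 1                         # dummy result handed back to Tableau
",
[Profit Ratio KPI])
```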

Literally everything else about this was the setup and build of the Tableau components: building out the BANs and adding parameters to dynamically input the goal variables.  Once the correct code was entered, everything was connected and working.

Here’s what one of the BAN sheets looks like:

The calculated field is present on detail, but not used anywhere else in the view.  (If you hovered over the default tooltip, it would show “1”.)

Once the visualizations are set it’s just a matter of interacting with the dashboard to start seeing this integration take effect.  You’ll notice during the video that it’s not super fast – there is definitely an element of time.  And that makes sense – there is a bunch of overhead in taking an argument from Tableau, computing it, pushing it through a Python server, and waiting for a result to return.

This is a beautifully simple integration and demonstration of how to take the mantra of seeing and understanding data to the physical world.  With a few lines of code data-driven performance can exist in the physical world.

The value of Viz Club

I’ve been very interested in pursuing a concept I originally saw in my Twitter feed: a picture of Chris Love at a pub with a blurb about ‘Viz Club.’  Digging into it further, I want to say there were some additional details about the logistics, but the concept was simple: a few people get together and collaborate on data viz.

Fast forward through the past 8 months and this concept has been rattling around in my brain. The Phoenix TUG has space once a month on a Saturday. This had originally started as a “workshop” time slot, but as things evolved it didn’t feel quite right.  There was an overwhelming expectation to have a curriculum and a survey of how effective the instruction was. Now let me say this: there is absolutely a time and place for learning Tableau and I am all about empowering individuals with the skills to use it.  However, a voluntary community workshop should not be a surrogate for corporate-sponsored training, and it isn’t conducive to building a passion- and enthusiasm-filled community.

Anyway – the workshops weren’t right. They weren’t attracting the right types of people and, more importantly, they were isolating folks who were further along in their journey.  Essentially the format needed to change – to one that could welcome all stages of the journey and attract the passion.

Thus the idea kept echoing in my mind: Viz Club.

So coming out of April, the leap was made. I’ve been facilitating a series called “Get Tableau Fit” where I reconstruct one or two Workout Wednesday challenges during the first half of the TUG.  People were starting to see the value in the exercises (crazy that it took them so long!) and were getting more and more interested in trying them.  Interested, but still with a small amount of trepidation about participating. The final barrier?  Perhaps it was finding structured time – being officially assigned a task and a time to do it.

Hence Viz Club.

So we held our first club session on Saturday, and I have to say it was very successful. I was very curious about the types of people who would show up, what their problems would be, and how the flow would play out. I couldn’t have been happier with the results.

From a personal productivity standpoint, it was well worth the time investment.  There’s something about being in a collaborative and open environment that eliminates potential mental blocks.  Within the first hour of Viz Club I had done one of my catch-up #MakeoverMonday vizzes.

More important than my productivity was the productivity in the room.  Throughout the day questions were being asked and answered.  One specific community member took a concept all the way through to a dashboard – significant breakthroughs within the time box of our 5 hours together.  I know everyone involved felt the power of the time we spent together and it’s something we’re going to continue.

Dealing with data density

Recently I was on a project that involved working with data centered around one value. You can imagine the type: something where there is an intended value, or a maximum expected value. A good example from the Superstore data set may be something like “days between order date and ship date.”

Typically you’ll come into a data set of this type and the first thing to do is try to survey it – describe it in a visual format to set the foundation for your data story audience.  Traditionally you’d probably reach for something like a histogram, but here a histogram is going to look really pointless. I’m sure you can imagine it now: 95-99% of the data centered on one INTEGER value, the rest potentially obscuring the scene due to poor data entry or capture. Not the best way to start your story.

What’s potentially worse than the density around a given number is the notion that you’re looking for something within the confines of a narrow range that may be worth investigating further. So in this world, an ugly histogram isn’t going to drive you down the path of further enlightenment or relevant question asking and answering.

To combat the curse of this type of data, I went on a mission to make some alternative visualizations – ones beyond the histogram and box plot, my self-admitted favorite tools for quickly understanding data sets.

My first choice was to add jitter. Jittering is a great tool that helps to showcase the density of data. You’re breaking up a traditional box or dot plot where values are plotted at the same point and adding a position dimension to the open axis. It can be really helpful for opening conversations with those seeing their data for the first time. And even more powerful when you add on a dimension to show small multiples of the same measurement and how the numeric outputs are influenced by the segmentation of your chosen dimension.
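
A minimal sketch of one common way to jitter in Tableau, assuming a record-level dimension such as Superstore’s [Row ID] is available: create an index field, put it on the open axis as a continuous table calculation, and set it to compute using that record-level dimension.

```
// Jitter (sketch) – compute using a record-level dimension such as [Row ID]
INDEX()
```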

After attempting jittering, I noticed that because of the nature of my data, the natural indexing that was applied got skewed by the unique key I was indexing on.  Essentially there was an opportunity for an unintended inference from the jitter.  This led me to restart the index for every value, making a histogram next to a box plot.  The dots were stacked on each other, mimicking bars.  This was pretty effective – the centered nature of the data could easily be felt.

Last move was to make something even more abstract – what you might think of as a violin plot.  I approached it from the perspective of a dot plot on an X axis, where each dot is the binned result of the data values and the number of data points within that bin drives the dot’s size.  I added some transparency for additional visual aid – I think this works either to reinforce the size of the dot, or when your X axis has a natural good vs. bad association.  Harking back to the example of turnaround time, you could envision how powerful it would be if values drifted from in control to out of control.  I also added different percentile lines to help anchor the data spread.
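
As a rough sketch of the binning piece (field and parameter names are placeholders; the bin size I actually used isn’t specified): bin the measure, put the bin on a continuous X axis, and let the count of records in each bin drive the size of the dot.

```
// Value Bin (sketch) – [Days to Ship] and [Bin Size] are placeholder names
FLOOR([Days to Ship] / [Bin Size]) * [Bin Size]
```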

Consider this post a draft in progress that will be updated with visuals.

Makeover Monday Week 6 – Chicago Taxis

This week’s data set presented itself with a new and unique challenge – 100 million plus records and a slight nod to #IronViz 2016.


Yep – taxi data and lots of it, this time originating from Chicago.  The city of Chicago recently released 2013 to 2016 data on taxi rides, and the kind folks at EXASOL took the opportunity to ingest the massive data set and make it available for #MakeoverMonday.  There’s synergy all around: EXASOL gets free visualizations, data vizzers get the opportunity to work in a data storage platform designed to make massive data vizzing possible, and everyone gets to understand the unique challenges that appear once you load such an explosively large data set into Tableau.

So how did I take on this challenge and what are some things I learned?  Well, first I had an inkling going in that this was going to be a difficult analysis.  I had some insight from an #IronViz contestant that maybe taxi data isn’t super interesting (and who would expect it to be?).  There are the usual things we all likely see that mirror human events (spikes around holidays, data collected around popular locations, increases in $$), but beyond that it can be tough to approach this data set looking for a ‘wow’ moment.

To offset this, my approach was exploration.  Organize, separate, quantify, and educate the data interactor into the world of Chicago taxis.  Let’s use the size of the aggregations to understand what Chicago is like.  Essentially my end design was built to allow someone to safely interact with the data and get a sense for how things change based on several (strategically) predetermined dimensions.

The dimensions I chose to stick with for this visualization: focus on the community areas (pick up and drop off points), separate shorter trip lengths into 5 mile increments, and bucket time into 6 different chunks throughout the day.  (Admittedly, the chunking of time was a rip-off of my own week 3 makeover, where I used the same calculation to bucket time.)
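
Purely as an illustration (the boundaries in my actual calc may differ), bucketing the day into 6 chunks can be as simple as a DATEPART comparison on the trip start field:

```
// Time of Day (sketch) – six 4-hour buckets; the cut points are an assumption
IF DATEPART('hour', [Trip Start Timestamp]) < 4 THEN '12am-4am'
ELSEIF DATEPART('hour', [Trip Start Timestamp]) < 8 THEN '4am-8am'
ELSEIF DATEPART('hour', [Trip Start Timestamp]) < 12 THEN '8am-12pm'
ELSEIF DATEPART('hour', [Trip Start Timestamp]) < 16 THEN '12pm-4pm'
ELSEIF DATEPART('hour', [Trip Start Timestamp]) < 20 THEN '4pm-8pm'
ELSE '8pm-12am'
END
```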

Having the basic parameters of how the data was being organized, it was time to go about setting up the view.  I tried several combinations of size, color, and shape to facilitate a view I was committed to – using circles to represent data points for pickup locations by some time constraint.  I arrived at this as the final:


What I really like about it: red exposes the maximum for each year and ride length.  Demonstrates very quickly where “most rides” are.  There’s visual interest in the 10 to 15 mile range category.  There’s interest in the blank space – the void of data.  And this also makes a great facilitator of interactivity.  Consider each circle a button enticing you to click and use it to filter additional components on the right side.

Next were the burly maps.  Word of caution: do not use the unique identifier “trip id” and go plotting points.  You’ll get stuck quickly.  I’m not saying it is impossible, but I think rendering the initial visualization would take about an hour (based on my attempts).  So instead of going to the trip id level, I took a few stabs at getting granular.  First step: let’s just plot each individual day (using average lat/lon for pickup/dropoff spots).  This plotted well.  Then it was a continuation of adding in more and more detail to get something that sufficiently represented the data set without crippling the entire work product.  I accomplished this by adding in both the company ID (a 121-member dimension) and the accompanying areas (pickup/drop off area, a 78-member dimension).  This created the final effect and should represent the data with a decent amount of precision.

Finally I went ahead and added in a few more visually interesting dot plots that highlighted maximums for specific areas.  This is ideally used in conjunction with the large table at left to start gathering more understanding.

I have to say – I am somewhat pleased with the result of this dashboard.  I’m not sure it will speak to everyone (thinking the massive left side table may turn folks off or have whispers of “wasted opportunity” to encode MORE information).  However – I am committed to the simplicity of the display.  It accomplishes for me something I was aiming to achieve – understanding several different geographical locales of Chicago and making something that could stand alone and provide “aha” moments.  In particular I’d like to know why Garfield Ridge overtook O’Hare for 10-15 mile fares in ’14 and ’15.  One for Google to answer.

Makeover Monday 2017 – Week 4 New Zealand Tourism

This week’s Makeover addressed domestic and international tourism trends in New Zealand.  No commentary was provided with the data set; the original was just 2 charts left to the user to interpret.  See Eva’s tweet for the originals:

Going back to basics this week with what I like and dislike about it:

  • Titles are clear, bar chart isn’t too busy (like)
  • Not too many grid lines (like)
  • It’s easy to see the shape of the data and seasonality (like)
  • The scales are different between International & Domestic (dislike)
  • 3 years for easy comparison (like)
  • Eva chose this to promote her home country (like)

I think this was a good data set for week 4.  No data story to rewrite, special attention was made by Eva to mitigate data misinterpretation, and she added on a bonus of geospatial data for New Zealand.

My process really began with the geospatial part.  I haven’t yet had a chance to work with geospatial particulars developed and appended to data sets.  My experience in this has been limited to using Tableau’s functionality to manually add in latitude and longitude for unclear/missing/invalid data points.

So as I got started, I had no idea how to use the data.  There were a few fields that certainly pointed me in the right direction.  The first was “Point Order.”  Immediately I figured that needed to be used on “path” to determine where each data point fell.  That got me to this really cute outlined version of NZ (which looks like an upside down boot):

So I knew something additional needed to be done to get to a filled map.  That’s when I discovered the “PolygonNumber” field.  Throwing that onto detail, changing my marks to polygon and voila – New Zealand.  Here’s a Google image result for a comparison:

Eva did a great job trying to explain how NZ is broken up in terms of regions/territories/areas, but I have to admit I got a little lost.  I think what’s clear from the two pictures is I took the most granular approach to dissecting the country.

I’m super thrilled that I got this hands-on opportunity to implement it.  Geospatial is one of those areas of analytics everyone wants to get into, and by including it here I feel much more equipped for challenges in the future.

Next up was the top viz: I’ve been wanting to try out a barbell/DNA chart for a long time.  I’ve made them in the past, but nothing that’s landed in a final viz.  I felt like there was an opportunity to try this out with the data set based on the original charts.  I quickly threw that together (using Andy Kriebel’s video tutorial) and really enjoyed the pattern that emerged.

The shape of the data really is what kicked off the path that the final viz ended up taking.  I liked the stratification of domestic vs. international and wanted to carry that throughout.  This is also where I chose my colors.

The bottom left chart started out its life as a slope chart.  I originally did the first data point vs. the last data point (January 2008 vs. April 2016) for both types of tourism.  It came out to be VERY misleading – international had plummeted.  When I switched over to an annual aggregation, the story was much different.

International NZ tourism is failing!
Things look less scary and International is improving!

Good lesson in making sure to holistically look at the data.  Not to go super macro and get it wrong.  Find the right level of aggregation that keeps the message intact.

The last viz was really taking the geospatial component and adding in the tourism part.  I am on a small multiples kick and loved the novelty of having NZ on there more than once.  Knowing that I could repeat the colors again by doing a dual axis map got me sold.

All that was left was to add in interactivity.  The interactivity originally had the barbell and line chart filtering the maps, but that wasn’t quite clear.  I HATE filter drop-downs for something that is going to be a static presentation (a Twitter picture), so I wanted to come up with a way to give the user a filter option for the maps (because the shading does change over time), but have it be less tied to the static companion vizzes.  This is where I decided to make a filter sheet of the years and drop on a diverging color gradient to add a little more beauty.  I’m really pleased with how that turned out.

My last little cute moment is the data sourcing.  The URLs are gigantic and cluttered the viz.  So instead I made a basic sheet with URL actions to quickly get to both data sets.

A fun week and one that I topped off by spelling Tourism wrong in the initial Tweet (haha).  Have to keep things fun and not super serious.

Full dashboard here.

Blog post ideas

Ann

  • How Apple Watch has changed my life
  • How tracking calories has changed my life
  • Why is my life always changing (har har)
  • New books
  • Walk through improvements on micro viz
  • Revamp D+W posts
  • Tableau – continuous fields from discrete values for heat maps

Josh

  • Starting a home automation system
  • Hobby photography
  • Stretch project  – IFTTT our data
  • FLIR