A follow-up to The Women of #IronViz

It’s now 5 days since the Tableau Conference (#data17), and the topic of women in data visualization, and particularly the pointed topic of women competing in Tableau’s #IronViz competition, is still fresh on everyone’s mind.

First – I think it’s important to recognize how awesome the community reception of this topic has been.  Putting together a visualization that highlights a certain subsection of our community is not without risk.  While going through the build process, I wanted to keep the visualization in the vein of highlighting the contributions of women in the community.  It wasn’t meant to be selective or exclusive, instead, a visual display of something I was interested in understanding more about.  Despite being 5 days removed from the conference, the conversations I’ve been involved in (and observed from a distance) have all remained inclusive and positive.  I’ve seen plenty of people searching for understanding and hunting for more data points.  I’ve also seen a lot of collaboration around solutions and collecting the data we all seek.  What I’m thankful that I have not witnessed is blame or avoidance.  In my mind this speaks volumes to the brilliant and refined members of our community and their general openness and acceptance of change, feedback, and improvement.

One thing making the rounds that I felt compelled to build on is @visualibrarian’s recent blog post, which has interview-style questions and answers around the topic.  I am a big believer in self-reflection and exploration and was drawn to her call to action (maybe it was the punny and sarcastic nature of the ask) to answer the questions she put forth.

1. Tell me about yourself. What is your professional background? When did you participate in Iron Viz?

My professional background is that of a data analyst.  Although I have a bachelor’s degree in Mathematics, my first professional role was as a Pharmacy Technician entering prescriptions.  That quickly morphed into a role dedicated to reducing prescription entry errors and built on itself over and over, leading to roles in quality improvement and process engineering.  I’ve always been very reliant on data and data communication (in my early days, via PowerPoint) to help change people and processes.  About 2 or 3 years ago I got fed up with being the business user at the mercy of traditional data management or data owners and decided to brute force my way into the “IT” side of things.  I was drawn to doing more with data and having better access to it.  Fast-forward to the role I’ve had for a little over 8 months as a Data Visualization Consultant, which essentially means I spend a significant amount of my time partnering with organizations to either enable them to use visual analytics, improve the platforms they are currently using, or overcome any developmental obstacles they may have.  It also means I spend a significant amount of time championing the power of data visualization and sharing “best practices” on the topic.  I often call myself a “data champion” because I seek simply to be the voice of the data sets I’m working with.  I’m there to help people understand what they’re seeing.

In terms of Iron Viz – I first participated in 2016’s 3rd round feeder, Mobile Iron Viz, and I’ve participated in every feeder round since.  That’s the general plan on my end: continue to participate until I make it on stage or they tell me to stop 🙂

2. Is Tableau a part of your job/professional identity?

Yes – see answer to question #1.  It’s pretty much my main jam right now.  But I want to be very clear on this point – I consider my trade visual analytics, data visualization, and data analytics.  Tableau is to me the BEST tool to use within my trade.  By no means the only tool I use, but the most important one for my role.

3. How did you find out about Iron Viz?

When I first started getting more deeply involved in my local User Group, I found out about the competition.  Over time I became the leader of my user group and a natural advocate for the competition.  Once I became a part of the social community (via Twitter) it was easy to keep up with the ins and outs of the competition.

4. Did you have any reservations about participating in Iron Viz?

Absolutely – I still have reservations.  The first one I participated in was sort of on a whim, because I found something that I wanted to re-visualize in a very pared-down, elegant, simple way.  I ended up putting together the visualization in a very short period of time, and after comparing it to the other entries I felt my entry was very out of place.  I tend to shy away from putting text-heavy explanations within my visualizations, so I’ve felt very self-conscious that my designs don’t score well on “storytelling.”  It was also very hard in 2016 and the beginning of 2017, when votes were based on Twitter.  You could literally search for your hashtag and see how many people liked your viz.  It’s a very humbling and crushing experience when you don’t see any tweets in your favor.

5. Talk me through your favorite submission to Iron Viz. What did you like about it? Why?

Ah – they are all my favorite for different reasons.  For each entry I’ve always remained committed and deeply involved in what the data represents.  Independent of social response, I have always been very proud of everything I’ve developed.  For no other reason than the challenge of understanding a data set further and for bringing a new way to visually display it.  My mobile entry was devastatingly simple – I love it to death because it is so pared down (the mobile version).  For geospatial I made custom shapes for each of the different diamond grades.  It’s something I don’t think anyone in the world knows I did – and for me it really brought home the lack of interest I have in diamonds as rare coveted items.

6. What else do you remember about participating in Iron Viz?

The general anxiety around it.  For geospatial 2017 I procrastinated around the topic so much.  My parents actually came to visit me and I took time away from being with them to complete my entry.  I remember my mom consoling me because I was so adamant that I needed to participate.

Safari and Silver Screen were different experiences for me.  I immediately locked in on data sets about subjects I loved, so there was less stress.  When I did the Star Trek entry I focused on the look and feel of the design and was so stoked that the data set even existed.  Right now I am watching The Next Generation nightly, and I go back to that visualization to see how it compares to my actual perception of each episode (in terms of speaking pace and flow).

7. Which Iron Viz competitions did you participate in, and why?

Everything since 2016 feeder round 3.  I felt a personal obligation and an obligation to my community to participate.  It was also a great way for me to practice a lot of what I tell others – face your fears and greet them as an awesome challenge.  Remain enthusiastic and excited about the unknown.  It’s not always easy to practice, but it makes the results so worth it.

8. What competitions did you not participate in, and why?

Anything before mobile – and only because I (most likely) didn’t know about it.  Or maybe more appropriately stated – I wasn’t connected enough to the community to know of its existence or how to participate.

9. Do you participate in any other (non Iron Viz) Tableau community events?

Yes – I participate in #MakeoverMonday and #WorkoutWednesday.  My goal for the end of 2017 is to have all 52 for each completed.  Admittedly I am a bit off track right now, but I plan on closing that gap soon.  I also participate in #VizForSocialGood and have participated in past User Group viz contests.  I like to collect things and am a completionist – so these are initiatives that I’ve easily gotten hooked on.  I’ve also reaped so many benefits from participation.  Not just the growth that’s occurred, but the opportunity to connect with like-minded individuals across the globe.  It’s given me the opportunity to have peers that can challenge me and to be surrounded by folks that I aspire to be more like.  It keeps me excited about getting better and knowing more about our field.  It’s a much richer and deeper environment than I have ever found working within a single organization.

10. Do you have any suggestions for improving representation in Iron Viz?

  • Make it more representative of the actual stage contest
  • Single data set
  • Everyone submits on the same day
  • People don’t tweet or reveal submissions until contest closes
  • Judges provide scoring results to individual participants
  • The opportunity to present analysis/results, the “why”
  • Blind submissions – don’t reveal participants until results are posted
  • Incentives for participation!  It would be nice to have swag or badges or a gallery of all the submissions afterward

And in case you just came here to see the visualization that’s set as the featured image, here’s the link.

#data17 Recap – A quick top 5

Now that Tableau Conference 2017 has come to a close it’s time to reflect back on my favorite and most memorable moments.  I’ll preface by saying that I had very lofty goals for this conference.  It started after #data16 – immediately after the conference I did some significant thought work on what I wanted my year to look like and HOW I wanted to shape it.  It began by deeply examining my why.  My why is a personal mission that transcends my professional career.  I firmly believe that visualizing data is the BEST way to get closer to the truth and to grow, learn, and improve.  I also strongly feel that every individual should use analytics in making decisions.

Without further ado – here’s my top 5.

#1 – Participating in #MakeoverMonday live

This is my number one because it represents the culmination of a lot of my personal growth and effort this year.  Coming out of #data16 I committed myself to doing every #MakeoverMonday and most specifically participating in this event.  I’ll admit I have a few weeks still outstanding, but I’m on track to have a full portfolio by the end of 2017.  This was also the moment I was most anxious about.  Would I be able to develop something in an hour AND feel comfortable enough to share it with a large audience?  Well at the end of the hour I accomplished everything I had been anticipating and more.  With the support of the amazing #MakeoverMonday family and those around me I got up and presented my viz.  And to boot it became a Viz of the Day (VOTD) the following day.  Talk about a crowning achievement.  Taking something that I had a bit of nerves about and turning it into a super proud moment.

#2 – Community Battle: 60 Second Viz Tips

I think the picture says it all.  A rapid-fire tips battle at 9 am the day after Data Night Out?  YES please.  This session was an unbelievable joy to participate in.  Birthed out of the minds of the amazing London Tableau User Group and brought to the sessions of conference.  As if by fate, the first community folks that found me when I dropped in on Vegas were none other than Paul Chapman and Lorna Eden.  I couldn’t be more grateful for the opportunity to contribute to the conference.  And let’s not forget the trophy guys.  This is a new tradition I’d love to see carry on into the conferences to come.

#3 – The Vizzies!

A pure demonstration of the awesome Tableau community that exists globally.  I was so honored to be recognized this year as a Community Leader.  Just take a look at the amazing folks that I am so thankful to be surrounded by.  More than being recognized, the fact that the community was so prominent is unbelievable.  I couldn’t go anywhere without being stopped and having conversations, hugs, or smiles shared between community members.

#4 – Iron Viz!

As I always say – the pinnacle when it comes to Tableau.  The chance to see what 20 minutes of extreme vizzing looks like from the front row.  This one is near and dear to my heart because I submitted for each of the feeder contests this year.  I’d love the opportunity to get up on the big stage and participate, but barring that – it’s an enthusiast’s dream to see the excitement play out on the big stage.

#5 – Fanalytics

Although it is coming in at my #5, this is probably the highest impact moment of conference.  A 3+ hour conversation started by important members of the Tableau community talking about their journey and growth.  It ended with me facilitating one of 8 conversations about important topics the community is facing.  Mine was focused on female participation in #IronViz.  What was interesting about this was the massive feeling that we needed more data to wrap our arms around the topic.  And this has become my first action item post-conference.  I wanted to extend the conversation beyond the round table of remaining conference goers and into the broader community.

I’ve been so inspired, impressed, and energized by all the community and people I encountered over the past week.  I can’t wait to see what the next 12 months look like.

And now that I’ve provided my top 5, I’m curious – what are your top #data17 moments?

#MakeoverMonday Week 25 | Maricopa County Ozone Readings

We had another giant data set this week – 202 million records of EPA Ozone readings across the United States.  The giant data set is generously hosted by Exasol.  I encourage you to register here to gain access to the data.

The heart of the data is pretty straightforward – PPM readings across several sites around the nation for the past 25+ years.  As I browsed the data set, it was easy to see that there are multiple readings per site per day.  Here’s the basic data model:

Parameter Name only has Ozone, and Units of Measure only has Parts per million.  There is one little tweak to this data set – the Datum field.  This wasn’t a familiar term for me, so I described the domain to see what it had.

I know exactly what one of these 4 things means (beyond Unknown) – that’s WGS84.  I was literally at the Alteryx Inspire conference two weeks ago and in a Spatial Analytics session where people were talking about different standards for coordinate systems on Earth.  The facilitators mentioned that WGS84 was a main standard.  For fun I decided to plot the number of records for each Datum per year to see how the Lat/Lon have potentially changed in measurement over time.  Since 2012 it seems like WGS84 has dominated as the preferred standard.

So, armed with that knowledge, I kept it in my back pocket as something I might need to be mindful of if I entered the world of mapping.

Beyond that, I had to start my focus on preparing something for Tableau Public.  202 million records unfortunately won’t sit on Public, so I had to extract the data.  Naturally I did what every human would do and zeroed in on my city: the Phoenix metropolitan area, aka Maricopa County.

So going through the data set, there are multiple sites taking measurements.  And more than that, these sites are taking measurements multiple times per day.  I really wanted to express that somehow in my final visualization.  Here are all the site averages plotted for each day of the past 30 years – thanks Exasol!

So this is averaged per day per site – and you can see how much variation there is.  Some are reporting very low numbers, even zeros.  Some are very high.

If I take off the site ID, here’s what I get for the daily averages:

Notice the Y-axis – much less dramatic.  The EPA’s AQI scale doesn’t even get into the “bad” range until 0.071 PPM (Unhealthy for Sensitive Groups).  So there’s less of a story, to some extent, when we take the averages.  This COULD be because of the sites in Maricopa County (maybe there are low or faulty numbers dragging down the average), or it could be that averaging gets you closer to the truth.

I’m going down this path because at this point I made a decision: I wanted to look at the maximum daily measurement.  Given that these are instantaneous measurements, I felt that knowing the maximum measurement in a given day would also provide insight into how Ozone levels are faring.  And more specifically, knowing my region a little bit – the measurement sites could be outside of well-populated areas and may naturally have lower readings.

So that was step one for me: move to the world of MAX.  This let me leverage all the site data and get going.  (Originally I also wanted to jitter and display all the sites because I thought that would be interesting, but I distilled the data down further because I wasn’t getting the presentation I wanted in the end result.)
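To make that concrete, here is a minimal sketch of the two LOD calculations this implies.  The field names ([Date Local], [Site Num], [Sample Measurement]) are my assumptions about how the EPA extract is structured, not necessarily the fields in the actual workbook:

    // Calculated field 1 (hypothetical name: "Daily Max PPM")
    // The highest reading recorded anywhere in the county on a given day
    { FIXED [Date Local] : MAX([Sample Measurement]) }

    // Calculated field 2 (hypothetical name: "Daily Site Max PPM")
    // The highest reading per site per day – useful for the map idea later
    { FIXED [Date Local], [Site Num] : MAX([Sample Measurement]) }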

Okay – next up was plotting the data.  I wanted to do a single-page, very dense data display that had all the years and months and allowed for easy comparisons.  I had thought a cycle plot might be appropriate, but after trying a few combinations I didn’t see anything special added by day of the week and noticed that the measurement really is about time of year (the month), with the secondary comparison being each year.

Now that I’ve covered that part – next up was how to plot.  Again, this originally started out its life as dots that were going to be color encoded using the AQI scale, with PPM on the Y-axis.  And I almost published it that way.  But to be honest with you, I don’t know if the minutiae of the PPM really matter that much.  I think the AQI category defined on top of the measurement is easier for an end user to understand.  Hence my final development fork: turn the categorical result into a unit measure (1, 2, 3, 4, etc.) to represent the height of a bar chart.  And that’s where I got really inspired.  I made “Good” -1 and “Moderate” 0.  That way anything positive on the Y-axis is a bad day.  To me this lets you see the streaks of bad days throughout the time periods.
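A minimal sketch of that category-to-unit-measure calculation follows.  The [AQI Category] field is a hypothetical upstream calculation, and the labels beyond Good and Moderate are simply the standard EPA AQI category names, not necessarily the exact strings used in the workbook:

    // Hedged sketch: AQI category -> unit measure for bar height
    CASE [AQI Category]
        WHEN 'Good'                           THEN -1
        WHEN 'Moderate'                       THEN 0
        WHEN 'Unhealthy for Sensitive Groups' THEN 1
        WHEN 'Unhealthy'                      THEN 2
        WHEN 'Very Unhealthy'                 THEN 3
        ELSE 4   // Hazardous or anything worse
    END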

Close-up of 2015 – I love this.  Look at those moderates just continuing the axis.  Look how clearly the not-so-good to very bad days stand out.  This resonates with me.

Okay – so the final steps here were going to be a map of all the measurements at each site (again, the max for each site based on the user clicking a day).  It was actually quite cute showing Phoenix closer up.  And then I was going to have national readings (max for each site upon clicking a day) as a comparison.  This would have been super awesome – here’s the picture:

So good.  And perhaps I could have kept this, but knowing I have to go to Tableau Public – it just isn’t going to handle the national data well.  So I sat on this for an evening and while I was driving to work I decided to do a marginal chart that showed the breakdown of number of days of each type.  The “why” was because it looks like things are getting better – more attention needs to be drawn to that!

So the last steps ended up being to add on the marginal bar charts and then go one step further: isolate the “bad days” per year and have them be the final distilled metric at the far right.  My thought process: scan each year, get an idea of performance, see it aggregated in the bar chart, then see the bad as a single number.  For sheer visual pleasure I decided to distill the “bad” further into one more chart.  I had a stacked bar chart to start, but didn’t like it.  I figured for the sake of artistry I could get away with the area chart, and I really like the effect it brings.  You can see that the “very bad” days have become less prominent in recent years.
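For the “bad days per year” number, here is a hedged sketch of the counting logic, reusing the hypothetical [Unit Measure] and [Date Local] fields from the sketches above:

    // Bad days per year: count the distinct dates whose unit measure is positive
    { FIXED YEAR([Date Local]) : COUNTD(IF [Unit Measure] > 0 THEN [Date Local] END) }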

So that pretty much sums up the development process.  Here’s the full viz again and a comparison to the original output for Maricopa County, which echoes the sentiment of my maximums – Ozone measurements are going down.

 

 

#MakeoverMonday Week 24 – The Watercolours of Tate

First – I apologize.  I did a lot of web editing this week that has led to a series of system fails.  The first was spelling the hashtag wrong.  Next I decided to re-upload the workbook and ruin the bit.ly link.  What will be the next fail?

Anyway – to rectify the series of fails I decided that the best thing to do would be to create a blog post.  Blog posts merit new tweets and new links!

So week 24’s data was the Tate Collection, which, upon clicking through the link, turns out to be a decent approximation of the artwork housed at Tate.

Looking at the underlying data set, here are the columns we get:

And the records:

So I started off decently excited about the fact that there were 2 URLs to leverage in the data set: one with just a thumbnail image and the other a full link to the asset.  However, the Tate website can’t be accessed via HTTPS, so it doesn’t work for on-dashboard URLs on Tableau Public.  I guess Tableau wants us to be secure – and I respect that!

So my first idea of going the all-floating route with an image in the background was out.

Now my next idea was to limit the data set.  I had originally thought to do the “Castles of Tate” – check out the number of titles:

A solid number: 2,791 works of art.  A great foundation to build on.  Except, of course, for what we knew to be true of the data: Turner.

Sigh – this bummed me out.  Apparently only Turner really likes to label works of art with “Castle.”  Same was true for River and Mountain.  Fortunately I was able to easily see that using the URL actions on Tableau Desktop (again can’t do that on Public because of security reasons):

Here is a classic Turner castle:

Now yes, it is artwork – but doesn’t necessarily evoke what I was looking to unearth in the Tate collection.

So I went another path, focusing on the medium.  There was a decent collection of watercolour (intentional European spelling).  And within that a few additional artist representations beyond our good friend Turner.

So this informed the rest of the visualization.  Lucky for me there was a decent amount of distribution date-wise, both from a creation and an acquisition standpoint.  This allowed me to do some really pretty things with binned time buckets.  And, inspired by the Tate logo, I took a very abstract approach to the visualization this week.  The output is intentionally meant for data discovery.  I am not deriving insights for you; I’m building a view for you to explore.

One of my favorite elements is the small-multiples bubble chart.  It is not intended to aid in cognition; it is intended to be artwork of artwork.  I think that pretty much describes the entire visualization, if I’m being honest.  Something that could stand alone as a picture, perhaps, or be drilled into all the way down to each piece’s website to find out more.

Some oddities with color I explored this week: using an index and placing that on the color shelf with a diverging color palette (that’s what is coloring the bubble charts), and using modulo on the individual asset names to spark some fun visual encoding.  Rather than all one color, I felt breaking up the values in a programmatic way would be fun and different.
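For reference, here is a minimal sketch of those two color tricks.  INDEX() is the standard table calculation; the modulo expression is purely illustrative – the base of 7 and the use of the title’s length are my own assumptions, not necessarily what the workbook does:

    // Trick 1: table calculation placed on the color shelf with a diverging palette
    INDEX()

    // Trick 2: hash each asset name into a handful of color buckets via modulo
    LEN([Title]) % 7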

Perhaps my favorite part is the top section, with the bubble charts and bar charts below and the binned year ranges between.  Pure data art blots.

Here’s the full visualization on Tableau Public – I promise not to tinker further with the URLs.

#MakeoverMonday Week 22 – Internet Usage by Country

This week’s data set shows the number of internet users per 100 people by country, spanning several years.  The original data set and accompanying visualization start as an interactive map with the ability to animate through the changing values year by year.  Additionally, the reader can click into a country to see percentage changes or comparative changes across multiple countries.

Channeling my inner Hans Rosling – I was drawn to play through the animation of the change by year, starting with 1960.  What sort of narrative could I see play out?

Perhaps it was the developer inside of me, but I couldn’t get over the color legend.  For the first 30 years (1960 to 1989) there are only a few data points, all signifying zero.  Why?  Does this mean that those few countries actually measured this value in those years, or is it just bad data?  Moving past the first 30 years, my mind started trying to resolve the rest of the usage changes.  However – here again my mind was hurt by the coloration.  The color legend shifts from year to year.  There’s always a green, greenish yellow, yellow, orange, and red.  How am I to ascertain growth or change when I must constantly refer to the legend?  Sure, there’s something to be said for comparing country to country, but it loses alignment once you start paginating through the years.

Moving past my general take on the visualization – there were certain things I picked up on and wanted to carry forward into my makeover.  The first was the value out of 100 people.  Because I noticed that the color legend’s values were increasing year to year, the overall number of users was increasing.  Similarly, when comparing the countries, the coloration changed, meaning ranks were changing.

I’ll tell you – my mind was originally drawn to the idea of 3 slope charts sitting next to each other: one representing the first 5 years, the next 5 years, and so on, with each country as a line.  Well, that wasn’t really possible because the data has 1990 to 2000 as the first set of years – so I went down the path of the first 10 years.  It doesn’t tell me much other than something somewhat obvious: internet usage exploded from 1990 to 2000.

Here’s how the full set would have maybe played out:

This is perhaps a bit more interesting, but my mind doesn’t like the 10-year gap between 1990 and 2000, the five-year gaps from 2000 to 2010, and then the annual measurements from 2010 to 2015 (which I didn’t include on this chart).  More to the point, it seems to me that 2000 may be a better starting measurement point.  And it created the inflection point of my narrative.

Looking at this chart – I went ahead and decided my narrative would be to understand not only how much more internet usage there is per country, but also to demonstrate how certain countries have grown throughout the time periods.  I limited the data set to the top 50 in 2015 to eliminate some of the data noise (there were 196 members in the country domain; when I cut it to 100 there were still some zeros in 2000).

To help demonstrate that usage was just overall more prolific, I developed a consistent dimension to block out the number of users.  As you read it, it goes from light gray to blue depending on the value.  The point being that as we get nearer in time, there’s more dark blue and no light gray.
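As a hedged sketch, that stepped dimension could be built with a calculation along these lines; the cut points and the [Users per 100] field name are placeholders rather than the values actually used:

    // Stepped 'users per 100' dimension for the gray-to-blue blocks
    IF [Users per 100] < 25 THEN 'a. Under 25'
    ELSEIF [Users per 100] < 50 THEN 'b. 25 to 49'
    ELSEIF [Users per 100] < 75 THEN 'c. 50 to 74'
    ELSE 'd. 75 and up'
    END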

And then I went the route of a bump chart to show how the ranks have changed.  Norway had been at the top of the charts; now it’s Iceland.  When you hover over the lines you can see what happened.  And in some cases it makes sense: a country already dominating usage can only increase so much.
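The ranks behind the bump chart are the usual rank table calculation, recomputed within each year – a minimal sketch, assuming a [Users per 100] measure:

    // Rank of each country, restarting for every year (compute using Country)
    RANK_UNIQUE(SUM([Users per 100]), 'desc')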

But there are some amazing stories that can unfold in this data set: check out Andorra.  It went from #33 all the way up to #3.

You can take this visualization and step back into different years and benchmark each country on how prolific internet usage was during the time.  And do direct peer comparatives to boot.

This one deserves time spent focused on the interactivity of the visualization.  That’s part of the reason why it is so dense at first glance.  I’m intentionally trying to get the end user to see 3 things up front: overall internet usage in 2000 (by size and color encoding) and the starting rank of countries, the overall global increase in internet usage (demonstrated by coloration change over the spans), and then who the current usage leader is.

Take some time to play with the visualization here.

#MakeoverMonday Week 21 – Are Britons Drinking Less?

After some botched attempts at reestablishing routine, #MakeoverMonday week 21 got made within the time-boxed week!  I have one pending makeover and an in-progress blog post about Viz Club and the 4 vizzes developed during that special time.  But for now, a quick recap of the how and why behind this week’s viz.

This week’s data set was straightforward – aggregated measures sliced by a few dimensions.  And in what I believe is now becoming an obvious trend in how data is published, it included both aggregated and lower-level members within the same field (read this as “men,” “women,” “all people”).  The structured side of me doesn’t like it and screams at me to exclude the aggregates from any visualizations, but this week I figured I’d take a different approach.

The key questions asked related to alcohol consumption frequency by different age and gender combinations (plus those aggregates) – so there was lots of opportunity to compare within those dimensions.  More than that, the original question and how the data was presented begged to be rephrased into what became the more direct title (Are Britons Drinking Less?).

The question really informed the visualizations – and more to that point, the phrasing of the original article seemed to dictate to me that this was a “falling measure.”  Meaning it has been declining for years or year-to-year, or now compared to then – you get the idea.

With it being a falling measure and already in percentages, using a “difference from first” table calculation was a natural progression.  With the calculation, the first year of the measure is anchored at zero and subsequent years are compared to it – essentially asking and answering, for every year, “was it more or less than the first year we asked?”  Here’s the beautiful small multiple:

Here the demographics are set to color, lightest blue being youngest to darkest blue being oldest; red is the ‘all.’  I actually really enjoyed being able to toss the red on there for a comparison and it is really nice to see the natural over/under of the age groups (which mathematically follows if they’re aggregates of the different groups).
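As an aside, the “difference from first” comparison described above is typically expressed as a quick table calculation of the following shape; [Value] stands in for the survey percentage measure:

    // Difference from the first year along the table (computed across the years)
    ZN(SUM([Value])) - LOOKUP(ZN(SUM([Value])), FIRST())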

One thing I did to add further emphasis was to put positive deltas on size – that is to say, to over-emphasize (in a very subdued way whose humor probably only Ann appreciates) when the movement runs counter to the trend.  Or, more directly stated: draw the reader’s attention to the points where the percentage response has increased.

Here’s the resultant:

So older demographics are drinking more than they used to, and that’s fueled by women.  This becomes more obvious, and closer to the point of the original article, when looking at the Teetotal groups and seeing many more fat lines.

Here’s the calculation to create the line sizing:
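The original screenshot of that calculation isn’t reproduced here, but a hedged reconstruction of its shape would simply key the size off the sign of the delta.  [Diff From First] refers to the table calculation sketched earlier, and the size constants are placeholders:

    // Fatten the line only where the change versus the first year is positive
    IF [Diff From First] > 0 THEN 2 ELSE 1 END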

Last up was to make one more view to help sell the message.  I figured a dot plot would mimic champagne bubbles in a very abstract way.  And I also thought open/closed circles in combination with the color encoding would be pleasant for the readers.  Last custom change there was to flip the vertical axis of time to be in reverse.  Time is read top down and you can see it start to push down to the left in some of the different groupings.

If you go the full distance and interact with the dashboard, the last thing I hope you’ll notice and appreciate is the color legend/filter bar at the top.  I hate color legends because they lack utility.  Adding in a treemap version of a legend that does double duty as highlight buttons is my happy medium (and only when I feel like color encoding is not actively communicated enough).

#MakeoverMonday Week 18

{witty intro}  This week’s makeover challenge was to take Sydney ferry data for 7 ferry lines and 8 months.  What’s even better is there was another dimension with a domain of 9 members.  This is a dream data set.  I say it’s a dream from the perspective of having two dimensions that can be manipulated and managed (no deciding HOW they have to be reduced or further grouped) and there’s decent data volume with each one.

In the world of visualization, I think this is a great starter data set.  And it was fun for me because I could focus on some of the design rather than deciding on a deep analytical angle.  Plus in the spirit of the original, my approach was to redo the output of “who’s riding the ferries” and make it more accessible.

So the lowdown: first decision made was the color palette.  The ferry route map had a lot of greens in it.  And obviously a lot of blues because of water.

So I wanted to take that idea one step further.  That landed me in a world of deep blues and greens – using the darkest blue/green throughout to typically represent the “most” of something.

These colors informed most decisions that came afterward.  I really wanted to stick to small multiples on this one, just given the sheer line-up of the two small-to-medium domained dimensions.  Unfortunately – nothing of that nature turned out very interesting.  Here’s an example:

Like, it’s okay and somewhat interesting – especially giving each row the opportunity to have a different axis range.  But you can see the “problem” immediately: there are a few routes that are pretty flat, and further to that, end users are likely going to be frustrated by the independent axes when they dive deeper to compare.

Pivoting from that point led me to the conclusion that the dimensions shouldn’t necessarily be shown together, but instead one shown within the other.  But – worth noting – in the small multiple above you can see that the “Adult” fare is just the most everywhere, all the time.  Which led to this guy:

Where the bars are the overall totals and the dots are the Adult fares.  I felt that representing them in this context could free up the other, dwarfed fare types so we can play with the data.

The last step from my end was to highlight those fare types and add a little whimsy.  I knew switching to % of total would be ideal because of the differing trip volumes for each route.  Interpret this as: normalizing to proportions gave an opportunity to compare the routes.
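For reference, percent of total in this context is just a quick table calculation – a minimal sketch, assuming a [Trips] measure and computing across the fare types within each route:

    // Share of each fare type within a route's total trips
    SUM([Trips]) / TOTAL(SUM([Trips]))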

I actually landed on the area chart by accident – I was stuck with lines, did my typical CTRL + drag of the same pill to try some fun dual-axis work… and Tableau decided to automatically build me an area chart.

The original view of this was obviously not as attractive and I’ve done a few things to enhance how this displays.  The main thing was to eliminate the adult fare from the view visually.  We KNOW it’s the most, let’s move on.  Next was to stretch out the data a bit to see what’s going on in the remaining 30%-ish of rides. (Nerd moment: look at what I titled the sheet.)

Finishing up – there’s some label magic to show only those that are non-adult.  I also RETAINED the axis labels – I am hoping this helps to demonstrate and draw attention to the tagged axis at 50%.  What’s probably the most fun about this viz – you can hover over that same blue space and see the adult contribution – no data lost.

Overall I’m happy with the final effect.  A visually attractive display of data that hopefully invites users into deeper exploration.  Smaller dimension members given a chance to shine, and some straightforward questions asked and answered.

#MakeoverMonday Week 17

After a bit of life prioritization, I’m back in full force on a mission to contribute to Makeover Monday.  To that end, I’m super thrilled to share that I’ve completed my MBA.  I’ve always been an individual destined not to settle for one higher education degree, so having that box checked has felt amazing.

Now on to the Makeover!  This week’s data set was extra special because it was published on the Tableau blog – essentially more incentive to participate and contribute (there’s plenty of innate incentive IMO).

The data was courtesy of LinkedIn and represented 3 years’ worth of “top skills.”  Here’s my best snapshot of the data:

 

This almost perfectly describes the data set, minus the added bonus of there also being a “Global” member in the Country dimension.  Mixing aggregations, or concepts of what people believe can be aggregated, made me sigh just a little bit.  I also sighed at seeing that some countries are missing 2014 skills and that 2016 is truncated to 10 skills each.

So the limitations of the data set meant there had to be some clever dealing to get around them.  My approach was to take it from a 2016 perspective, and furthermore to “look back” to 2014 whenever there was any sort of comparison.  I made the decision to eliminate “Global” and any countries without 2014 data from the data set.  I find that the data lends itself best to comparison within a given country (my perspective) – so eliminating countries was something I could rationalize.

Probably the only visualization I really cared about was a slope chart.  I thought this would be a good representation of how a skill has gotten hotter (or not).  Here’s that:

Some things I did to jazz it up a bit: added a simple boolean expression on color to denote whether the rank has improved since 2014, and added reference lines for the years to anchor the lines.  I’ve done slope charts different ways, but this one somehow evolved into this approach.  Here’s what the sheet looks like:

Walking through it, starting with the filter shelf: I’ve got an action filter on Country (based on action filter buttons elsewhere on the dashboard).  Year has been added to context and 2015 eliminated.  A data source filter removed the countries without 2014 data, plus Global.  Skill is filtered to an LOD for 2016 Rank <> 0, which ensures I’m only using 2016 skills.  The context filters keep everything looking pretty for the countries.
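A hedged sketch of that skill filter as an LOD calculation, with field names approximating the LinkedIn data; the result is used as a boolean filter kept at True:

    // Keep a skill only if it carries a non-zero rank in 2016 for that country
    { FIXED [Country], [Skill] : SUM(IF [Year] = 2016 THEN [Rank] ELSE 0 END) } <> 0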

The year lines are reference lines – all headers are hidden.  There’s a dual axis on rows to have line chart & circle chart.  The second Year in columns is redundant and leftover from an abandoned labeling attempt (but adds nice dual labels automatically to my reference lines).

Just as a note – I paired the 2016 LOD with a 2014 LOD to do some cute math for line size; I didn’t like it, so I abandoned it.

Last steps were to add additional context to the “value” of 2016 skills.  So a quick unit chart and word cloud.  One thing I like to do on my word clouds these days is square the values on size.  I find that this makes the visual indicator for size easier to understand.  What’s great about this is that smaller rank is better, so instead of “^2” it became this:
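The screenshot of what it became isn’t shown here; since a smaller rank is better, the flip from “^2” was plausibly a negative exponent.  A hedged guess at the shape:

    // Size for the word cloud: rank 1 gets the largest word
    [Rank] ^ (-2)    // equivalent to 1 / ([Rank] * [Rank])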

Sometimes math just does you a real solid.

The kicker of this entire data set for me, and the knowledge gained: Statistical Analysis and Data Mining are hot!  Super hot!  I also really like that User Interface Design and Algorithm Design made it to the top 10 for the United States.  I would tell anyone that a huge component of my job is designing analytical outputs for all types of users, and that requires an amount of UX design.  And coincidentally I’m making an algorithm to determine how to eliminate a backlog, all in Tableau (a basic linear equation).

#MakeoverMonday Week 12 – All About March Madness

This week’s Makeover Monday topic was based on an article attempting to provide analysis into why it is harder for people to correctly pick their March Madness brackets. The original visualization is this guy:

With most Makeover Monday approaches I like to review the inspiration and visualization and let that somewhat decide the direction of my analysis. In this case I found that completely impossible. For my own sake I’m going to try and digest what it is I’m seeing/interpreting.

  • Title indicates that we’re looking at the seeds making the final 4
  • Each year is represented as a discrete value
  • I should be able to infer that “number above column represents sum of Final Four seeds” by the title
  • In the article it says in 2008 that all the seeds which made it to the final 4 were #1 – validates my logical assumption
  • Tracking this down further, I am now thinking each color represents a region – no idea which colors mean what – I take that back, I think they are ranked by the seed value (it looks like the first instance of the best seed rank is always yellow)
  • And then there’s an annotation tacked on % of Final Four teams seeded 7th or lower for two different time periods
    • Does 5.2% from 1985 to 2008 equal: count the blue bars (plus one red bar) with values >=7 – that’s 5 out of (24 * 4) = 5/96 = 5.2%
    • Same logic for the second statistic: 7 out of (8 * 4) = 21.9%

And then there’s the final distraction of the sum of the seed values above each bar.  What does this accomplish?  Am I going to use it to quickly try and calculate an “average seed value” for each year?  Because my math degree didn’t teach me to compute ratios at the speed of thought – it taught me to solve problems by using a combination of algorithms and creative thinking.  It also doesn’t help me with understanding interesting years – the height of the stacked bars does this just fine.

So to me this seems like an article where they’ve decided to take up more real estate and beef up the analysis with a visual display.  It’s not working and I’m sad that it is a “Chart of the Day.”

Now on to what I did and why.  I’ll add a little preface and say that I was VERY compelled to do a repeat of my Big Game Battle visualization, because I really like the idea of using small multiples to represent sports and team flux.  Here’s that display again:

Yes – you have to interact to understand, but once you do it is very clear.  Each line represents a win/loss result for the teams.  They are then bundled together by their regions to see how they progressed into the Super Bowl.  In the line chart it is a running sum, so you can quickly see that the Patriots and Falcons both had very strong seasons.  The 49ers were awful.

So that was my original inspiration, but I didn’t want to do the same thing and I had less time.  So I went a super distilled route of cutting down the idea behind the original article further.  Let’s just focus on seed rank of those in the championship.  To an extent I don’t really think there’s a dramatic story in the final 4 rankings – the “worst” seed that made it there was 11th.  We don’t even know if that team made it further.

In my world I’ve got championship winners vs. losers, with position indicating their seed rank.  Color represents the result for the team for the year, and for overall visual appeal I’ve made the color a ramp.  To help orient the reader, I’ve added min/max ranks (I screwed this up and computed per pane for winners when it should have been per table like it is for losers, but it looks nice anyway).  I’ve also added strategic years to help demonstrate that it’s a timeline.  If you were to interact, you’d see the name of the team and a few more specifics about what it is you’re looking at.

The reality of my takeaway here – a #1 seed usually wins.  Consistently wins, wins in streaks.  And there’s even a fair number of #1 losers.  If I had to make a recommendation based on 32 years of championships: pick the #1 seeds and stick with them.  Using the original math from the article: 19 out of 32 winners were seed #1 (60%) and 11 out of 32 losers were seed #1 (34%).  Odds of a #1 being in the final 2 across all the years?  47% – and yes, that is said very tongue in cheek.

Makeover Monday Week 10 – Top 500 YouTube Game(r) Channels

We’re officially 10 weeks into Makeover Monday, which is a phenomenal achievement.  This means that I’ve actively participated in recreating 10 different visualizations, with data varying from tourism, to Trump, to this week’s YouTube gamers.

First, some commentary people may not like to read: the data set was not that great.  There’s one huge reason why: one of the measures (plus a dimension) was a dependent variable of two independent variables, and that dependent variable was produced by a pre-built algorithm.  So it would almost make more sense to use the resultant dependent variable to enrich other data.

I’m being very abstract right now – here’s the structure of the data set:

Let’s walk through the fields:

  • Rank – this is based entirely on the sort chosen at the top of the site (for this view it is by video views; not sure what those random 2 are, I just screencapped the site)
  • SB Score/Rank – this is some sort of ranking value applied to a user based on a proprietary algorithm that takes a few variables into consideration
  • SB Score (as a letter grade) – the letter grade expression of the SB score
  • User – the name of the gamer channel
  • Subscribers – the # of channel subscribers
  • Video Views – the # of video views

As best as I can tell from reading the methodology – the SB score/rank (the # and the alpha) are influenced in part by the subscribers and video views.  Which means putting these in the same view is really sort of silly.  You’re kind of at a disadvantage if you scatterplot subscribers vs. video views, because the score is purportedly more accurate in terms of finding overall value/quality.

There’s also not enough information contained within the data set to amass any new insights on who is the best and why.  What you can do best with this data set is summarization, categorization, and displaying what I consider data set “vitals.”

So this is the approach that I took.  And more to that point, I wanted to make over a very specific chart style that I have seen Alberto Cairo employ a few times throughout my 6-week adventure in his MOOC.

That view: a bar chart sliced through with lines to help understand size of chunks a little bit better.  This guy:

So my energy was focused on that – which only happened after I did a few natural (in my mind) steps in summarizing the data, namely histograms:

Notice here that I’ve leveraged the axis values across all 3 charts (starting with SB grade and carrying through to its sibling charts to minimize clutter).  I think this has a decent effect, but I admit that the bars aren’t equal width across each bar chart.  That’s not pleasant.

My final two visualizations were to demonstrate magnitude and add more specifics in a visual manner to what was previously a giant text table.

The scatterplot helps to achieve this by displaying the 2 independent variables with the overall “SB grade” encoded on both color and size.  Note: for size I used powers of 2: 2^9, 2^8, 2^7…2^1.  This gave a decent exponential effect to break up the sizing in a consistent manner.
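A minimal sketch of that size encoding as a calculated field; [SB Grade] and the particular grades listed are assumptions for illustration, with the real mapping covering the full grade domain:

    // Map each letter grade to a power of 2 so the best grades dominate on size
    CASE [SB Grade]
        WHEN 'A+' THEN POWER(2, 9)
        WHEN 'A'  THEN POWER(2, 8)
        WHEN 'A-' THEN POWER(2, 7)
        // ...remaining grades step down the same way...
        ELSE POWER(2, 1)
    END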

The unit chart on the right is to help demonstrate not only the individual members, but display the elite A+ status and the terrible C+, D+, and D statuses.  The color palette used throughout is supposed to highlight these capstones – bright on the edges and random neutrals between.

This is aptly named an exploration because I firmly believe the resultant visualization was built to broadly pluck away at the different channels and get intrigued by the “details.”  In a more real-world scenario I would be out hunting for additional data to tie this back to – money, endorsements, average video length, number of videos uploaded, subject matter area, type of ads utilized by the user.  All of these, appended to this basic metric aimed at measuring a user’s “influence,” would lead down the path of a true analysis.