Makeover Monday Week 9 – Andy’s AMEX

So I started my dream job at the beginning of February. This means I’ve spent the month adjusting my personal schedule and working on bringing back good habits. In particular, I’ve been missing my daily workouts and consistent blogging about data viz. Fortunately I’ve kept up with the practice component (Makeover Monday, Hackathon, Workout Wednesday), but I wholeheartedly believe in the holistic approach of sharing the thought process behind the viz. (TL;DR – this was my paragraph of empty excuses)

Moving on, then, to the thought process behind the makeover. What’s perhaps even more interesting is that, after the fact, I can take some of the thoughts Andy shared about this week’s visualizations and provide my own context.

Based on the original visualization, I had an inkling that there wasn’t going to be a ton of data coming in. As someone who tracks all of my expenses and has seen them visually represented, I felt like food should represent a larger proportion of spending.

Andy’s AMEX ’16

For reference, here’s a wonderful donut chart of my own spending across my top 3 most-used credit cards. I funnel everything I can through credit cards, and food in general takes up a huge portion of my spend.

Ann’s ’16 Credit Card Spending

Both of these visualizations leave something to be desired.  I like Andy’s original AMEX one better than the donut I got, but they are both very distilled.  Andy spent a lot on transportation and travel, and apparently I spent a lot on shopping and education.

Getting REALLY specific about the data – there were 110 records (for comparison, my donut represents 477 records, 209 of which are food/dining). Plotting the data quickly over time, there were large gaps with no purchases at all.

Armed with this, I decided to piggyback off the predefined categories to see whether, over time, Andy typically has one category that draws most of the spend, or whether the spending is lumped together.

More to that point, I wanted to show how the data was dispersed on a daily basis… so I went down this path: the largest transaction for each day, plotted across the 12 months (category on color, amount on size). I actually really like this view because I can clearly see the large vehicle purchase in December, and you get a better feel for how spread out the card’s utilization was. (I am guessing my lack of an axis label on the day of the month is jarring.)
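As an aside for anyone reproducing this: isolating the largest transaction per day is a quick prep step in most tools. Here’s a minimal pandas sketch with made-up column names – in Tableau I’d express the same thing as a filter against an LOD like {FIXED [Date] : MAX([Amount])}:

```python
import pandas as pd

# Made-up sample transactions; field names are illustrative only.
tx = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-01-04", "2016-12-19"]),
    "category": ["Food/Dining", "Travel", "Vehicle"],
    "amount": [42.17, 310.00, 8500.00],
})

# Keep only the single largest transaction per calendar day.
top_daily = tx.loc[tx.groupby(tx["date"].dt.date)["amount"].idxmax()]
print(top_daily)
```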

Also, because I hate color legends, I needed to introduce the color encoding through data points elsewhere, which led to the first view:

So… I got really interested in utilization frequency and wanted to take it further. The next step was a barcode chart – very similar in concept to what the “top daily spend” view shows, but without limiting the data to only the top transaction per day.

Insights gained here – I get the feeling that Andy may only (or mostly) use his AMEX for meals when he’s out traveling. Hovering over the points adds more insight into the transaction values. More than that, we get a feel for what this card is generally swiped for: the 3 categories at the bottom (which, FYI, I ranked by total dollars spent).

Finally – bundling it together in a palatable format – what were the headline transactions for the year? I wanted to do monthly with categories, but there wasn’t enough data. So I opted to go transaction level and keep it to the top 5 each quarter. I think there’s novelty here in terms of presentation, but also value in quick, rough comparisons of values across quarters.

And that rounds out the analysis. Most of the transactions here center on travel. My brain isn’t sanitized enough to say what you can infer – I know too much about how Andy’s profession could explain these findings to approach them from a pure blank-slate standpoint. (TL;DR – I know that Andy travels for his job; I was surprised #data16 wasn’t an obvious point within the data set)

So, the general thought process behind the path I took this week: I wanted to explore how often Andy spends money in certain categories. I was intrigued by frequency of usage and whether it could eventually give the data creator (the guy who bought the stuff) some additional aha! moments.

To be more honest – I actually think this is something I would want for myself. I would love to plot my own transactions and see how they change throughout the year: a barcode for frequency (I imagine Black Friday is heavy), and then a check on whether I’m using each card any differently. (I have a feeling that grocery-type purchases are on the climb.)

Oh – and in terms of the question about colors and fonts: I did go to Andy’s blog for inspiration. I wanted to do a red/blue motif based on the blog, but needed more colors. So I think I googled “blue color palette” and ended up with a cute starting palette that evolved to include pops of orange and yellow. Font: I went with something minimal that I thought Andy would be okay with (Arial Narrow) and that would also render well across platforms.

Makeover Monday Week 8 – Potatoes in the EU

I’ll say this first – I don’t eat potatoes. Although potatoes are super tasty, I refuse to have them as part of my diet. So I was less than thrilled about approaching a week that was pure potato (especially coming off the joy of Valentine’s Day). Nonetheless, it presented a perfect opportunity for growth and skill testing. Essentially, if I could make a viz I loved about a vegetable I hate, that would speak to my ability to interpret varying data sets and build out displays.

I’m very pleased with the end result. I think it has a very Stephen Few-esque approach: several small multiples with the high and low denoted, and color playing throughout as a dual encoder. And there’s even visual interest in how the data was sorted for shape.

So how did I arrive there? It started with the bar chart of annual yield. I had an idea for the color scheme and knew I wanted to make it more than gray.

This gave a perfect opportunity to highlight the minimum and maximum yields – to see in which years different countries’ production was affected by things like weather and climate. It’s actually very interesting to see that not too many of the dark bars (max value) fall in more recent times. It seems like agricultural innovation is keeping pace with climate issues.

After that I was hooked on the idea of sets of 3. I knew I wanted to replicate the small multiple in a different way using the same sort order. That’s where Total Yield came in. I’ve been pondering this one in the shower: the legitimacy of adding up annual yield ratios for an overall figure. My rational brain says it’s fine because the size of the country doesn’t change. But my vulnerable brain says someone may take issue with it. I’d love for a potato farming expert to come along and tell me whether that’s a silly thing to add up. I do see the value in a straight total comparison of the years: although the yield fluctuates annually, we get a normalized way to show how much each country produces irrespective of its total land size.
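For what it’s worth, here’s the back-of-the-envelope reasoning, assuming yield is annual production P_t divided by a harvested area A that stays constant across years:

```latex
\sum_{t} \text{yield}_t = \sum_{t} \frac{P_t}{A} = \frac{1}{A} \sum_{t} P_t
```

So the sum of annual yields is just total production scaled by the fixed area, which is a fair normalized comparison. If the harvested area actually varies year to year, though, the sum stops being a clean ratio – which is exactly where that potato farming expert could set me straight.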

Next was the dot plot of the current year. This actually started out life as a KPI indicator of up or down from the previous year, but it was too much for the visual. I felt a dot plot of the current year would do more justice to the “right now” understanding, especially because you can visually compare it to its flanking charts and glean more insight.

And then rinse/repeat for the right side. This is really where things get super interesting: the amount of variability in pricing for each country, both on average and in the current year. Also – 2013 was a great year for potatoes.

Makeover Monday 2017 – Week 4 New Zealand Tourism

This week’s Makeover addressed domestic and international tourism trends in New Zealand. No commentary was provided with the data set; the original was just 2 charts left to the user to interpret. See Eva’s tweet for the originals:

Going back to basics this week with what I like and dislike about it:

  • Titles are clear, bar chart isn’t too busy (like)
  • Not too many grid lines (like)
  • It’s easy to see the shape of the data and seasonality (like)
  • The scales are different between International & Domestic (dislike)
  • 3 years for easy comparison (like)
  • Eva chose this to promote her home country (like)

I think this was a good data set for week 4. There was no data story to rewrite, Eva paid special attention to mitigating data misinterpretation, and she added a bonus of geospatial data for New Zealand.

My process really began with the geospatial part. I hadn’t yet had a chance to work with geospatial fields developed and appended to a data set. My experience had been limited to using Tableau’s functionality to manually add latitude and longitude for unclear/missing/invalid data points.

So as I got started, I had no idea how to use the data. A few fields pointed me in the right direction. The first was “Point Order.” I immediately figured that needed to go on “path” to determine where each data point fell. That got me to this really cute outlined version of NZ (which looks like an upside-down boot):

So I knew something additional needed to be done to get to a filled map. That’s when I discovered the “PolygonNumber” field. Throwing that onto detail, changing my mark type to polygon, and voilà – New Zealand. Here’s a Google image result for comparison:
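If you wanted to reproduce the same idea outside Tableau, the mechanics are identical: sort each shape’s vertices by point order and fill the polygon. A rough Python sketch, with assumed file and column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout of the geospatial file: one row per vertex, with
# PolygonNumber identifying each shape and PointOrder giving the draw order.
geo = pd.read_csv("nz_polygons.csv")

fig, ax = plt.subplots()
for _, shape in geo.sort_values("PointOrder").groupby("PolygonNumber"):
    # Same idea as PointOrder on Path + PolygonNumber on Detail with the
    # polygon mark type in Tableau: connect the vertices in order and fill.
    ax.fill(shape["Longitude"], shape["Latitude"], linewidth=0.3)
ax.set_aspect("equal")
plt.show()
```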

Eva did a great job trying to explain how NZ is broken up in terms of regions/territories/areas, but I have to admit I got a little lost. I think what’s clear from the two pictures is that I took the most granular approach to dissecting the country.

I’m super thrilled that I got this hands-on opportunity. Geospatial is one of those areas of analytics everyone wants to get into, and by including it I feel much more equipped for future challenges.

Next up was the top viz. I’ve been wanting to try out a barbell/DNA chart for a long time. I’ve made them in the past, but nothing that landed in a final viz. I felt there was an opportunity to try one with this data set, based on the original charts. I quickly threw it together (using Andy Kriebel’s video tutorial) and really enjoyed the pattern that emerged.

The shape of the data is really what kicked off the path the final viz ended up taking. I liked the stratification of domestic vs. international and wanted to carry that throughout. This is also where I chose the colors.

The bottom-left chart started out its life as a slope chart. I originally plotted the first data point vs. the last data point (January 2008 vs. April 2016) for both types of tourism. It turned out to be VERY misleading – international had apparently plummeted. When I switched over to an annual aggregation, the story was much different.

International NZ tourism is failing!
Things look less scary and International is improving!

A good lesson in looking at the data holistically: don’t go super macro and get it wrong. Find the level of aggregation that keeps the message intact.
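A tiny illustration of the trap, with made-up numbers: pick two raw endpoints of a seasonal series and the delta can point the opposite way from the underlying trend.

```python
import numpy as np
import pandas as pd

# Made-up monthly series: a mild upward trend plus strong seasonality
# that peaks in January (think summer tourism in New Zealand).
idx = pd.date_range("2008-01-01", "2016-04-01", freq="MS")
rng = np.random.default_rng(42)
visits = pd.Series(
    100 + 0.2 * np.arange(len(idx))                   # slow growth
    + 30 * np.cos(2 * np.pi * (idx.month - 1) / 12)   # seasonal cycle
    + rng.normal(0, 3, len(idx)),
    index=idx,
)

# Endpoint comparison: Jan 2008 sits at the seasonal peak, Apr 2016 does
# not, so the raw delta comes out negative despite the upward trend.
print(visits.iloc[-1] - visits.iloc[0])

# Annual aggregation (complete years only) smooths the seasonality out
# and recovers the real story.
full_years = visits.loc[:"2015"]
print(full_years.groupby(full_years.index.year).sum())
```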

The last viz was really taking the geospatial component and adding in the tourism part. I am on a small multiples kick and loved the novelty of having NZ on there more than once. Knowing I could repeat the colors by doing a dual-axis map sealed the deal.

All that was left was to add interactivity. It was originally driven by the barbell and line charts filtering the maps, but that wasn’t quite clear. I HATE filter drop-downs for something that’s going to be a static presentation (a Twitter picture), so I wanted to give the user a filter option for the maps (because the shading does change over time) while keeping it less tied to the static companion vizzes. This is where I decided to make a filter sheet of the years and drop in a diverging color gradient to add a little more beauty. I’m really pleased with how that turned out.

My last little cute moment is the data sourcing. The URLs are gigantic and cluttered the viz, so instead I made a basic sheet with URL actions to quickly get to both data sets.

A fun week and one that I topped off by spelling Tourism wrong in the initial Tweet (haha).  Have to keep things fun and not super serious.

Full dashboard here.

Makeover Monday 2017 – Week 3 Trump Tweets

**Update (1/20/17):** The original data set had a date formatting snafu that caused 1,307 tweets in the 12:00–12:59 PM (UTC) hour to be displayed as 00:00–00:59 (i.e., the 12 AM hour). This affected 4.3% of the original visualization and has been corrected. I have also added a footnote noting that the visualization is in EST. The fix changes the shape of the data in both the 4 AM – 8 AM and 4 PM – 8 PM sections.
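For the curious, this is the classic 12-hour-clock parsing trap. A minimal reconstruction in Python – my own illustration of the failure mode, not the actual pipeline:

```python
from datetime import datetime

# %I is the 12-hour clock. Without an AM/PM marker (%p), strptime assumes
# AM, and 12 AM is hour 0 -- so every 12:xx PM timestamp becomes 00:xx.
bad = datetime.strptime("12:30", "%I:%M")
print(bad.hour)  # 0  <- the snafu

# Safe: keep the AM/PM marker alongside %I ...
good = datetime.strptime("12:30 PM", "%I:%M %p")
print(good.hour)  # 12

# ... or parse a true 24-hour clock with %H.
print(datetime.strptime("12:30", "%H:%M").hour)  # 12
```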

Rolling right along into week 3’s Makeover Monday. The data set this week: Donald Trump’s tweets. The original Buzzfeed viz and accompanying article analyzed Trump’s retweet activity since he announced his run for president. Their final viz ended up being what I’d best describe as bubble charts of the top users he retweeted during this time:

What’s interesting is that the actual article goes into significant depth on how their team systematically reviewed the tweets. It’s a bummer that the additional analysis couldn’t be synthesized into visual form.

My take on the makeover this week was driven completely by the underlying data available.  The TDE provided had the following fields:

Two things stuck out to me in the data. First, the username being retweeted wasn’t included; second, the entire tweet text was. Having full text available just screams for some sort of text analysis, and at that point I committed to doing something with it.

My initial idea was to do some sort of sentiment analysis. I had recently installed both R-Studio and Python on my PC to try integration with Tableau. I’d had success with R-Studio (mind you, after watching a brief YouTube video), but I hadn’t gotten Python to cooperate (my effort in assisting that cooperation = 2 out of 10). I figured since I had both available, maybe I should make an attempt. After marinating on the concept, though, I didn’t feel comfortable adding more sentiment analysis to the fire of American politics. (On a personal note: I have been politically checked out since the early primaries.)

So instead of sentiment analysis, I turned to mining the text for mentions and hashtags. I had done some fiddling with the time component and was digging how the cycle plot/horizon chart was playing out visually, so it seemed natural to continue down a path of getting more detail out of the bars and times of day.

A note on the time: the timestamps come graciously parsed into the correct format with the data. Looking at the original times, I am under the impression they were in GMT (+0000). To adjust, I added -5 hours to all of the parsed dates to put them in EST, aka Trump time.
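In code terms, the shift amounts to this (a sketch with an assumed field name; note that a flat -5 ignores daylight saving, which a proper timezone conversion would handle):

```python
import pandas as pd

# Assumed field name for the parsed tweet timestamps (in UTC/GMT).
tweets = pd.DataFrame(
    {"created_at": pd.to_datetime(["2017-01-15 17:45:00", "2017-06-15 17:45:00"])}
)

# The blunt adjustment described above: subtract 5 hours for EST.
tweets["est_flat"] = tweets["created_at"] - pd.Timedelta(hours=5)

# A more careful alternative that respects daylight saving time.
tweets["est_tz"] = (
    tweets["created_at"].dt.tz_localize("UTC").dt.tz_convert("America/New_York")
)
print(tweets)
```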

So back to text mining. After the #data16 conference, a colleague of mine recounted how to use regex to scrub through text. I walked away from his talk thinking I needed to use it the next time I had the opportunity. And what I love about it: it’s NATIVE TO TABLEAU!! So this had me singing. Now, I don’t know a ton about regex (lots of notation I have yet to memorize), so I quickly googled my way to extracting the user handles and hashtags. These handy results really made the analysis zip along: regexr & regex+twitter.
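Here’s the gist of the extraction, sketched with Python’s re module – the same flavor of pattern works in Tableau’s native REGEXP_EXTRACT function. The exact patterns below are illustrative, not necessarily the ones in my workbook:

```python
import re

tweet = "Great crowd at #data16 with @TableauSoftware - thank you!"

# Twitter handles: @ followed by up to 15 word characters.
handles = re.findall(r"@(\w{1,15})", tweet)

# Hashtags: # followed by letters, digits, or underscores.
hashtags = re.findall(r"#(\w+)", tweet)

print(handles)   # ['TableauSoftware']
print(hashtags)  # ['data16']
```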

Everything else came to life pretty quickly.  I knew I wanted to include at least one or two tweets to read through, but I wanted to keep it curated.  I think this was accomplished well and I spent a good deal of time trying out different time combinations just to see what would bubble to the surface.

A final note on aesthetics this week: I’m reading Alberto Cairo’s The Functional Art, and as I mentioned in an earlier post, I’m also participating in his MOOC, which starts tomorrow. I am only 4 chapters in, but Alberto has me taking a few things to heart. I don’t think it’s a coincidence that I decided to push the beauty side of things. I always strive for elegance, but I strive for it through white space and keeping the “data-ink ratio” at a certain point. Still, I’m not blind to the different visualizations out there that attract people. So for once I used a non-white background (yay!), and I went for a font well outside the look of my usual vizzing font.

More important than the aesthetics, of course, is the function of the viz. I tried to spend more time thinking about the audience and what they were going to “get” out of it. I hope the final product is less of a “visual aid” to my analysis and more of an interactive tool to explore the tweets of the soon-to-be President.

Full viz available on my Tableau public page.

Makeover Monday 2017 – Week 2

It’s time for Makeover Monday – Week 2.  This week’s data set was the quarterly sales (by units) of Apple iPhones for the past 10ish years.  The original article accompanying the data indicated that the golden years of Apple may be over.

So let me start by saying – I broke the rules (or rather, the guidelines).  Makeover Monday guidelines indicate that the goal is to improve upon the original visualization and stick to the original data fields.  I may have overlooked that guideline this week in favor of adding a little more context.

When I first approached the data set and dropped it into Tableau, the thing I immediately noticed was that Q4 always has a dip compared to the other quarters of the year.

This view contradicted all of my existing knowledge of how iPhone releases work. Typically, every year around mid-to-late September, Apple holds an event announcing the “new” iPhone – either the incremental upgrade (the off year, aka the S model) or a new generation. It lines up such that pre-sales and sales come in the weeks shortly following, and on top of that I would expect sales to stay heightened throughout the holiday season.

This is where I immediately went back to the data to challenge it, and I noticed that Apple defines its fiscal year differently. Specifically, October to December of the previous calendar year counts as Q1 of the current fiscal year – Q1 of 2017 is actually 10/1/16 to 12/31/16. Meaning that to think in normalized calendar quarters, everything needs to be shifted.
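Concretely, the adjustment looks like this (a minimal sketch based on my reading of Apple’s fiscal calendar):

```python
def fiscal_to_calendar(fy: int, fq: int) -> tuple[int, int]:
    """Map Apple fiscal (year, quarter) to calendar (year, quarter).

    Apple's Q1 FY2017 runs 10/1/16-12/31/16, so fiscal Q1 maps to
    calendar Q4 of the prior year and Q2-Q4 shift back one quarter.
    """
    if fq == 1:
        return fy - 1, 4
    return fy, fq - 1

print(fiscal_to_calendar(2017, 1))  # (2016, 4) -> Oct-Dec 2016
print(fiscal_to_calendar(2017, 2))  # (2017, 1) -> Jan-Mar 2017
```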

Now I was starting to feel much better about how things were looking.  It aligned with my real world expectations.

I still couldn’t help but feel that a significant portion of the story was missing.  In my mind it wasn’t fair to only look at iPhone sales over time without understanding more data points of the smartphone market.  I narrowed it down to overall sales of smartphones and number of smartphone users.  The idea I had was this: have we reached a point where the number of smartphone users is now a majority?  Essentially the Adoption Curve came to my mind – maybe we’ve hit that sweet spot where the Late Majority is now getting in on smartphones.

To validate the theory and keep things simple, I did quick searches for data sets I could bring into the view. As if through serendipity, the two additional sources I stumbled upon came from the same source as the original. I went ahead and added them into my data set and got to work.

My initial idea was this: a line plot of iPhone sales vs. overall smartphone sales, to see if the directionality was the same; a smaller graph of smartphone users to the side (mainly because it was US-only – I couldn’t find a free global data set); and a last viz combining the 3 to show basic “growth” change. That, in my mind, would display an answer to my question in a very basic way.

I went through a couple of iterations and finally landed on the view below as my final.

I think it sums up the thought process and answers the question I originally asked myself when I approached the data set. And hopefully I can be pardoned (if that’s even necessary), since the added data merely enhanced the information at hand and kept to the simplicity of the available data points (units and time).

Makeover Monday 2017 – Week 1

It’s officially 2017 – the start of a new year.  As such, this is a great time for anyone in the Tableau universe to make a fresh commitment to participate in the community challenge known as Makeover Monday.

As I jump into this challenge, I’ve made the conscious decision to start with the things I already like doing and to add on each time.  This to me is the way that I’ll be able to stay actively involved and enthusiastic.  Essentially: keep it simple.

For this week’s data set it was obvious that something of a comparative nature needed to be applied.  I started off with a basic dot plot and went from there.

What I ended up with: a slope chart with the slope representing the delta in rank of income by gender, the size of the line representing the annual monetary difference in income, and 3 colors representing categorized multipliers on the wage gap.

I wanted this to be for a phone, so I held to the idea of a single viz. Interactivity is really limited to tooltips; most of the other nuance comes from the presentation of the visualization itself.

And I pushed myself to add a little journalistic flair this week. Not really my style, but I figured I’d see where it took me.

#MakeoverMonday 11/22/16 – Advanced Logging Edition

And it’s time – my first ever Makeover Monday. I’ll admit, I’ve attempted to catch up in the past but always lost steam. I think the first data set I tried was sports-related, and I struggled to focus on making something interesting.

Despite my follies, I’m proud to say I’ve participated in this week’s Makeover Monday in honor of the special advanced logging that is taking place. Along with submitting work with the hashtag on Twitter, Tableau has asked us to upload a copy of our log files and workbook. Contained within the advanced log files are .PNGs that show the analysis iterations.

I went into this Monday with the idea of doing a basic “best practices” version – one that would mimic something I might create for pure exploration and zero data journalism. I tried to stick with one element that I thought worked well and build the dashboard around it.

Looking at the other participants, I’m already thinking my time heatmap could be improved. My mind was stuck on day numbers and quarters – I should have switched to days of the week! Regardless, here it is:

And the GIF: