#MakeoverMonday Week 12 – All About March Madness

This week’s Makeover Monday topic was based on an article attempting to provide analysis into why it is harder for people to correctly pick their March Madness brackets. The original visualization is this guy:

With most Makeover Monday approaches I like to review the inspiration and visualization and let that somewhat decide the direction of my analysis. In this case I found that completely impossible. For my own sake I’m going to try and digest what it is I’m seeing/interpreting.

  • Title indicates that we’re looking at the seeds making the final 4
  • Each year is represented as a discrete value
  • I should be able to infer that “number above column represents sum of Final Four seeds” by the title
  • The article says that in 2008 all the teams which made it to the final 4 were #1 seeds – validates my logical assumption
  • Tracking this down further, I am now thinking each color represents a region – no idea which colors mean what – I take that back, I think they are ranked by the seed value (it looks like the first instance of the best seed rank is always yellow)
  • And then there’s an annotation tacked on % of Final Four teams seeded 7th or lower for two different time periods
    • Does 5.2% from 1985 to 2008 equal: count the blue bars (plus one red bar) with values >=7 – that’s 5 out of (24 * 4) = 5/96 = 5.2%
    • Same logic for the second statistic: 7 out of (8 * 4) = 21.9%
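The arithmetic in the last two bullets can be checked with a short sketch. The counts (5 and 7) and the period lengths (24 and 8 years) come from the bullets above; the function name is made up for illustration.

```python
def share_of_final_four(low_seed_count, years, teams_per_year=4):
    """Percentage of Final Four slots filled by teams seeded 7th or lower."""
    total_slots = years * teams_per_year
    return 100 * low_seed_count / total_slots


# Counts read off the original chart, as described in the bullets above
early = share_of_final_four(5, 24)  # 1985 to 2008
late = share_of_final_four(7, 8)    # the second time period

print(f"{early:.1f}% vs {late:.1f}%")
```

Both results round to the figures quoted in the annotation, so the reading of the chart (four slots per year, count the bars at 7 or worse) appears to be the one the article used.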

And then there’s the final distraction of the sum of the seed values above each bar.  What does this accomplish?  Am I going to use it to quickly try and calculate an “average seed value” for each year?  Because my math degree didn’t teach me to compute ratios at the speed of thought – it taught me to solve problems by using a combination of algorithms and creative thinking.  It also doesn’t help me with understanding interesting years – the height of the stacked bars does this just fine.

So to me this seems like an article where they’ve decided to take up more real estate and beef up the analysis with a visual display.  It’s not working and I’m sad that it is a “Chart of the Day.”

Now on to what I did and why.  I’ll add a little preface and say that I was VERY compelled to do a repeat of my Big Game Battle visualization, because I really like the idea of using small multiples to represent sports and team flux.  Here’s that display again:

Yes – you have to interact to understand, but once you do it is very clear.  Each line represents a win/loss result for the teams.  They are then bundled together by their regions to see how they progressed into the Super Bowl.  In the line chart it is a running sum.  So you can quickly see that the Patriots and Falcons both had very strong seasons.  The 49ers were awful.
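For anyone curious how a running sum of results turns into a line, here is a minimal sketch. It assumes a win counts as +1 and a loss as -1, which is a guess about the encoding rather than something stated above, and the sample sequence is invented, not a real team's season.

```python
from itertools import accumulate


def running_record(results):
    """Map 'W' to +1 and 'L' to -1 (an assumed encoding), then take the running sum."""
    steps = [1 if result == "W" else -1 for result in results]
    return list(accumulate(steps))


# Hypothetical sequence of results, for illustration only
sample_season = ["W", "W", "L", "W", "L", "L", "W"]
print(running_record(sample_season))
```

A strong season climbs steadily and a poor one sinks, which is why the lines separate so clearly in a small-multiples layout.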

So that was my original inspiration, but I didn’t want to do the same thing and I had less time.  So I went a super distilled route of cutting down the idea behind the original article further.  Let’s just focus on seed rank of those in the championship.  To an extent I don’t really think there’s a dramatic story in the final 4 rankings – the “worst” seed that made it there was 11th.  We don’t even know if that team made it further.

In my world I’ve got championship winners vs. losers with position indicating their seed rank.  Color represents the result for the team for the year and for overall visual appeal I’ve made the color ramp.  To help orient the reader, I’ve added min/max ranks (I screwed this up and did pane for winner, should have been table like it is for loser, but it looks nice anyway).  I’ve also added on strategic years to help demonstrate that it’s a timeline.  If you were to interact, you’d see the name of the team and a few more specifics about what it is you’re looking at.

The reality of my takeaway here – a #1 seed usually wins.  Consistently wins, wins in streaks.  And there’s even a fair amount of #1 losers.  If I had to make a recommendation based on 32 years of championships: pick the #1 seeds and stick with them.  Using the original math from the article: 19 out of 32 winners were seed #1 (59%) and 11 out of 32 losers were seed #1 (34%).  Odds of a 1 being in the final 2 across all the years?  47% – And yes, that is said very tongue in cheek.
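Those three percentages can be reproduced in a few lines. The counts (19 winners, 11 losers, 32 championships) are the ones quoted above; the variable names are just for illustration, and the last figure treats each championship as two finalist slots.

```python
seasons = 32
top_seed_winners = 19
top_seed_losers = 11

winner_share = top_seed_winners / seasons
loser_share = top_seed_losers / seasons
# Two finalists per championship game, so 64 slots in total
finalist_share = (top_seed_winners + top_seed_losers) / (2 * seasons)

for label, share in [
    ("winners", winner_share),
    ("losers", loser_share),
    ("all finalists", finalist_share),
]:
    print(f"#1 seeds among {label}: {share:.1%}")
```

The winner share lands a touch under 60%, and the combined share is simply 30 of 64 finalist slots.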

February Book Binge

Another month has passed, so it’s time to recount what I’ve been reading.

Admittedly it was kind of a busy month for me, so I decided to mix up some of my book habits with podcasts.  To reflect that – I’ve decided to share a mixture of both.

 

First up is Rhinoceros Success by Scott Alexander

This is a short read designed to ignite fire and passion in whoever reads it. It walks through how a big burly rhino would approach everyday life, and how you as a rhino should follow suit.

I read this one while I was transitioning between jobs and found it to be a great source of humor during the process. It helps to articulate out ‘why’ you may be doing certain things and puts it in the context of what a rhino would do. This got me through some rough patches of uncertainty.

The next book was Made to Stick by the Heath brothers

This was another recommendation and one that I enjoyed. I will caveat and say that this book is really long. I struggled to try and get through a chapter at a time (~300 pages and only 7 chapters). It is chock-full of stories to help the reader understand the required model to make ideas stick.

I read this one because oftentimes a big part of my job is communicating a yet-to-be-seen vision. It is also trying to get people to buy in to a new type of thinking. These aren’t easy and can be met with resistance. The tools that the Heath brothers offer are simple and straightforward. I think they even extend further to writing or public speaking. How do you communicate a compelling idea that will resonate with your audience?

I’ve got their 2 other books and will be reading one of them in March.

Lastly – I wanted to spend a little bit of time sharing a podcast that I’ve come to enjoy. It is Design Matters with Debbie Millman.

This was shared with me by someone on Twitter. I found myself commuting much more than average this month (as part of the job change) and I was looking for media to consume during the variable length (30 to 60 minute) commute. This podcast fits that time slot so richly. What’s awesome is the first podcast I listened to had Seth Godin on it (reading one of his books now) – so it was a great dual purpose item. I could hear Seth and preview if I should read one of his many books and also get a dose of Debbie.

The beauty of this podcast for me is that Debbie spends a lot of time exploring the personality and history of modern artists/designers. She does this by amassing research on each individual and then having a very long sit-down to discuss findings. Oftentimes this involves analyzing individual perspectives and recounting significant past events. I always find it illuminating how these people view the world and how they’ve “arrived” at their current place in life.

That wraps up my content diet for the month – and I’m off to listen to Seth.

Makeover Monday Week 10 – Top 500 YouTube Game(r) Channels

We’re officially 10 weeks into Makeover Monday, which is a phenomenal achievement.  This means that I’ve actively participated in recreating 10 different visualizations with data varying from tourism, to Trump, to this week’s YouTube gamers.

First some commentary people may not like to read: the data set was not that great.  There’s one huge reason why it wasn’t great: one of the measures (plus a dimension) was a dependent variable derived from two independent variables that are also in the data.  And that dependent variable was processed via a pre-built algorithm.  So it would almost make more sense to use the resultant dependent variable to enrich other data.

I’m being very abstract right now – here’s the structure of the data set:

Let’s walk through the fields:

  • Rank – this is a component based entirely on the sort chosen at the top (for this view it is by video views, not sure what those random 2 are, I just screencapped the site)
  • SB Score/Rank – this is some sort of ranking value applied to a user based on a proprietary algorithm that takes a few variables into consideration
  • SB Score (as a letter grade) – the letter grade expression of the SB score
  • User – the name of the gamer channel
  • Subscribers – the # of channel subscribers
  • Video Views – the # of video views

As best as I can tell through reading the methodology – SB score/rank (the # and the alpha) are influenced in part by the subscribers and video views.  Which means putting these in the same view is really sort of silly.  You’re kind of at a disadvantage if you scatterplot subscribers vs. video views because the score is purportedly more accurate in terms of finding overall value/quality.

There’s also not enough information contained within the data set to amass any new insights on who is the best and why.  What you can do best with this data set is summarization, categorization, and displaying what I consider data set “vitals.”

So this is the approach that I took.  And more to that point, I wanted to make over a very specific chart style that I have seen Alberto Cairo employ a few times throughout my 6 week adventure in his MOOC.

That view: a bar chart sliced through with lines to help understand size of chunks a little bit better.  This guy:

So my energy was focused on that – which only happened after I did a few natural (in my mind) steps in summarizing the data, namely histograms:

Notice here that I’ve leveraged the axis values across all 3 charts (starting with SB grade and through to its sibling charts) to minimize clutter.  I think this has decent effect, but I admit that the bars aren’t equal width across each bar chart.  That’s not pleasant.

My final two visualizations were to demonstrate magnitude and add more specifics in a visual manner to what was previously a giant text table.

The scatterplot helps to achieve this by displaying the 2 independent variables with the overall “SB grade” encoded on both color and size.  Note: for size I did powers of 2: 2^9, 2^8, 2^7…2^1.  This was a decent exponential effect to break up the sizing in a consistent manner.
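The powers-of-2 sizing can be sketched as a simple lookup. The grade labels and their order below are an assumption for illustration (the actual nine labels in the data set may differ); only the 2^9 down to 2^1 scheme comes from the note above.

```python
def size_by_grade(ordered_grades):
    """Assign 2**n sizes, giving the largest power to the best grade."""
    top = len(ordered_grades)
    return {grade: 2 ** (top - i) for i, grade in enumerate(ordered_grades)}


# Hypothetical grade order, best to worst; nine entries gives 2^9 down to 2^1
grades = ["A+", "A", "A-", "B+", "B", "B-", "C+", "D+", "D"]
sizes = size_by_grade(grades)

for grade in grades:
    print(grade, sizes[grade])
```

Each step down a grade halves the size value, which is what gives the marks a consistent exponential falloff rather than a linear one.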

The unit chart on the right is to help demonstrate not only the individual members, but display the elite A+ status and the terrible C+, D+, and D statuses.  The color palette used throughout is supposed to highlight these capstones – bright on the edges and random neutrals between.

This is aptly named an exploration because I firmly believe the resultant visualization was built to broadly pluck away at the different channels and get intrigued by the “details.”  In a more real world I would be out hunting for additional data to tag this back to – money, endorsements, average video length, number of videos uploaded, subject matter area, type of ads utilized by the user.  All of these appended to this basic metric aimed at measuring a user’s “influence” would lead down the path of a true analysis.

Makeover Monday Week 9 – Andy’s AMEX

So I started my dream job at the beginning of February.  This means I’ve been spending the month adjusting and tweaking my personal schedule and working on bringing back good habits.  In particular – I’ve missed out on doing daily workouts and consistently blogging about data viz.  Fortunately I’ve been keeping up with the practice component (Makeover Monday, Hackathon, Workout Wednesday), but I wholeheartedly believe in the holistic approach of sharing the thought process behind the viz.  (TL;DR – this was my paragraph of empty excuses)

Moving on then – the thought process behind the makeover.  And what’s perhaps even more interesting is that, post-viz, I can take some of the thoughts that Andy had regarding this week’s visualizations and provide my context.

Based on the original visualization I had an inkling that there wasn’t going to be a ton of data funneling in.  Being an individual who tracks all expenses and has seen them visually represented, I felt like food should represent a larger proportion of expenses.

Andy’s AMEX ’16

For reference, here’s a wonderful donut chart that Mint.com provided me on my top 3 most used credit cards.  I funnel everything I can through credit cards, and food in general takes up a huge portion of spend.

Ann’s ’16 Credit Card Spending

Both of these visualizations leave something to be desired.  I like Andy’s original AMEX one better than the donut I got, but they are both very distilled.  Andy spent a lot on transportation and travel, and apparently I spent a lot on shopping and education.

Getting REALLY specific about the data – there were 110 records (FYI my donut covers 477, of which 209 are food/dining).  Plotting the data quickly over time, there were large gaps of time with no purchases.

Armed with this, I decided to take an approach of piggybacking off the predefined categories to see if throughout time Andy typically has one category that gets a lot of spend, or to see if the spend trends are lumped together.

More to that point, I wanted to show the way the data was dispersed in a daily fashion… so I went down this path.  The largest transaction for each day plotted (using the category on color, amount on size) across the 12 months.  I actually really like this view because I can clearly see the large vehicle purchase in December and you get a better feel for how spread out the card’s utilization was.  (I am guessing my lack of axis label on the day of the month is jarring.)
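The "largest transaction for each day" step is a small group-and-keep-the-max operation. Here is a rough sketch of it outside of Tableau; the field names (date, category, amount) and the sample rows are invented for illustration and are not taken from the actual data set.

```python
from datetime import date


def top_transaction_per_day(transactions):
    """Keep only the largest-amount transaction for each calendar day."""
    best = {}
    for tx in transactions:
        day = tx["date"]
        if day not in best or tx["amount"] > best[day]["amount"]:
            best[day] = tx
    return best


# Made-up rows for illustration only
sample_rows = [
    {"date": date(2016, 12, 3), "category": "Transportation", "amount": 40.0},
    {"date": date(2016, 12, 3), "category": "Restaurant", "amount": 25.5},
    {"date": date(2016, 12, 9), "category": "Travel", "amount": 310.0},
]

daily_top = top_transaction_per_day(sample_rows)
for day in sorted(daily_top):
    print(day, daily_top[day]["category"], daily_top[day]["amount"])
```

Each surviving row then carries the category (for color) and the amount (for size) of that day's headline purchase.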

Also because I hate color legends, this meant I needed to introduce the idea of a color legend via data points elsewhere and led to the first view:

So… I kind of got really interested in utilization frequency and wanted to take it further.  So the next step was to make a barcode chart.  Very similar in concept to what the “top daily spend” is showing, but not limiting the data to only the top daily in this case.

Insights gained here – I get this feeling that Andy may only (or mostly) use his AMEX for meals where he’s out traveling.  Hovering over the points would add more insight to the transaction values.  More than that, we get a feel for what this card is generally swiped for: the 3 categories at the bottom (and FYI I ranked these by sum dollars spent).

Finally – bundling it together in a palatable format – what were the headline transactions for the year?  I wanted to do monthly and have categories, but there wasn’t enough data.  So I opted to go transaction level and keep it top 5 each quarter.  I think there’s novelty here in terms of presentation, but also value in quick rough comparisons of values over each quarter.
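The top-5-per-quarter cut can be sketched the same way. The field names and sample rows are hypothetical, and quarters are assumed to be calendar quarters.

```python
import heapq
from collections import defaultdict
from datetime import date


def top_n_per_quarter(transactions, n=5):
    """Group transactions by calendar quarter and keep the n largest in each."""
    by_quarter = defaultdict(list)
    for tx in transactions:
        quarter = (tx["date"].month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, ...
        by_quarter[quarter].append(tx)
    return {
        quarter: heapq.nlargest(n, rows, key=lambda tx: tx["amount"])
        for quarter, rows in by_quarter.items()
    }


# Made-up rows for illustration only: seven in Q1, one in Q4
sample_spend = [{"date": date(2016, 1, day), "amount": 10.0 * day} for day in range(1, 8)]
sample_spend.append({"date": date(2016, 12, 15), "amount": 999.0})

headline = top_n_per_quarter(sample_spend)
for quarter in sorted(headline):
    print(quarter, [tx["amount"] for tx in headline[quarter]])
```

A quarter with fewer than five transactions simply returns what it has, which matters for a sparse data set like this one.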

And this rounds out the end of the analysis.  Most of the transactions here are centered around travel.  My brain is not sanitized enough to say what you can infer – I have too much generalized knowledge of how Andy’s profession could explain these findings to present from a pure lack of knowledge standpoint.  (TL;DR – I know that Andy travels for his job, was surprised #data16 wasn’t an obvious point within the data set)

So the thought process in general behind the path I took: I wanted to explore how often Andy spends money in certain categories.  I was intrigued by frequency of usage to see if it could eventually point back to provide the data creator (the guy who bought stuff) some additional aha! moments.

To be more honest – I actually think this is something that I would want for myself.  I in particular would love to plot my Amazon.com transactions and see how that changes throughout the year.  Both in barcode for frequency (imagining Black Friday is heavy) and then to see if I’m utilizing their services any differently.  (I have this feeling that grocery type purchases are on the climb).

Oh – and in terms of asking about colors and fonts: I did go to Andy’s blog for inspiration.  I wanted to do a red/blue motif based on the blog, but needed more colors.  So I think I googled “blue color palette” and ended up with this cute starting palette that evolved into having pops of orange and yellow.  Font: I left this to something minimal that I thought Andy would be okay with (Arial Narrow) that would also hold up well across all platforms.