A follow-up to The Women of #IronViz

It’s now 5 days removed from the Tableau Conference (#data17), and the topic of women in data visualization – and the particularly pointed topic of women competing in Tableau’s #IronViz competition – is still fresh on everyone’s mind.

First – I think it’s important to recognize how awesome the community reception of this topic has been.  Putting together a visualization that highlights a certain subsection of our community is not without risk.  While going through the build process, I wanted to keep the visualization in the vein of highlighting the contributions of women in the community.  It wasn’t meant to be selective or exclusive; instead, it was a visual display of something I was interested in understanding more about.  Despite being 5 days removed from the conference, the conversations I’ve been involved in (and observed from a distance) have all remained inclusive and positive.  I’ve seen plenty of people searching for understanding and hunting for more data points.  I’ve also seen a lot of collaboration around solutions and collecting the data we all seek.  What I’m thankful not to have witnessed is blame or avoidance.  In my mind this speaks volumes about the brilliant and refined members of our community and their general openness and acceptance of change, feedback, and improvement.

One thing making the rounds that I felt compelled to build on is @visualibrarian’s recent blog post, which has interview-style questions and answers around the topic.  I am a big believer in self-reflection and exploration and was drawn to her call to action (maybe it was the punny and sarcastic nature of the ask) to answer the questions she put forth.

1. Tell me about yourself. What is your professional background? When did you participate in Iron Viz?

My professional background is that of a data analyst.  Although I have a bachelor’s degree in Mathematics, my first professional role was as a Pharmacy Technician entering prescriptions.  That quickly morphed into a role dedicated to reducing prescription entry errors and built on itself over and over, leading to roles like quality improvement and process engineering.  I’ve always been very reliant on data and data communication (in my early days, via PowerPoint) to help change people and processes.  About 2 or 3 years ago I got fed up with being the business user at the mercy of traditional data management or data owners and decided to brute-force my way into the “IT” side of things.  I was drawn to doing more with data and having better access to it.  Fast-forward to the role I’ve had for a little over 8 months as a Data Visualization Consultant, which essentially means I spend a significant amount of my time partnering with organizations to either enable them to use visual analytics, improve the platforms they are currently using, or overcome any developmental obstacles they may have.  It also means I spend a significant amount of time championing the power of data visualization and sharing “best practices” on the topic.  I often call myself a “data champion” because I seek simply to be the voice of the data sets I’m working with.  I’m there to help people understand what they’re seeing.

In terms of Iron Viz – I first participated in 2016’s 3rd round feeder, Mobile Iron Viz.  I’ve participated in every feeder round since.  And that’s the general plan on my end: continue to participate until I make it on stage or they tell me to stop 🙂

2. Is Tableau a part of your job/professional identity?

Yes – see my answer to question #1.  It’s pretty much my main jam right now.  But I want to be very clear on this point – I consider my trade to be visual analytics, data visualization, and data analytics.  Tableau is, to me, the BEST tool to use within my trade.  It’s by no means the only tool I use, but it is the most important one for my role.

3. How did you find out about Iron Viz?

I found out about the competition when I first started getting more deeply involved in my local User Group.  Over time I became the leader of that user group and a natural advocate for the competition.  Once I became a part of the social community (via Twitter), it was easy to keep up with its ins and outs.

4. Did you have any reservations about participating in Iron Viz?

Absolutely – I still have reservations.  The first one I entered was sort of on the off chance, because I had found something that I wanted to re-visualize in a very pared-down, elegant, simplistic way.  I ended up putting together the visualization in a very short period of time, and after comparing it to the other entries I felt my entry was very out of place.  I tend to shy away from putting text-heavy explanations within my visualizations, so I’ve felt very self-conscious that my designs don’t score well on “storytelling.”  It was also very hard in 2016 and the beginning of 2017.  Votes were based on Twitter.  You could literally search for your hashtag and see how many people liked your viz.  It’s a very humbling and crushing experience when you don’t see any tweets in your favor.

5. Talk me through your favorite submission to Iron Viz. What did you like about it? Why?

Ah – they are all my favorite for different reasons.  For each entry I’ve always remained committed and deeply involved in what the data represents.  Independent of social response, I have always been very proud of everything I’ve developed, if for no other reason than the challenge of understanding a data set further and bringing a new way to visually display it.  My mobile entry was devastatingly simple – I love the mobile version to death because it is so pared down.  For geospatial I made custom shapes for each of the different diamond grades.  It’s something I don’t think anyone in the world knows I did – and for me it really brought home the lack of interest I have in diamonds as rare, coveted items.

6. What else do you remember about participating in Iron Viz?

The general anxiety around it.  For geospatial 2017 I procrastinated around the topic so much.  My parents actually came to visit me, and I took time away from being with them to complete my entry.  I remember my mom consoling me because I was so adamant that I needed to participate.

Safari and Silver Screen were different experiences for me.  I immediately locked in on data sets on subjects I loved, so there was less stress.  When I did the Star Trek entry I focused on the look and feel of the design and was so stoked that the data set even existed.  Right now I am watching The Next Generation nightly, and I go back to that visualization to see how it compares to my actual perception of each episode (in terms of speaking pace and flow).

7. Which Iron Viz competitions did you participate in, and why?

Everything since 2016 feeder round 3.  I felt a personal obligation and an obligation to my community to participate.  It was also a great way for me to practice a lot of what I tell others – face your fears and greet them as an awesome challenge.  Remain enthusiastic and excited about the unknown.  It’s not always easy to practice, but it makes the results so worth it.

8. What competitions did you not participate in, and why?

Anything before mobile – and only because I (most likely) didn’t know about it.  Or maybe more appropriately stated – I wasn’t connected enough to the community to know of its existence or how to participate.

9. Do you participate in any other (non Iron Viz) Tableau community events?

Yes – I participate in #MakeoverMonday and #WorkoutWednesday.  My goal for the end of 2017 is to have all 52 for each completed.  Admittedly I am a bit off track right now, but I plan on closing that gap soon.  I also participate in #VizForSocialGood and have participated in past User Group viz contests.  I like to collect things and am a completionist – so these are initiatives that I’ve easily gotten hooked on.  I’ve also reaped so many benefits from participation.  Not just the growth that’s occurred, but the opportunity to connect with like-minded individuals across the globe.  It’s given me the opportunity to have peers that can challenge me and to be surrounded by folks that I aspire to be more like.  It keeps me excited about getting better and knowing more about our field.  It’s a much richer and deeper environment than I have ever found working within a single organization.

10. Do you have any suggestions for improving representation in Iron Viz?

  • Make it more representative of the actual stage contest
  • Single data set
  • Everyone submits on the same day
  • People don’t tweet or reveal submissions until contest closes
  • Judges provide scoring results to individual participants
  • The opportunity to present analysis/results, the “why”
  • Blind submissions – don’t reveal participants until results are posted
  • Incentives for participation!  It would be nice to have swag or badges or a gallery of all the submissions afterward

And in case you just came here to see the visualization that’s set as the featured image, here’s the link.

Don’t be a Bridge, Instead be a Lock

Lately I’ve spent a lot of time pondering my role in the world of data.  There’s this common phrase that we as data visualization and data analytics (BI) professionals hear all the time (and that I am guilty of saying):

“I serve as the bridge between business and IT.”

Well – I’m here to say it’s time to move on.  Why?  Because the bridge analogy is incomplete, and because it doesn’t accurately represent the way in which we function in this critical role.  At first glance the bridge analogy seems reasonable.  A connector, something that joins two disparate things.  In a very physical way it connects two things that otherwise have an impasse between them.  The business is an island.  IT is an island.  Only a bridge can connect them.  But is this really true?

Instead of considering the two as separate entities that must be connected, what if we rethought them as bodies of water at different levels?  They touch each other; they are one.  They are the same type of thing.  The only difference is that they are at different levels, so something like a boat can’t easily go between them.  Is this not what is really happening?  “The business” and “IT” are really one large organization – not two separate, foreign entities.

This is where the role of the Lock comes in.  A lock is the mechanism by which watercraft are raised or lowered between waterways.  And to a large extent it is a better analogy for our roles in data.  We must adapt to the different levels of business and IT.  And more importantly, it is our responsibility to perform that function – to get the boat (more specifically, “the data”) through from one canal to the other.

Even exploring what Wikipedia says about a lock – it fits better.

“Locks are used to make a river more easily navigable, or to allow a canal to cross land that is not level.”

“Larger locks allow for a more direct route to be taken” [paraphrased]

Is this not how we function in our daily roles?  How fitting is it to say this:

“My role is to make your data more easily navigable.  My goal is to allow data to flow through on your level.  I’m here to allow a more direct route between you and your data.”

It feels right.  I’m there to help you navigate your data through both IT and business waters.  And it is my privilege and honor to facilitate this.  Let’s drop the bridge analogy and move toward a new paradigm – the world where we are locks, adjusting our levels to fit the needs of both sides.

Boost Your Professional Skills via Games

Have you ever found yourself in a situation where you were looking for opportunities to get more strategic, focus on communication skills, improve your ability to collaborate, or just stretch your capacity to think critically?  Well I have the answer for you: pick up gaming.

Let’s pause for a second and provide some background: I was born the same year the NES was released in North America – so my entire childhood was littered with video games.  I speak quite often about how much video gaming has influenced my life.  I find games to be one of the best ways to unleash creativity, inhabit a universe where failure is safe, and always have an opportunity for growth and challenge.

With all that context you may think this post is about video games and how they can assist with growing out the aforementioned skills.  And that’s where I’ll add a little bit of intrigue: this post is actually dedicated to tabletop games.

For the past two years I’ve picked up an awesome hobby – tabletop gaming.  Not your traditional Monopoly or Game of Life – but games centered around strategy and cooperation.  I’ve taken to playing them with friends, family, and colleagues as a way to connect and learn.  And along the way I’ve come across a few of my favorites that serve as great growth tools.

Do I have you intrigued?  Hopefully!  Now on to a list of recommendations.  And the best part?  All but one of these can be played with 2 players.  I often play games with my husband as a way to switch off my brain from the hum of everyday life and into the deep and rich problems and mechanics that arise during game play.

First up – Jaipur

Jaipur is a two-player-only game that centers around trading and selling goods.  The main mechanics here are knowing when to sell, when to hold, and how to manipulate the market.  There are also camel cards in play that, when returned, cause new goods to appear.

Why you should play: It is a great way to understand value at a particular moment in time.  From being first to market, to waiting until you have several of a specific good to sell, to driving changes in the market by forcing your opponent’s hand.  It helps unlock the necessity to anticipate next steps.  It shows how you can have control over certain aspects (say all the camels to prevent variety in the market), but how that may put you at a disadvantage when trying to sell goods.

It’s a great game that is played in a max of 3 rounds and probably 30 minutes.  The variety and novelty of what happens makes this a fun game to repeat.

Hanabi

Hanabi is a collaborative game that plays anywhere from 2 to 5 players.  The basic premise is that you and your friends are absentminded fireworks makers and have mixed up all the fireworks (numeric sets 1 to 5 in 5 different colors).  Similar to Indian Poker, you have a number of cards (3 or 4) facing away from you.  That is to say – you don’t know your hand, but your friends do.  Through a series of sharing information and discarding/drawing cards, everyone is trying to put down cards in order from 1 to 5 for particular colors.  If you play a card too soon the fireworks could go off early, and there’s only so much information to share before the start of the fireworks show.

This is a great game to learn about collaboration and communication.  When you’re sharing information you give either color or numeric information to someone about their hand.  This can be interpreted several different ways, and it’s up to the entire team to communicate effectively and adjust to each other’s interpretation style.  It also forces you to make choices.  My husband and I recently played and got dealt a bunch of single high-value cards that couldn’t be played until the end.  We had to concede as a team that those targets weren’t realistic to go after – letting them go was the only way we could end up having a decent fireworks display.

Lost Cities

This is another exclusively two-player game.  This is also a set-building game where you’re going on exploration missions to different natural wonders.  Your goal is to fill out sets in numeric order (1 to 10) by color.  There’s a baseline cost to going on a mission, so you’ll have to be wise about which ones you pursue.  There are also cards you can play (before the numbers) that let you double, triple, or quadruple your wager on successfully completing the exploration.  You and your opponent take turns drawing from a pool of known cards or from a deck.  Several tactics can unfold here.  You can build into a color early, or completely change paths once you see what the other person is discarding.  It’s also a juggling act to decide how much to wager and still end up making money.

Bohnanza

This one plays well with a wide range of player counts.  The key mechanic here is that you’re a bean farmer with 2 fields to plant beans.  The order in which you receive cards is crucial and can’t be changed.  It’s up to you to work together with your fellow farmers at the bean market to not uproot your fields too early and ruin a good harvest.  This is a rapid-fire trading game where getting on someone’s good side is critical, and you’ll immediately see the downfall of holding on to cards for the “perfect deal.”  But of course you have to balance your friendliness with the knowledge that if you share too many high-value beans the other farmers may win.  There’s always action on the table and you have to voice your offer quickly to remain part of the conversation.

The Grizzled

The Grizzled is a somewhat melancholy game centered around World War I.  You’re on a squad and trying to successfully fulfill missions before all morale is lost.  You’ll do this by dodging threats and offering support to your team.  You’ll even make speeches to encourage your comrades.  This game offers lots of opportunities to understand when and how to be a team player to keep morale high and everyone successful.  The theme is a bit morose, but it adds context to the intention behind each player’s actions.

The Resistance

Sadly this requires a minimum of 5 people to play, but it is totally worth it.  As the box mentions, it is a game of deduction and deception.  You’ll be dealt a secret role and are either fighting for victory or sabotage.  I played this one with 8 other colleagues recently and pure awesomeness was the result.  You’ll get the chance to pick teams for missions, vote on how much you trust each other, and ultimately fight for success or defeat.  You will get insight into crowd politics and how individuals handle situations of mistrust and lack of information.  My recent 9-player game devolved into using a whiteboard to help with deductions!

Next time you’re in need of beefing up your soft skills or detaching from work and want to do it in a productive and fun manner – consider tabletop gaming.  Whether you’re looking for team building exercises or safe environments to test how people work together – tabletop games offer it all.  And in particular – collaborative tabletop games.  With most games there’s always an element of putting yourself first, but you will really start to understand how individuals like to contribute to team mechanics.

#IronViz – Let’s Go on a Pokémon Safari!

It’s that time again – Iron Viz!  The second round of Iron Viz entered my world via an email with a very enticing “Iron Viz goes on Safari!” theme.  My mind immediately got stuck on one thing: Pokémon Safari Zone.

Growing up I was a huge gamer, and Pokémon was (and still is) one of my favorites.  I even have a cat named after a Pokémon – Starly (find her in the viz!).  So I knew that if I was going to participate, the Pokémon Safari idea was the only way to go.

I spent a lot of time thinking about how I might want to bring this to life.  Did I want to do a virtual safari of all the pocket monsters?  Did I want to focus on the journey of Ash Ketchum through the Safari Zone?  Did I want to focus on the video games?

After all the thoughts swirled through my mind – I settled on the idea of doing a long-form re-creation of Ash Ketchum’s adventure through the Safari Zone in the anime.  I sat down and googled to figure out the episode number and go watch it.  But to my surprise, the episode had been banned.  It hasn’t aired much on TV, and the reason it is banned makes it very unattractive and unfriendly material for an Iron Viz long form.  I was gutted and had to set off on a different path.

The investment in the Safari Zone episode got me looking through the general details of the Safari Zone in the games.  And that’s what ended up being my hook.  I tend to think in a very structured format, and because there were 4 regions that HAD Safari Zones (or what I’d consider to be the general spirit of one), it was easy for me to compare them against each other.

Beyond that, I knew I wanted to keep the spirit of the styling similar to the games.  My goal for the viz is to give the end user an understanding of the types of Pokémon in each game – to show some basic details about each pocket monster, but also to have users almost feel like they’re on the Safari.

There’s also this feeling I wanted to capture – for anyone who has played Pokémon you may know it.  It’s the shake of the tall grass.  It is the tug of the Fishing Pole.  It’s the screen transition.  In a nutshell: what Pokémon did I just encounter?  There is a lot of magic in that moment of tall grass shake and transition to ‘battle’ or ‘encounter’ screen.

My hope is that I captured that well with the treemaps.  You are walking through each individual area and encountering Pokémon.  For the seasoned Safari-goer, you’ll be more interested in knowing WHERE you should go and understanding WHAT you can find there – hence the corresponding visuals surrounding the treemaps.

The last component of this visualization was the Hover interactivity.  I hope it translates well because I wanted the interactivity to be very fluid.  It isn’t a click and uncover – that’s too active.  I wanted this to be a very passive and openly interactive visualization where the user would unearth more through exploring and not have to click.

#WorkoutWednesday Week 24 – Math Musings

The Workout Wednesday for week 24 is a great way to represent where a result for a particular value falls with respect to a broader collection.  I’ve used a spine chart recently on a project where most data was centered around certain points and I wanted to show the range.  Surfacing maximums, minimums, averages, quartiles, and (when appropriate) medians can help to profile data very effectively.

So I started off really enjoying where this visualization was going – partly because the spine chart I made on that recent project came before I even knew the thing I had developed already had a name.  (Sad on my part, I should read more!)

My enjoyment turned into caution really quickly once I saw the data set.  There are several ratios in the data set and very few counts/sums of things.  My math brain screams trap!  Especially when we start tiptoeing into the world of what we semantically call “average of all” or “overall average” or something that somehow represents a larger collective (“everybody”).  There is a lot of open-ended interpretation that goes into this particular calculation and when you’re working with pre-computed ratios it gets really tricky really quickly.

Here’s a picture of the underlying data set:

 

Some things to notice right away – the ratios for each response are pre-computed.  The number of responses is different for each institution.  (To simplify this view, I’m on one year and one question).

So the heart of the initial question is this: if I want to compare my results to the overall results, how would I do that?  There are probably 2 distinct camps here.  Option 1: take the average of one of the columns and use that to represent the “overall average.”  Let’s be clear on what that is: it is the average of the pre-computed survey ratios.  It is NOT the observed percentage of all individuals surveyed.  That would be option 2: the weighted average.  For the weighted average – to calculate a representation of all respondents – we add up all the qualifying respondents answering ‘agree’ and divide by the total respondents.

Now we all know this concept of average of an average vs. weighted average can cause issues.  Specifically, we’d feel the friction immediately if entities with very few responses were commingled with entities capturing many more responses.  EX: Place A: 2 people out of 2 answered ‘yes’ (100%), and Place B: 5 out of 100 answered ‘yes’ (5%).  If we average 100% and 5% we’ll get 52.5%.  But if we take 7 out of 102, that’s 6.86% – a very different number.  (Intentionally extreme example.)

So my math brain was convinced that the “overall average” or “ratio for all” should be inclusive of the weights of each Institution.  That was fairly easy to compensate for: take each ratio and multiply it by the number of respondents to get raw counts and then add those all back up together.
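
To make the difference concrete, here’s a minimal pandas sketch of both options using the intentionally extreme example above (the column names are just placeholders):

```python
import pandas as pd

# Hypothetical survey results: pre-computed 'agree' ratios plus respondent counts
df = pd.DataFrame({
    "institution": ["Place A", "Place B"],
    "agree_ratio": [1.00, 0.05],   # 2 of 2 and 5 of 100 from the example above
    "respondents": [2, 100],
})

# Option 1: average of the pre-computed ratios ("average of an average")
simple_avg = df["agree_ratio"].mean()                        # 0.525

# Option 2: weighted average -- back into raw counts first, then divide
agree_counts = df["agree_ratio"] * df["respondents"]         # 2 and 5
weighted_avg = agree_counts.sum() / df["respondents"].sum()  # 7 / 102 ≈ 0.0686

print(simple_avg, weighted_avg)
```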

The next sort of messy thing to deal with was finding the minimums and maximums of these values.  It seems straightforward, but when reviewing the data set and the specifications of what is being displayed, there’s care needed around the level of aggregation and how the data is filtered.  As an example, depending on how the ratios are leveraged, you could end up finding the minimum of 3 differently weighted subjects within a subject group.  You could also probably find the minimum Institution + subject result at the subject level across all the subjects within a group.  Again, I think the best bet here is to tread cautiously over the ratios and get into raw counts as quickly as possible.

So what does this all mean?  To me it means tread carefully and ask clear questions about what people are trying to measure.  This is also where I will go the distance and include calculations in tooltips to help demonstrate what the values I am calculating represent.  Ratios are tricky, and averaging them is even trickier.  There likely isn’t a perfect way to deal with them, and it’s something we all witness consistently throughout our professional lives (how many of us have averaged a pre-computed average handle time?).

Beyond the math tangent – I want to reiterate how great a visualization I think this is.  I also want to highlight that because I went deep-end math on it, I decided to go a different direction on the development.

The main difference from the development perspective?  Instead of using reference bands, I used a Gantt bar as the IQR.  I really like using the bar because it gives users an easier target to hover over.  It also reduces some of the noise of the default labeling that occurs with reference lines.  To create the Gantt bar, simply compute the IQR as a calculated field and use it as the size.  You can select one of the percentile points to be the start of the mark.
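
If you want to sanity-check the math behind that Gantt mark outside of Tableau, here’s a minimal pandas sketch (the scores are hypothetical): the lower percentile is the start of the mark and the IQR is its size.

```python
import pandas as pd

# Hypothetical institution scores for a single question
scores = pd.Series([0.42, 0.48, 0.55, 0.59, 0.61, 0.66, 0.70])

q25 = scores.quantile(0.25)   # start of the Gantt mark
q75 = scores.quantile(0.75)
iqr = q75 - q25               # size of the Gantt mark

print(q25, q75, iqr)
```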

March & April Combined Book Binge

Time for another recount of the content I’ve been consuming.  I missed my March post, so I figured it would be fine to do a combined effort.

First up:

The Icarus Deception by Seth Godin

In my last post I mentioned that I got a recommendation to tune in to Seth and got the opportunity to hear him firsthand on Design Matters.  Well, here’s the first full Seth book I’ve consumed and it didn’t disappoint.  If I had to describe what this book contains – I would say that it is a near manifesto for the modern artist.  The world is run by industrialists and the artist is trying to break through.

I appreciate how Seth frames the concept of an artist – he unpacks the term and invites or ENCOURAGES everyone to identify as such.  Being an artist means being emotionally invested, showing up, giving a shit.  That giving a shit, caring, connecting is ALL there is.  That you succeed in the world by connecting, by sharing your art.  These concepts and ideals resonate deeply with me.  He also explains how vulnerable and gutting it can be to live as an artist – something I’ve felt and experienced several times.

During the course of listening to this book I was on site with a client.  We got to a certain point, agreed on the direction and visualizations, then shared them with the broader team.  The broader team came heavy with design suggestions – most notably, the green/red discussion came into play.  I welcome these challenges, and as an artist and communicator it is my responsibility to share my process, listen to feedback, and collaborate to find a solution.  That definitely occurred throughout the process, but it honestly caused me to lose my balance for a moment.

As I reflected on what happened – I was drawn to this idea that as a designer I try to have ultimate empathy for the end user.  And furthermore the amount of care given to the end user is never fully realized by the casual interactor.  A melancholy realization, but one that should not be neglected or forgotten.

Moving on to the next book:

Rework by Jason Fried & David Heinemeier Hansson

This one landed in my lap because it was available while I was perusing library books.

A quick read that talks about how to succeed in business.  It takes an extreme focus on being married to a vision and committing to it.  The authors focus on getting work done.  Sticking to a position and seeing it through.  I very much appreciated that they were PROUD of decisions they made for their products and company.  Active decisions NOT to do something can be more liberating and make someone more successful than being everything to everyone.

Last up was this guy:

Envisioning Information by Edward Tufte

A continuation of reading through all the Tufte books.  I am being lazy by saying “more of the same.”  Or “what I’ve come to expect.”  These are lazy terms, but they encapsulate what Tufte writes about: understanding visual displays of information.  Analyzing at a deep level the good, bad, and ugly of displays to get to the heart of how we can communicate through visuals.

I particularly loved some of the amazing train timetables displayed.  This concept of using lines to represent the timing of different routes was amazing to see.  And the way color is explored and leveraged is on another level.  I highly recommend this one if witnessing Tufte’s strong tongue-in-cheek style sounds entertaining.  I know for me it was.

February Book Binge

Another month has passed, so it’s time to recount what I’ve been reading.

Admittedly it was kind of a busy month for me, so I decided to mix up some of my book habits with podcasts.  To reflect that – I’ve decided to share a mixture of both.

 

First up is Rhinoceros Success by Scott Alexander

This is a short read designed to ignite fire and passion in whoever reads it. It walks through how a big burly rhino would approach everyday life, and how you as a rhino should follow suit.

I read this one while I was transitioning between jobs and found it to be a great source of humor during the process. It helps to articulate the ‘why’ behind certain things you may be doing and puts it in the context of what a rhino would do. This got me through some rough patches of uncertainty.

The next book was Made to Stick by the Heath brothers

This was another recommendation and one that I enjoyed. I will caveat and say that this book is really long. I struggled to get through a chapter at a time (~300 pages and only 7 chapters). It is chock-full of stories to help the reader understand the required model to make ideas stick.

I read this one because oftentimes a big part of my job is communicating a yet-to-be-seen vision. It is also about trying to get people to buy in to a new type of thinking. Neither is easy, and both can be met with resistance. The tools that the Heath brothers offer are simple and straightforward. I think they even extend further, to writing or public speaking. How do you communicate a compelling idea that will resonate with your audience?

I’ve got their 2 other books and will be reading one of them in March.

Lastly – I wanted to spend a little bit of time sharing a podcast that I’ve come to enjoy. It is Design Matters with Debbie Millman.

This was shared with me by someone on Twitter. I found myself commuting much more than average this month (as part of the job change) and I was looking for media to consume during the variable-length (30 to 60 minute) commute. This podcast fits that time slot so richly. What’s awesome is the first episode I listened to had Seth Godin on it (reading one of his books now) – so it was a great dual-purpose item. I could hear Seth and preview whether I should read one of his many books, and also get a dose of Debbie.

The beauty of this podcast for me is that Debbie spends a lot of time exploring the personality and history of modern artists/designers. She does this by amassing research on each individual and then having a very long sit-down to discuss findings. Oftentimes this involves analyzing individual perspectives and recounting significant past events. I always find it illuminating how these people view the world and how they’ve “arrived” at their current place in life.

That wraps up my content diet for the month – and I’m off to listen to Seth.

Makeover Monday Week 10 – Top 500 YouTube Game(r) Channels

We’re officially 10 weeks into Makeover Monday, which is a phenomenal achievement.  This means that I’ve actively participated in recreating 10 different visualizations, with data varying from tourism, to Trump, to this week’s YouTube gamers.

First, some commentary people may not like to read: the data set was not that great.  There’s one huge reason why it wasn’t great: one of the measures (plus a dimension) was a dependent variable based on two independent variables.  And that dependent variable was processed via a pre-built algorithm.  So it would almost make more sense to use the resultant dependent variable to enrich other data.

I’m being very abstract right now – here’s the structure of the data set:

Let’s walk through the fields:

  • Rank – this is based entirely on the sort chosen at the top (for this view it is by video views; not sure what those random 2 are, I just screencapped the site)
  • SB Score/Rank – this is some sort of ranking value applied to a user based on a proprietary algorithm that takes a few variables into consideration
  • SB Score (as a letter grade) – the letter grade expression of the SB score
  • User – the name of the gamer channel
  • Subscribers – the # of channel subscribers
  • Video Views – the # of video views

As best as I can tell from reading the methodology – the SB score/rank (the # and the alpha) are influenced in part by the subscribers and video views.  Which means putting these in the same view is really sort of silly.  You’re kind of at a disadvantage if you scatterplot subscribers vs. video views, because the score is purportedly more accurate in terms of finding overall value/quality.

There’s also not enough information contained within the data set to amass any new insights on who is the best and why.  What you can do best with this data set is summarization, categorization, and displaying what I consider data set “vitals.”

So this is the approach that I took.  And more to that point, I wanted to make over a very specific chart style that I have seen Alberto Cairo employ a few times throughout my 6-week adventure in his MOOC.

That view: a bar chart sliced through with lines to help understand size of chunks a little bit better.  This guy:

So my energy was focused on that – which only happened after I did a few natural (in my mind) steps in summarizing the data, namely histograms:

Notice here that I’ve leveraged the axis values across all 3 charts (starting with SB grade and carrying through to its sibling charts to minimize clutter).  I think this has a decent effect, but I admit that the bars aren’t equal width across each bar chart.  That’s not pleasant.

My final two visualizations were to demonstrate magnitude and add more specifics in a visual manner to what was previously a giant text table.

The scatterplot helps to achieve this by displaying the 2 independent variables with the overall “SB grade” encoded on both color and size.  Note: for size I used powers of 2: 2^9, 2^8, 2^7…2^1.  This gave a decent exponential effect to break up the sizing in a consistent manner.
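
As a rough illustration of that sizing scheme (the grade order here is hypothetical, not the exact list from the data set):

```python
# Hypothetical grade order from best to worst; sizes step down by powers of 2 (2^9 ... 2^1)
grades = ["A+", "A", "A-", "B+", "B", "B-", "C+", "D+", "D"]
sizes = {grade: 2 ** (9 - i) for i, grade in enumerate(grades)}
# {'A+': 512, 'A': 256, ..., 'D': 2}
```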

The unit chart on the right helps demonstrate not only the individual members, but also the elite A+ status and the terrible C+, D+, and D statuses.  The color palette used throughout is meant to highlight these extremes – bright on the edges and random neutrals between.

This is aptly named an exploration because I firmly believe the resultant visualization was built to broadly pluck away at the different channels and get intrigued by the “details.”  In a more real-world scenario I would be out hunting for additional data to tie back to this – money, endorsements, average video length, number of videos uploaded, subject matter area, type of ads utilized by the user.  All of these, appended to this basic metric aimed at measuring a user’s “influence,” would lead down the path of a true analysis.

The Flow of Human Migration

Today I decided to take a bit of a detour while working on a potential project for #VizForSocialGood.  I was focused on a data set provided by UNICEF that showed the number of migrants from different areas/regions/countries to destination regions/countries.  I’m pretty sure it is the direct companion to a chord diagram that UNICEF published as part of their Uprooted report.

As I was working through the data, I wanted to take it and start at the same place.  Focus on migration globally and then narrow the focus in on children affected by migration.

Needless to say – I got sidetracked.  I started by wanting to make paths on maps showing the movement of migrants.  I haven’t really done this very often, so I figured this would be a great data set to play with.  Once I set that up, it quickly devolved into something else.

I wasn’t satisfied with the density of the data.  The clarity of how it was displayed wasn’t there for me.  So I decided to take a more abstract approach to the same concept.  As if by fate, I had received Chart Chooser cards in the mail earlier, and Josh and I were reviewing them.  We were having a conversation about the various uses of each chart and brainstorming how they could be incorporated into our next Tableau user group (I really do eat, drink, and breathe this stuff).

Anyway – one of the charts we were talking about was the sankey diagram.  So it was already on my mind, and I’d seen it accomplished multiple times in Tableau.  It was time to dive in and see how this abstraction would apply to the geospatial data.

I started with Chris Love’s basic tutorial of how to set up a sankey.  It’s a really straightforward read that explains all the concepts required to make this work.  Here’s the quick how-to in my paraphrased words.

  1. Duplicate your data via a Union and identify the original and the copy (which is great because I had already done this for the pathing).  As I understand it from Chris’s write-up, this lets us ‘stretch out’ the data, so to speak.
  2. Once the data is stretched out, it’s filled in by manipulating the binning feature in Tableau.  My interpretation would be that the bins ‘kind of’ act like dimensions (labeled out by individual integers).  This becomes useful in creating individual points that eventually turn into the line (curve).
  3. Next, ranking functions are made to determine the starting and ending points of the curves.
  4. Finally, the curve is built using a mathematical function called a sigmoid function.  This is basically an S-shaped, asymptotic function (the standard logistic ranges from 0 to 1) with a nearly linear middle section – see the sketch after this list.
  5. After the curve is developed, the points are plotted.  This is where the ranking is set up to determine the leftmost and rightmost points.  Chris’s original specification had a straightforward ranking for each of the dimensions.  My final viz is a riff on this.
  6. The last steps are to switch the chart to a line chart and then build out the width (size) of the line based on the measure you used in the ranking (percent of total) calculation.
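
To make step 4 a little more concrete, here’s a minimal Python sketch of the same idea – not the Tableau calculations themselves, and the origins, destinations, and shares are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical flows: one origin splitting into three destinations, sized by share of migrants
flows = pd.DataFrame({"destination": ["Europe", "Asia", "Oceania"],
                      "share": [0.55, 0.30, 0.15]})

t = np.linspace(-6, 6, 49)        # the densified points created by the binning step
sigmoid = 1 / (1 + np.exp(-t))    # ~0 at the far left, ~1 at the far right

curves = []
for rank, row in enumerate(flows.itertuples(index=False), start=1):
    # Each destination's curve climbs from the origin's position (0 here) to its own rank
    curves.append(pd.DataFrame({"destination": row.destination,
                                "t": t,
                                "y": sigmoid * rank,
                                "width": row.share}))

sankey_points = pd.concat(curves, ignore_index=True)
# Plotting t vs. y as lines, sized by 'width', gives the same S-curves Tableau draws
```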

So I did all those steps and ended up with exactly what was described – a sankey diagram.  A brilliant one, too: I could quickly switch the origin dimension to different levels (major area, region, country) and do similar work on the destination side.  This is what ultimately led me to the final viz I made.

So while adjusting the table calculations, I came to one view that I really enjoyed.  The ranking pretty much “broke” for the initial starting point (everything was at 1), but the destination was right.  What this did for the viz was take everything from a single point and then create roots outward.  Initial setup had this going from left to right – but it was quite obvious that it looked like tree roots.  So I flipped it all.

I’ll admit – this is mostly a fun data shaping/vizzing exercise.  You can definitely gain insights through the way it is deployed (take a look at Latin America & Caribbean).

After the creation of the curvy onion shape, it was a “what to add next” free-for-all.  I had wrestled with the names of the destination countries to try to get something reasonable, but couldn’t figure out how to display them in proximity to the lines.  No matter – the idea of a word cloud seemed kind of interesting.  You’d get the same concept of the different chord sizes passed on again and see a ton of data on where people are migrating.  This also led to some natural interactivity of clicking on a country code to see its corresponding chords above.

Finally, to add more visual context and tell the story a bit further, I added a simple breakdown of the major regions from origin to destination.  The story point for me: most migrants move within their same region, except for Latin America/Caribbean.

And so it begins – Adventures in Python

Tableau 10.2 is on the horizon, and with it come several new features – one of particular interest to me is the new Python integration.  Here’s the Beta program beauty shot:

Essentially what this will mean is that more advanced programming languages aimed at doing more sophisticated analysis will become an easy-to-use extension of Tableau.  As you can see from the picture, it’ll work similarly to the R integration, with the end user using the SCRIPT_STR() function to pass through native Python code and return output.

I have to admit that I’m pretty excited by this.  For me I see this propelling some data science concepts more into the mainstream and making it much easier to communicate and understand the purpose behind them.

In preparation I wanted to spend some time setting up a Linux Virtual Machine to start getting a ‘feel’ for Python.

(Detour) My computer science backstory: my intro to programming was C++ and Java.  They both came easily to me.  I tried to take a mathematics class based in UNIX later on that was probably the precursor to some of the modern languages we’re seeing, but I couldn’t get on board with the “terminal” level of entry.  Very off-putting coming from a world where you have a better feedback loop in terms of what you’re coding.  Since that time (~9 years ago) I haven’t had the opportunity to encounter or use these types of languages.  In my professional world everything is built on SQL.

Anyway, back to the main heart – getting a box set up for Python.  I’m a very independent person and like to take the knowledge I’ve learned over time and troubleshoot my way to results.  The process of failing and learning on the spot with minimal guidance helps me solidify my knowledge.

Here are the steps I went through – mind you, I have a PC and I am intentionally running Windows 7.  (This is a big reason why I made a Linux VM.)

  1. Download and install VirtualBox by Oracle
  2. Download x86 ISO of Ubuntu
  3. Build out Linux VM
  4. Install Ubuntu

These first four steps are pretty straightforward in my mind.  Typical Windows installer for VirtualBox.  Getting the image is very easy, as is the build (just pick a few settings).

Next came the Python part.  I figured I’d have to install something on my Ubuntu machine, but I was pleasantly surprised to learn that Ubuntu already comes with Python 2.7 and 3.5.  A step I don’t have to do, yay!

Now came the part where I hit my first real challenge.  I had this idea of getting to a point where I could go through steps of doing sentiment analysis outlined by Brit Cava on the Tableau blog.  I’d reviewed the code and could follow the logic decently well.  And I think this is a very extensible starting point.

So based on the blog post I knew there would be some Python modules I’d be in need of.  Googling led me to believe that installing Anaconda would be the best path forward, since it contains several of the most popular Python modules.  Thus installing it would eliminate the need to individually add in modules.

I downloaded the file just fine, but the instructions on “installing” were less than stellar.  Here are the instructions:

Directions on installing Anaconda on Linux

So as someone who takes instructions very literally (and again – doesn’t know UNIX very well), I was unfortunately greeted with a nasty error message lacking any help.  Feelings from years ago were creeping in quickly.  In the end I Googled my way through it (and had a pretty good inkling that it just couldn’t ‘find’ the file).

What they said (also notice I already dropped the _64, since mine isn’t 64-bit).

 

And that was all that was needed to get the file to install!

So installing Anaconda turned out to be pretty easy, after getting the right command in the prompt.  Then came the fun part: trying to do sentiment analysis.  I knew from my reading that Anaconda came with the three modules mentioned: pandas, nltk, and time.  So I felt like this was going to be pretty easy to try and test out – coding directly from the terminal.

Well – I hit my second major challenge.  The lexicon required to do the sentiment analysis wasn’t included.  So I had no way of actually doing the sentiment analysis and was left to figure it out on my own.  This part was actually not that bad; Python gave me a good prompt to fix the issue – essentially to call the nltk downloader and get the lexicon.  And the nltk downloader has a cute little GUI to find the right lexicon (vader).  I got it installed pretty quickly.

Finally – I was confident that I could input the code and come up with some results.  And this is where I hit my last obstacle and probably the most frustrating of the night.  When pasting in the code (raw form from blog post) I kept running into errors.  The message wasn’t very helpful and I started cutting out lines of code that I didn’t need.

What’s the deal with line 5?

Eventually I figured out the problem – there were weird spaces in the raw code snippet.  After some additional googling (this time by my husband), he kindly reported that “apparently spaces matter according to this forum.”  Which makes sense – indentation is part of Python’s syntax.  No big deal – lesson learned!

Yes! Success!

So what did I get at the end of the day?  A wonderful CSV output of sentiment scores for all the words in the original data set.

Looking good, there’s words and scores!
Back to my comfort zone – a CSV
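
For anyone wanting to recreate that kind of output, here’s a minimal sketch of the workflow – the word list and file name are placeholders, not the data or code from the blog post:

```python
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-time download of the VADER lexicon (the step the nltk downloader GUI handled for me)
nltk.download("vader_lexicon")

sia = SentimentIntensityAnalyzer()

# Placeholder word list standing in for the real data set
words = pd.DataFrame({"word": ["great", "terrible", "okay"]})
words["compound_score"] = words["word"].apply(lambda w: sia.polarity_scores(w)["compound"])

# Back to my comfort zone: a CSV of words and scores
words.to_csv("sentiment_scores.csv", index=False)
```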

Now for the final step – validate that my results aligned with expectations.  And they did – yay!

0.3182 = 0.3182

Next steps: viz the data (obviously).  I’m also hoping to extend this to an additional sentiment analysis, maybe even something from Twitter.  Oh, and I ended up running a Jupyter notebook (you guessed it – already installed) to get over the pain of typing directly in the terminal.