Makeover Monday Week 10 – Top 500 YouTube Game(r) Channels

We’re officially 10 weeks into Makeover Monday, which is a phenomenal achievement.  This means that I’ve actively participated in recreating 10 different visualizations with data varying from tourism, to Trump, to this week’s YouTube gamers.

First, some commentary people may not like to read: the data set was not that great.  There’s one huge reason why: one of the measures (plus a dimension) was a dependent variable derived from two of the independent variables.  And that dependent variable was computed by a pre-built algorithm.  So it would almost make more sense to use the resultant dependent variable to enrich other data than to analyze it alongside its own inputs.

I’m being very abstract right now – here’s the structure of the data set:

Let’s walk through the fields:

  • Rank – determined entirely by the sort order chosen at the top of the site (for this view it is by video views; not sure what those random 2 are, I just screencapped the site)
  • SB Score/Rank – some sort of ranking value applied to a user based on a proprietary algorithm that takes a few variables into consideration
  • SB Score (as a letter grade) – the letter grade expression of the SB score
  • User – the name of the gamer channel
  • Subscribers – the # of channel subscribers
  • Video Views – the # of video views

As best I can tell from reading the methodology – SB score/rank (the number and the letter) are influenced in part by the subscribers and video views.  Which means putting these in the same view is really sort of silly.  You’re kind of at a disadvantage if you scatterplot subscribers vs. video views, because the score is purportedly more accurate in terms of finding overall value/quality.

There’s also not enough information contained within the data set to amass any new insights on who is the best and why.  What you can do best with this data set is summarization, categorization, and displaying what I consider data set “vitals.”

So this is the approach that I took.  And more to that point, I wanted to make over a very specific chart style that I have seen Alberto Cairo employ a few times throughout my 6 week adventure in his MOOC.

That view: a bar chart sliced through with lines to help understand the size of the chunks a little bit better.  This guy:

So my energy was focused on that – which only happened after I did a few natural (in my mind) steps in summarizing the data, namely histograms:

Notice here that I’ve shared the axis values across all 3 charts (starting with SB grade and carrying through to its sibling charts to minimize clutter).  I think this has a decent effect, but I admit that the bars aren’t equal width across each bar chart.  That’s not pleasant.

My final two visualizations were to demonstrate magnitude and add more specifics in a visual manner to what was previously a giant text table.

The scatterplot helps to achieve this by displaying the 2 independent variables with the overall “SB grade” encoded on both color and size.  Note: for size I used powers of 2: 2^9, 2^8, 2^7…2^1.  This gave a decent exponential effect that breaks up the sizing in a consistent manner.
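For what it’s worth, here’s a minimal sketch of that sizing scheme in Python – the grade list below is my own placeholder, not the exact set of grades in the data:

```python
# Hypothetical grade order from best to worst – a placeholder, not the exact
# set of grades in the data set.
grades = ["A+", "A", "A-", "B+", "B", "B-", "C+", "D+", "D"]

# Map each grade to a power of 2, from 2^9 for the best grade down to 2^1 for the worst.
sizes = {grade: 2 ** (9 - i) for i, grade in enumerate(grades)}

print(sizes)  # {'A+': 512, 'A': 256, 'A-': 128, ..., 'D': 2}
```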

The unit chart on the right helps demonstrate not only the individual members, but also the elite A+ status and the terrible C+, D+, and D statuses.  The color palette used throughout is meant to highlight these bookends – bright on the edges and random neutrals in between.

This is aptly named an exploration because I firmly believe the resulting visualization was built to broadly pluck away at the different channels and get intrigued by the “details.”  In a more real-world scenario I would be out hunting for additional data to tie back to this – money, endorsements, average video length, number of videos uploaded, subject matter area, type of ads utilized by the user.  All of these appended to this basic metric aimed at measuring a user’s “influence” would lead down the path of a true analysis.

Makeover Monday Week 9 – Andy’s AMEX

So I started my dream job at the beginning of February.  This means I’ve been spending the month adjusting and tweaking my personal schedule and working on bringing back good habits.  In particular – I’ve missed out on doing daily workouts and consistently blogging about data viz.  Fortunately I’ve been keeping up with the practice component (Makeover Monday, Hackathon, Workout Wednesday), but I wholeheartedly believe in the holistic approach of sharing the thought process behind the viz.  (TL;DR – this was my paragraph of empty excuses)

Moving on then – the thought process behind the makeover.  And what’s perhaps even more interesting is that, post-viz, I can take some of the thoughts that Andy had regarding this week’s visualizations and provide my own context.

Based on the original visualization I had an inkling that there wasn’t going to be a ton of data funneling in.  Being an individual who tracks all expenses and has seen them visually represented, I felt like food should represent a larger proportion of expenses.

Andy’s AMEX ’16

For reference, here’s a wonderful donut chart that Mint.com provided me on my top 3 most used credit cards.  I funnel everything I can through credit cards, and food in general takes up a huge portion of spend.

Ann’s ’16 Credit Card Spending

Both of these visualizations leave something to be desired.  I like Andy’s original AMEX one better than the donut I got, but they are both very distilled.  Andy spent a lot on transportation and travel, and apparently I spent a lot on shopping and education.

Getting REALLY specific about the data – there were 110 records (FYI my donut represents 477 records, 209 of which are food/dining).  Plotting the data quickly over time, there were large gaps with no purchases.

Armed with this, I decided to piggyback off the predefined categories to see whether, over time, Andy typically has one category that gets most of the spend, or whether the spend trends are lumped together.

More to that point, I wanted to show the way the data was dispersed in a daily fashion… so I went down this path.  The largest transaction for each day plotted (using the category on color, amount on size) across the 12 months.  I actually really like this view because I can clearly see the large vehicle purchase in December and you get a better feel for how spread out the card’s utilization was.  (I am guessing my lack of axis label on the day of the month is jarring.)
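Outside of Tableau, the same “largest transaction per day” cut is a quick pandas exercise – a minimal sketch with made-up column names and values, not the actual extract:

```python
import pandas as pd

# Hypothetical transactions frame – column names and values are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-01-04", "2016-01-05"]),
    "category": ["Restaurant", "Airfare", "Merchandise"],
    "amount": [42.10, 310.00, 18.99],
})

# Keep only the single largest transaction for each day.
top_daily = df.loc[df.groupby(df["date"].dt.date)["amount"].idxmax()]
print(top_daily)
```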

Also, because I hate color legends, I needed to introduce the idea of a color legend via data points elsewhere, which led to the first view:

So… I kind of got really interested in utilization frequency and wanted to take it further.  So the next step was to make a barcode chart.  Very similar in concept to what the “top daily spend” is showing, but not limiting the data to only the top daily in this case.

Insights gained here – I get this feeling that Andy may only (or mostly) use his AMEX for meals when he’s out traveling.  Hovering over the points would add more insight to the transaction values.  More than that, we get a feel for what this card is generally swiped for: the 3 categories at the bottom (and FYI I ranked these by total dollars spent).

Finally – bundling it together in a palatable format – what were the headline transactions for the year?  I wanted to do monthly and have categories, but there wasn’t enough data.  So I opted to go transaction level and keep it top 5 each quarter.  I think there’s novelty here in terms of presentation, but also value in quick rough comparisons of values over each quarter.
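A rough sketch of the “top 5 per quarter” cut in pandas, again with invented column names and numbers:

```python
import pandas as pd

# Hypothetical transactions frame – names and numbers are made up.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-02-11", "2016-03-20", "2016-12-02"]),
    "description": ["Airfare", "Hotel", "Restaurant", "Vehicle"],
    "amount": [310.00, 480.50, 42.10, 12500.00],
})

# Bucket each transaction into a calendar quarter, then keep the 5 largest per quarter.
df["quarter"] = df["date"].dt.to_period("Q")
top5_per_quarter = (
    df.sort_values("amount", ascending=False)
      .groupby("quarter")
      .head(5)
      .sort_values(["quarter", "amount"], ascending=[True, False])
)
print(top5_per_quarter)
```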

And this rounds out the end of the analysis.  Most of the transactions here are centered around travel.  My brain isn’t sanitized enough to say what an outsider could infer – I have too much background knowledge of how Andy’s profession could explain these findings to present them from a pure lack-of-knowledge standpoint.  (TL;DR – I know that Andy travels for his job; I was surprised #data16 wasn’t an obvious point within the data set.)

So the thought process behind the path I took is this: I wanted to explore how often Andy spends money in certain categories.  I was intrigued by frequency of usage to see if it could eventually point back and provide the data creator (the guy who bought stuff) some additional aha! moments.

To be more honest – I actually think this is something that I would want for myself.  I in particular would love to plot my Amazon.com transactions and see how they change throughout the year.  Both as a barcode chart for frequency (imagining Black Friday is heavy) and then to see if I’m utilizing their services any differently.  (I have this feeling that grocery-type purchases are on the climb.)

Oh – and in terms of the questions about colors and fonts: I did go to Andy’s blog for inspiration.  I wanted to do a red/blue motif based on the blog, but needed more colors.  So I think I googled “blue color palette” and ended up with a cute starting palette that evolved into having pops of orange and yellow.  Font: I left this to something minimal that I thought Andy would be okay with (Arial Narrow) and that would also render well across all platforms.

Makeover Monday Week 8 – Potatoes in the EU

I’ll say this first – I don’t eat potatoes.  Although potatoes are super tasty, I refuse to have them as part of my diet.  So I was less than thrilled about approaching a week that was pure potato (especially coming off the joy of Valentine’s Day).  Nonetheless – it presented itself with a perfect opportunity for growth and skill testing.  Essentially, if I could make a viz I loved about a vegetable I hate – that would speak to my ability to interpret varying data sets and build out displays.

I’m very pleased with the end results.  I think it has a very Stephen Few-esque approach.  Several small multiples with high and low denoted, color playing throughout as a dual encoder.  And there’s even visual interest in how the data was sorted for data shape.

So how did I arrive there?  It started with the bar chart of annual yield.  I had an idea on color scheme and knew that I wanted to make it more than gray.

This gave a perfect opportunity to highlight the minimum and maximum yields – to see in which years different countries’ production was affected by things like weather and climate.  It’s actually very interesting to see that not too many of the dark bars (max value) are in more recent times.  Seems like agricultural innovation is keeping pace with climate issues.

After that I was hooked on this idea of sets of 3.  So I knew I wanted to replicate a small multiple in a different way using the same sort order.  That’s where Total Yield came in.  I’ve been pondering (in the shower) the legitimacy of adding up annual ratios for an overall yield.  My brain says it’s fine because the size of the country doesn’t change.  But the vulnerable part of my brain says that someone may take issue with it.  I’d love for a potato farming expert to come along and tell me if that’s a silly thing to add up.  I do see the value in a straight total comparison of the years: although there’s fluctuation in the yield annually, we get a normalized way to show how much each country produces irrespective of total land size.

Next was the dot plot of the current year.  This actually started out its life as a KPI indicator of up or down from the previous year, but it was too much for the visual.  I felt the idea of a dot plot of the current year would do more justice to a “right now” understanding.  Especially because you can do some additional visual comparison to its flanks and see more insight.

And then rinse/repeat for the right side.  This is really where things get super interesting.  The amount of variability in pricing for each country, both by average and current year.  Also – 2013 was a great year for potatoes.

Book Binge – January Edition

It’s time for another edition of book binge – a random category of blog posts I devised (and now only on its second iteration) where I walk through the different books I’ve read and purchased this month.

First – a personal breakthrough!  I have always been an avid reader, but have admittedly become lazy in recent years.  Instead of reading at least one book a month, I was going on small reading sprees of 2 or 3 books every four or five months.  After the success of my December reads, I figured I would keep things going and try to substitute books as entertainment whenever possible.

Here are a few books I read in January:

The Functional Art by Alberto Cairo

I picked this one up because it is quintessential to the world of data journalism and data visualization.  I also thought it would be great to get into the head of one of the instructors of a MOOC I’m taking.  Plus who can resist the draw of the slope chart on the cover?

I loved this one.  I like Alberto’s writing style.  It is rooted in logic, and his use of text spacing and bold for emphasis is heavy on impact.  I appreciate that he says data visualization has to first be functional, but reminds us that it has to be seen to matter.  It’s also interesting to read the interviews/profiles of journalists at the end of the book.  This is an excellent way for me to shift my perspective and paradigm.  I come from the analysis/mathematical side of things – these folks are there to communicate stories with data.  A great read that is broken up in such a way that it is easy to digest.

Next book was The Visual Display of Quantitative Information by Edward Tufte

Obviously a classic read for anyone in the data visualization world by the “father” of modern information graphics.  I must admit I picked up all 4 of Tufte’s books in December, and couldn’t get my brain wrapped around them.  I was flipping through the pages to get a sense for how the information was contained and felt a little intimidated.  That intimidation was all in my head.  Once I began reading – the flow of information made perfect sense.

I appreciate Tufte’s voice and axiom type approach to information graphics.  Yes – there are times when it is snarky and absurd, but it is also full of purpose.  He walks through information graphics history, spotlighting many of the greats and lamenting the lack of recent progression (or more of a recession) in the art.

I have two favorites in this one: how he communicates small multiples and sparklines.  The verbiage used to describe the impact (and amount of information) small multiples can convey is poetic (and I don’t really like poetry).  His work on developing and demonstrating sparklines is truly illuminating.  There were times when I had dreams of putting together some of the high “data-ink,” low “chartjunk” visuals that he described.  And his epilogue makes me forgive all the snarkiness.  The first in a series that I am ecstatic to continue reading.

The last book I’ll highlight this month was a short read – a Christmas present from a friend.

Together is Better by Simon Sinek

I’m very familiar with Simon – mostly because of his famous TED talk on starting with why.  I’ve read his book on the subject as well.  So I was delighted to be handed this tiny gem.  Written in a hybrid format of children’s book and inspirational quote book – this is a good one to flip through if you’re in need of a quiet moment.  Simon calls himself a self-professed optimist at the end, and that’s definitely how I left the book feeling.

It aims at sparking the inner fire we all have – and the most powerful moment: Simon saying that you don’t have to invent a new idea and then follow it.  It is perfectly acceptable to commit to someone else’s vision and follow them.  It frees you completely from the world of “special,” new, and different that entrepreneurial and ambitious types (myself included) get hung up on.  You don’t have to make up an original idea – just find something that resonates deeply with you and latch on.  That is just as powerful as being a visionary.

The other part of the book devotes a significant amount of space to snippet-style takes on leadership.  A friendly reminder of what leadership looks like.  Leadership is not management.

I’ve got more books on the way and will be back in a month with three new reads to share.

The Flow of Human Migration

Today I decided to take a bit of a detour while working on a potential project for #VizForSocialGood.  I was focused on a data set provided by UNICEF that showed the number of migrants from different areas/regions/countries to destination regions/countries.  I’m pretty sure it is the direct companion to a chord diagram that UNICEF published as part of their Uprooted report.

As I was working through the data, I wanted to take it and start at the same place.  Focus on migration globally and then narrow the focus in on children affected by migration.

Needless to say – I got sidetracked.  I started by wanting to make paths on maps showing the movement of migrants.  I haven’t really done this very often, so I figured this would be a great data set to play with.  Once I set that up, it quickly evolved into something else.

I wasn’t satisfied with the density of the data.  The clarity of how it was displayed wasn’t there for me.  So I decided to take a more abstract approach to the same concept.  As if by fate, I had received Chart Chooser cards in the mail earlier, and Josh and I were reviewing them.  We were having a conversation about the various uses of each chart and brainstorming on how they could be incorporated into our next Tableau user group (I really do eat, drink, and breathe this stuff).

Anyway – one of the charts we were talking about was the sankey diagram.  So it was already on my mind and I’d seen it accomplished multiple times in Tableau.  It was time to dive in and see how this abstraction would apply to the geospatial.

I started with Chris Love’s basic tutorial on how to set up a sankey.  It’s a really straightforward read that explains all the concepts required to make this work.  Here’s the quick how-to in my paraphrased words.

  1. Duplicate your data via a union and identify the original and the copy (which is great because I had already done this for the pathing).  As I understand it from Chris’s write-up, this lets us ‘stretch out’ the data, so to speak.
  2. Once the data is stretched out, it’s filled in by manipulating the binning feature in Tableau.  My interpretation is that the bins ‘kind of’ act like dimensions (labeled out by individual integers).  This becomes useful in creating the individual points that eventually turn into the line (curve).
  3. Next there are ranking functions made to determine the starting and end points of the curves.
  4. Finally the curve is built using a mathematical function called a sigmoid function.  This is basically an S-shaped, asymptotic function that flattens at both ends and is steepest in the middle (see the sketch after this list).
  5. After the curve is developed, the points are plotted.  This is where the ranking is set up to determine the leftmost and rightmost points.  Chris’s original specifications had the ranking straightforward for each of the dimensions.  My final viz is a riff on this.
  6. The last steps are to switch the chart to a line chart and then build out the width (size) of the line based on the measure you used in the ranking (percent of total) calculation.

So I did all those steps and ended up with exactly what was described – a sankey diagram.  A brilliant one too – I could quickly switch the origin dimension to different levels (major area, region, country) and do similar work on the destination side.  This is what ultimately led me to the final viz I made.

So while adjusting the table calculations, I came to one view that I really enjoyed.  The ranking pretty much “broke” for the initial starting point (everything was at 1), but the destination was right.  What this did for the viz was take everything from a single point and then create roots outward.  Initial setup had this going from left to right – but it was quite obvious that it looked like tree roots.  So I flipped it all.

I’ll admit – this is mostly a fun data shaping/vizzing exercise.  You can definitely gain insights through the way it is deployed (take a look at Latin America & Caribbean).

After the creation of the curvy onion shape, it was a “what to add next” free-for-all.  I had wrestled with the names of the destination countries to try and get something reasonable, but couldn’t figure out how to display them in proximity to the lines.  No matter – the idea of a word cloud seemed kind of interesting.  You’d get the same concept of the different chord sizes passed on again and see a ton of data on where people are migrating.  This also led to some natural interactivity of clicking on a country code to see its corresponding chords above.

Finally, to add more visual context, a simple breakdown of the major regions’ origins to destinations helps tell the story a bit further.  The story points for me: most migrants move within their own region, except for Latin America & the Caribbean.

And so it begins – Adventures in Python

Tableau 10.2 is on the horizon and with it comes several new features – one that is of particular interest to me is their new Python integration.  Here’s the Beta program beauty shot:

Essentially what this will mean is that more advanced programming languages aimed at more sophisticated analysis will become an easy-to-use extension of Tableau.  As you can see from the picture, it’ll work similarly to the R integration, with the end user calling the SCRIPT_STR() function to pass native Python code through and return the output.

I have to admit that I’m pretty excited by this.  For me I see this propelling some data science concepts more into the mainstream and making it much easier to communicate and understand the purpose behind them.

In preparation I wanted to spend some time setting up a Linux Virtual Machine to start getting a ‘feel’ for Python.

(Detour) My computer science backstory: my intro to programming was C++ and Java.  They both came easy to me.  I later tried to take a mathematics class based in UNIX that was probably the precursor to some of the modern languages we’re seeing, but I couldn’t get on board with the “terminal”-level entry.  Very off-putting coming from a world where you have a better feedback loop in terms of what you’re coding.  Since that time (~9 years ago) I haven’t had the opportunity to encounter or use these types of languages.  In my professional world everything is built on SQL.

Anyway, back to the main heart – getting a box set up for Python.  I’m a very independent person and like to take the knowledge I’ve learned over time and troubleshoot my way to results.  The process of failing and learning on the spot with minimal guidance helps me solidify my knowledge.

Here are the steps I went through – mind you, I have a PC and I am intentionally running Windows 7.  (This is a big reason why I made a Linux VM.)

  1. Download and install VirtualBox by Oracle
  2. Download x86 ISO of Ubuntu
  3. Build out Linux VM
  4. Install Ubuntu

These first four steps are pretty straightforward in my mind.  Typical Windows installer for VirtualBox.  Getting the image is very easy as is the build (just pick a few settings).

Next came the Python part.  I figured I’d have to install something on my Ubuntu machine, but I was pleasantly surprised to learn that Ubuntu already comes with Python 2.7 and 3.5.  A step I don’t have to do, yay!

Now came the part where I hit my first real challenge.  I had this idea of getting to a point where I could go through steps of doing sentiment analysis outlined by Brit Cava on the Tableau blog.  I’d reviewed the code and could follow the logic decently well.  And I think this is a very extensible starting point.

So based on the blog post I knew there would be some Python modules I’d need.  Googling led me to believe that installing Anaconda would be the best path forward, since it contains several of the most popular Python modules; installing it would eliminate the need to add modules individually.

I downloaded the file just fine, but the instructions on “installing” were less than stellar.  Here are the instructions:

Directions on installing Anaconda on Linux

So, as someone who takes instructions very literally (and again – doesn’t know UNIX very well), I was unfortunately greeted with a nasty error message lacking any help.  Feelings from years ago were creeping in quickly.  Still, I Googled my way through this (and had a pretty good inkling that it just couldn’t ‘find’ the file).

What they said (also notice I already dropped the _64, since mine isn’t 64-bit).

 

At last – that was all that was needed to get the file to install!

So installing Anaconda turned out to be pretty easy – after getting the right command in the prompt.  Then came the fun part: trying to do sentiment analysis.  I knew from reading that Anaconda came with the three modules mentioned: pandas, nltk, and time.  So I felt like this was going to be pretty easy to try and test out – coding directly from the terminal.

Well – I hit my second major challenge.  The lexicon required to do the sentiment analysis wasn’t included.  So I had no way of actually doing the sentiment analysis and was left to figure it out on my own.  This part was actually not that bad: Python gave me a helpful prompt to fix it – essentially to call the nltk downloader and get the lexicon.  And the nltk downloader has a cute little GUI to find the right lexicon (vader).  I got this installed pretty quickly.
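For anyone retracing this, the scripted version of that fix is only a couple of lines (the GUI route the downloader offers works just as well):

```python
import nltk

# Fetch the VADER lexicon that SentimentIntensityAnalyzer depends on.
# Calling nltk.download() with no arguments opens the GUI downloader instead.
nltk.download("vader_lexicon")
```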

Finally – I was confident that I could input the code and come up with some results.  And this is where I hit my last obstacle and probably the most frustrating of the night.  When pasting in the code (raw form from blog post) I kept running into errors.  The message wasn’t very helpful and I started cutting out lines of code that I didn’t need.

What’s the deal with line 5?

Eventually I figured out the problem – there were weird spaces in the raw code snippet.  After some additional googling (this time by my husband), he kindly reported that “apparently spaces matter, according to this forum.”  No big deal – lesson learned!  (Python is whitespace-sensitive, so stray indentation breaks the script.)

Yes! Success!

So what did I get at the end of the day?  A wonderful CSV output of sentiment scores for all the words in the original data set.

Looking good, there’s words and scores!
Back to my comfort zone – a CSV
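For reference, here’s a minimal sketch of what the scoring-and-export step looks like – not Brit’s exact code, just the same idea with my own stand-in word list and column names:

```python
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

# Stand-in word list in place of the original data set.
words = ["wonderful", "terrible", "data"]

# 'compound' is VADER's overall score, ranging from -1 (negative) to 1 (positive).
scores = pd.DataFrame({
    "word": words,
    "compound": [sia.polarity_scores(w)["compound"] for w in words],
})

scores.to_csv("sentiment_scores.csv", index=False)
```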

Now for the final step – validating that my results aligned with expectations.  And they did – yay!

0.3182 = 0.3182

Next steps: viz the data (obviously).  I’m hoping to extend this to additional sentiment analysis, maybe even something from Twitter.  Oh, and I also ended up running a Jupyter notebook (you guessed it – already installed) to get over the pain of typing directly in the terminal.

Makeover Monday 2017 – Week 4 New Zealand Tourism

This week’s Makeover addressed domestic and international tourism trends in New Zealand.  No commentary was provided with the data set; the original was just 2 charts left to the user to interpret.  See Eva’s tweet for the originals:

Going back to basics this week with what I like and dislike about it:

  • Titles are clear, bar chart isn’t too busy (like)
  • Not too many grid lines (like)
  • It’s easy to see the shape of the data and seasonality (like)
  • The scales are different between International & Domestic (dislike)
  • 3 years for easy comparison (like)
  • Eva chose this to promote her home country (like)

I think this was a good data set for week 4.  No data story to rewrite, special attention was made by Eva to mitigate data misinterpretation, and she added on a bonus of geospatial data for New Zealand.

My process really began with the geospatial part.  I hadn’t yet had a chance to work with geospatial detail already developed and appended to a data set.  My experience has been limited to using Tableau’s functionality to manually add in latitude and longitude for unclear/missing/invalid data points.

So as I got started, I had no idea how to use the data.  There were a few fields that certainly pointed me in the right direction.  The first was “Point Order.”  Immediately I figured that needed to go on “path” to determine where each data point fell.  That got me to this really cute outlined version of NZ (which looks like an upside-down boot):

So I knew something additional needed to be done to get to a filled map.  That’s when I discovered the “PolygonNumber” field.  Throwing that onto detail, changing my mark type to Polygon, and voilà – New Zealand.  Here’s a Google image result for comparison:

Eva did a great job trying to explain how NZ is broken up in terms of regions/territories/areas, but I have to admit I got a little lost.  I think what’s clear from the two pictures is I took the most granular approach to dissecting the country.

I’m super thrilled that I got this hands-on opportunity to implement it.  Geospatial is one of those areas of analytics everyone wants to get into, and by including this, I feel much more equipped for challenges in the future.

Next up was the top viz: I’ve been wanting to try out a barbell/DNA chart for a long time.  I’ve made them in the past, but nothing that’s landed in a final viz.  I felt like there was an opportunity to try this out with the data set based on the original charts.  I quickly threw that together (using Andy Kriebel’s video tutorial) and really enjoyed the pattern that emerged.

The shape of the data really is what kicked off the path that the final viz ended up taking.  I liked the stratification of domestic vs. international and wanted to carry that throughout.  This is also where I chose the colors.

The bottom left chart started out its life as a slope chart.  I originally plotted the first data point vs. the last data point (January 2008 vs. April 2016) for both types of tourism.  It came out VERY misleading – international had plummeted.  When I switched over to an annual aggregation, the story was much different.

International NZ tourism is failing!
Things look less scary and International is improving!

Good lesson in making sure to look at the data holistically.  Don’t cherry-pick two endpoints and get it wrong.  Find the right level of aggregation that keeps the message intact.
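To make that lesson concrete, here’s a toy pandas comparison of the two approaches – first-vs-last data point versus annual averages.  The numbers are invented and exist purely to show the mechanics:

```python
import pandas as pd

# Invented monthly visitor counts over the same span as the data set.
idx = pd.date_range("2008-01-01", "2016-04-01", freq="MS")
visitors = pd.Series(range(len(idx)), index=idx, dtype=float) + 100

# Misleading: compare a single first month to a single last month.
point_change = visitors.iloc[-1] - visitors.iloc[0]

# Fairer: compare annual averages, which smooth out the seasonality.
annual = visitors.resample("YS").mean()
annual_change = annual.iloc[-1] - annual.iloc[0]

print(point_change, annual_change)
```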

The last viz was really taking the geospatial component and adding in the tourism part.  I am on a small multiples kick and loved the novelty of having NZ on there more than once.  Knowing that I could repeat the colors again by doing a dual axis map got me sold.

All that was left was to add the interactivity.  The interactivity originally drove the maps from the barbell and line chart, but that wasn’t quite clear.  I HATE filter drop-downs for something that is going to be a static presentation (a Twitter picture), so I wanted to come up with a way to give the user a filter option for the maps (because the shading does change over time), but have it be less tied to the static companion vizzes.  This is where I decided to make a nice filter sheet of the years and drop in a diverging color gradient to add a little more beauty.  I’m really pleased with how that turned out.

My last little cute moment is the data sourcing.  The URLs are gigantic and cluttered the viz.  So instead I made a basic sheet with URL actions to quickly get to both data sets.

A fun week and one that I topped off by spelling Tourism wrong in the initial Tweet (haha).  Have to keep things fun and not super serious.

Full dashboard here.

Synergy through Action

This has been an amazing week for me.  On the personal side of things my ship is sailing in the right direction.  It’s amazing what the new year can do to clarify values and vision.

Getting to the specifics of why I’m calling this post “Synergy through Action.”  That’s the best way for me to describe how my participation in this week’s Tableau and data visualization community offerings have influenced me.

It all actually started on Saturday.  I woke up and spent the morning working on a VizforSocialGood project, specifically a map to represent the multiple locations connected to the February 2017 Women in Data Science conference.  I’d been called out on Twitter (thanks Chloe) and felt compelled to participate.  The kick of passion I received after submitting my viz propelled me into the right mind space to tackle 2 papers toward my MBA.

Things continued to hold steady on Sunday where I took on the #MakeoverMonday task of Donald Trump’s tweets.  I have to imagine that the joy from accomplishment was the huge motivator here.  Otherwise I can easily imagine myself hitting a wall.  Or perhaps it gets easier as time goes on?  Who knows, but I finished that viz feeling really great about where the week was headed.

Monday – Alberto Cairo and Heather Krause’s MOOC was finally open!  Thankfully I had the day off to soak it all in.  This kept my brain churning.  And by Wednesday I was ready for a workout!

So now that I’ve described my week – what’s the synergy in action part?  Well I took all the thoughts from the social good project, workout Wednesday, and the sage wisdom from the MOOC this week to hit on something much closer to home.

I wound up creating a visualization (in the vein of) the #WorkoutWednesday redo offered up.  What’s it of?  Graduation rates of specific demographics for every county in Arizona over the past 10-ish years.  Stylized into small multiples using a smattering of slick tricks I was required to use to complete the workout.

Here’s the viz – although admittedly it is designed more as a static view (not quite an infographic).

 

And to sum it all up: this could be the start of yet another spectacular thing.  Bringing my passion to the local community that I live in – but more on a widespread level (in the words of Dan Murray, user groups are for “Tableau zealots”).

Makeover Monday 2017 – Week 3 Trump Tweets

**Update (1/20/17): The original data set had a date formatting snafu resulting in 1,307 tweets from the 12:00–12:59 PM (UTC) hour being displayed as 00:00–00:59 (aka the 12 AM hour).  This affected 4.3% of the original data set and has been corrected.  I have also added a footnote denoting that the visualization is in EST.  This affects the shape of the data in both the 4 AM – 8 AM and 4 PM – 8 PM sections.

Rolling right along into week 3’s Makeover Monday.  The data set this week: Donald Trump’s tweets.  The original Buzzfeed viz and article accompanying this analyzed Trump’s retweet activity since his announcement of running for president.  The final viz ended up being what I would best describe as bubble charts of the top users he retweeted during this time:

What’s interesting is that the actual article goes into significant depth on how their team systematically reviewed the tweets.  It’s a bummer that the additional analysis couldn’t be synthesized into visual form.

My take on the makeover this week was driven completely by the underlying data available.  The TDE provided had the following fields:

Two things stuck out to me with the data.  First: the username being retweeted wasn’t included; second: the entire tweet text was included.  Having full text available just screams for some sort of text analysis.  I got committed at that point to doing something with the text.

My initial idea was to do some sort of sentiment analysis.  Recently I had installed both R-Studio and Python on my PC to try integration with Tableau.  I’d had success with R-Studio (mind you after watching a brief YouTube video), but I hadn’t gotten Python to cooperate (my effort in assisting in this cooperation = 2 out of 10).  I figured since I had both available maybe I should make an attempt.  After marinating on the concept I didn’t feel comfortable adding more sentiment analysis to the fire of American politics.  (On a personal note: I have been politically checked out since the early primaries.)

So instead of doing sentiment analysis, I decided to turn the data more into text mining for mentions and hashtags.  I had done some fiddling with the time component and was digging how the cycle plot/horizon chart were playing out visually.  So it seemed natural to continue on a progression of getting more details out of the bars and times of day.

Note on the time: the time is graciously parsed into the correct format in the data.  Looking at the original time, I am under the impression it was represented in GMT (+0000).  To adjust for this, I subtracted 5 hours from all of the parsed dates to put them in EST, aka Trump time.
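In pandas terms that adjustment is just a fixed offset – a sketch with a made-up column name (and, like the viz, it ignores daylight saving):

```python
import pandas as pd

# Hypothetical frame of parsed tweet timestamps in GMT/UTC.
tweets = pd.DataFrame({"created_utc": pd.to_datetime(["2016-11-09 05:30:00"])})

# Shift to EST by subtracting a flat 5 hours.
tweets["created_est"] = tweets["created_utc"] - pd.Timedelta(hours=5)
```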

So back to text mining.  Post-#data16, a colleague of mine was recounting how to use regex to scrub through text.  I walked away from his talk thinking I needed to use that the next time I had the opportunity.  And what I love about it: it’s a NATIVE FUNCTION IN TABLEAU!!  So this had me singing.  Now, I don’t know a ton about regex (lots of notation I have yet to memorize), so I decided to quickly google my way to getting the user handles and hashtags.  These handy results really made this analysis zip along: regexr & regex+twitter.
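The patterns themselves turned out to be tiny.  Here’s roughly what they look like in Python – the tweet text is invented, and Tableau’s regex functions accept essentially the same pattern strings:

```python
import re

tweet = "Great evening talking #dataviz with @example_user and @another_user!"

# Handles: an @ followed by letters, digits, or underscores.
handles = re.findall(r"@(\w+)", tweet)    # ['example_user', 'another_user']

# Hashtags: a # followed by the same word characters.
hashtags = re.findall(r"#(\w+)", tweet)   # ['dataviz']
```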

Everything else came to life pretty quickly.  I knew I wanted to include at least one or two tweets to read through, but I wanted to keep it curated.  I think this was accomplished well and I spent a good deal of time trying out different time combinations just to see what would bubble to the surface.

A final note on aesthetics this week: I’m reading Alberto Cairo’s The Functional Art, and as I mentioned in an earlier post, I’m also participating in his MOOC that starts tomorrow.  I am only 4 chapters in, but Alberto has me taking a few things to heart.  I don’t think it is by coincidence that I decided to push the beauty side of things.  I always strive for elegance, but I strive for it through white space and keeping that “data ink ratio” at a certain point.  But I’m not blind to the different visualizations out there that attract people.  So for once I used a non-white background (yay!).  And I also went for a font that’s well outside of the look of my usual vizzing font.

More than focusing on aesthetics, is of course the function of the viz.  I tried to spend more time thinking about the audience and what they were going to “get” out of it.  I hope that the final product is less of a “visual aid” to my analysis and more of an interactive tool to explore the tweets of the soon to be President.

Full viz available on my Tableau public page.

#DataResolutions – More than a hashtag

This gem of a blog post appeared on Tableau Public and within my twitter feed earlier this week asking what my #DataResolutions are.  Here was my lofty response:

 


Sound like a ton of goals and setting myself up for failure?  Think again.  At the heart of most of my work with data visualization are 2 concepts: growth and community.  I’ve had the amazing opportunity to co-lead and grow the Phoenix Tableau user group over the past 5+ months.  And one thing I’ve learned along the way: to be a good leader you have to show up.  Regardless of skill level, technical background, formal education, we’re all bound together by our passion for data visualization and data analytics.

To ensure that I communicate my passion, I feel it’s critical to demonstrate it.  It grows me as a person and stretches me outside of my comfort zone to an extreme.  And it opens up opportunities and doors for me to grow in ways I didn’t know existed.  A great example of this is enrolling in Alberto Cairo and Heather Krause’s MOOC Data Exploration and Storytelling: Finding Stories in Data with Exploratory Analysis and Visualization.  I see drama and storytelling as a development area for me personally.  Quite often I get so wrapped up in the development of data stories that the final product is a single component being used as my own visual aid.  I’d like to learn how to communicate the entire process within a visualization and guide a reader through it.  I also want to be surrounded by 4k peers who have their own passions and opinions.

Moving on to collaborations.  There are 2 collaborations I mentioned above: one surrounding data+women and the other a data mashup.  My intention behind developing these is, once again, to grow out of my comfort zone.  Data Mashup is also a great way for me to enforce accountability for Makeover Monday and to develop my visualization interpretation skills.  The data+women project is still in an incubation phase, but my goal there is to spread some social good.  In our very cerebral world, sometimes it takes a jolt from someone new to be used as fuel for validation and action.  I’m hoping to create some of this magic and get some of the goodness of it from others.

More to come, but one thing is for sure: I can’t fail if I don’t write down what I want to achieve.  The same is true for achievement: unless it’s written down, how can I measure it?