Category: Alteryx

  • The Shape of Shakespeare’s Sonnets | #IronViz Books & Literature

    Jump directly to the viz

    If it’s springtime, that can only mean it’s time for the feeder rounds of Tableau’s Iron Viz contest.  The global theme for the first feeder is books & literature, a massive topic with lots of room for interpretation.  So without further delay, I’m excited to share my submission: The Shape of Shakespeare’s Sonnets.

    The genesis of the idea

    The idea came after a rocky start and an abandoned first concept.  My initial plan was to approach the topic with a meta-analysis of the overall subject (‘books’) and avoid focusing on a single work.  I found a wonderful data set of NYT non-fiction best-seller lists, but after spending a significant amount of time consuming and prepping the data I was uninspired.  So I switched mid-stream: keep the parameters of a meta-analysis, but change to a body of literature that a meta-analysis could actually be performed on.  I landed on Shakespeare’s Sonnets for several reasons:

    • Rigid structure – great for identifying patterns
    • 154 divides evenly for small multiples (11×14 grid)
    • Concepts of rhyme and sentiment could easily be analyzed
    • More passionate subject: themes of love, death, wanting, beauty, time
    • Public-domain text, so it should be easy to find
    • Focus on my strengths: data density, abstract design, minimalism
    Getting Started

    I wasn’t disappointed with my Google search: it took me about 5 minutes to locate a fantastic CSV containing all of the Sonnets (and more) in a nice relational format.  The main criterion for the data set to be usable was that each line of a sonnet be its own record.  From that point, I knew I could explode and reshape the data as necessary to get to a final analysis.

    Prepping & Analyzing the Data

    The strong structure of the sonnets meant that counting things like the number of characters and number of words would yield interesting results.  And that was the first data preparation moment.  Using Alteryx, I expanded each line into columns of individual words.  Those were then transposed back into rows and affixed to the original data set.  Why?  This allows for quick character counting in Tableau, repeats the shared dimensions (like line and sonnet number), and adds a dimension for each word’s position within its line.
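
    If you don’t have Alteryx handy, here’s a minimal pandas sketch of the same reshape.  The file name poem_lines.csv comes from the workflow below; the column names (sonnet, line_number, line_text) are my assumptions about the schema, not the actual fields.

    ```python
    # Minimal pandas sketch of the Alteryx text-to-columns + transpose step.
    # Column names (sonnet, line_number, line_text) are assumed, not actual.
    import pandas as pd

    lines = pd.read_csv("poem_lines.csv")

    # Split each line into one row per word, keeping the repeated
    # dimensions (sonnet, line_number) intact.
    words = (
        lines.assign(word=lines["line_text"].str.split())
             .explode("word")
             .reset_index(drop=True)
    )

    # The word's position within its line, and a quick character count.
    words["word_number"] = words.groupby(["sonnet", "line_number"]).cumcount() + 1
    words["char_count"] = words["word"].str.len()
    ```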

    I also extracted all of the unique words, counted their frequency, and exported them to a CSV for sentiment analysis.  Sentiment analysis is a way to score words, phrases, or longer text to determine the intention, sentiment, or attitude behind them.  For the sake of this analysis, I chose a negative/positive scoring system.  Using Python and the nltk package, each word was scored with VADER.  VADER is optimized for social media, but I found the results fit the words within the sonnets well.

    The same process was completed for each sonnet line to get a more aggregated, overall sentiment score.  Again, Alteryx was the key to extracting the data in the format I needed to run it through a quick Python script.

    Here’s the entire Alteryx workflow for the project:

    The major components
    • Start with original data set (poem_lines.csv)
      • filter to Sonnets
      • Text to column for line rows
      • Isolate words, aggregate and export to new CSV (sonnetwords.csv)
      • Isolate lines, export to new CSV (sonnetlines)
      • Join swordscore to transformed data set
      • Join slinescore to transformed data set
      • Export as XLSX for Tableau consumption (sonnets2.xlsx)
    Python snippet
    make sure you download the nltk lexicons after importing; thanks to Brit Cava for the code inspiration
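
    The original snippet was shared as an image, so here’s a minimal reconstruction of the approach with nltk’s VADER scorer.  It assumes the intermediate sonnetwords.csv file has a column named word; the same pattern works on the sonnet lines file for line-level scores.

    ```python
    # Minimal reconstruction of the word-scoring step (the original
    # snippet was an image). Assumes sonnetwords.csv has a "word" column.
    import nltk
    import pandas as pd
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # the lexicon download mentioned above

    sia = SentimentIntensityAnalyzer()
    words = pd.read_csv("sonnetwords.csv")

    # "compound" is VADER's normalized score, -1 (negative) to +1 (positive).
    words["score"] = words["word"].apply(
        lambda w: sia.polarity_scores(str(w))["compound"]
    )
    words.to_csv("swordscore.csv", index=False)
    ```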

    The Python code is heavily inspired by a blog post from Brit Cava in December 2016.  Blog posts like hers are critically important: they help enable others within the community to do deeper analysis and build new skills.

    Bringing it all together

    Part of my vision was to provoke pattern-hunting, build a highly dense data display, and use an 11×14 grid.  My first iteration actually started with mini bar charts of the number of characters in each word.  The visual this produced is what ultimately led to the path of including word sentiment.

    height = word length, bars are in word order

    This eventually changed to circles, which led to the addition of a bar representing the word count of each individual line.  The size of the words at this point became somewhat of a disruption at the micro scale, so word sentiment was distilled down into 3 colors: negative, neutral, or positive.  The sentiment of the entire line instead has a gradient spectrum (with the same color endpoints for negative/positive).  The sentiment score for each word was reserved for a viz in tooltip – which provided the inspiration for the name of the project.

    Sonnet 72, line 2

    Each component is easy to see and is repeated in macro format at the bottom – this also gives the end user an easy way to read each Sonnet from start to finish.

    designed to show the progression of abstraction

    And there you have it – a grand scale visualization showing the sentiment behind all 154 of Shakespeare’s Sonnets.  Spend some time reciting poetry, exploring the patterns, and finding the meaning behind this famous body of literature.

    Closing words: thank you to Luke Stanke for being a constant source of motivation, feedback, and friendship.  And to Josh Jackson for helping me battle through the creative process.

    The Shape of Shakespeare’s Sonnets

    click to interact at Tableau Public

  • Dying Out, Bee Colony Loss in US | #MakeoverMonday Week 18

    Week 18 of Makeover Monday tackles the issue of the declining bee population in the United States.  Data was provided by Bee Informed, and the re-visualization is in conjunction with Viz for Social Good.  Unfamiliar with a few of those terms?  Check out their websites to learn what Makeover Monday and Viz for Social Good are all about.

    The original visualization is a filled map showing the annual percentage of bee colony loss for the United States.  Each state (and DC) is filled with a gradient color from blue (low loss) to orange (high loss).  The accompanying data set for the makeover included historical data back to 2010/11.

    Original visualization | Bee Informed

    Looking at the data, my goal was to capitalize on some of the same concepts presented in the original visualization, but add more analytical value by including the dimension of time.  The key thing I was aiming to understand: there’s annual colony loss, but how “bad” is the loss?  The critical “compared to what” question.

    My Requirements
    • Keep the map theme – good way to demonstrate data
    • Add in time dimension
    • Keep color as an indicator of performance (good/bad indicator) – clarify how color was used
    • Provide more context for audience
    • Switch to tile map for skill building
    • Key question: where are bees struggling to survive
    • Secondary question: which states (if any) have improved

    Building out the tile map and beginning to add the time series was pretty simple.  I downloaded the hex map template provided by Matt Chambers and did a bit of tweaking to the file to change where Washington D.C. was located.  The original file has it off to the side; I decided to place it in line with the continental US to clean up the final look.

    The next step – well documented throughout the Tableau Community – was to take the two data sources (bees + map) and blend them together.  Part of that process includes setting up the relationship between the two data sources and then adding them both to a single view:

    setting up the relationship between data sources
    visual cues – MM18 extract is primary data source, hexmap secondary

    To change to a line chart and start down the path of showing a metric (in our case, annual bee colony loss) over time, a few minor tweaks:

    • Column/Row become discrete (why: so we can have continuous axes inside of our rows & columns)
    • Add on continuous fields for time & metric

    This, to me, was a big improvement over the original visualization (because of the addition of time).  But it still needed a bit of work to clearly explain where good and bad are.  That brought me back to a concept I worked on during Week 17 – using the background of a chart as an indicator of performance.

    forest land consumption

    In Week 17 I looked at the annual consumption of carbon, forest land, and crop land by the top 10 world economies compared to the global footprint.  Background color indicates whether the country’s footprint is above or below the current global metric.  I particularly appreciate this view because you get the benefit of the aggregate and immediate feedback, along with the nice detail of the trend.

    This led me down the path of ranking each of the states (plus DC) to determine which had experienced the most colony loss between the endpoints of the data (2010/11 and 2016/17).  You’d get a sense of where the biggest issues were and where hope is sprouting.

    To accomplish this I used Alteryx to create a rank.  The big driver behind creating the rank pre-visualization was to replicate the same rank number across all of the years.  The background color for the final visualization is made by creating constant-value bar charts for each year, so having a constant number for each state, based on a 2010 vs. 2016 calculation, would be much easier to develop with.

    notice the bar chart marks card; Record ID is the rank

    Here’s my final Alteryx workflow.  Essentially I took the primary data set, split it into 2010 and 2016, joined the two back together, calculated the difference between them, corrected for a few missing data points, sorted them from greatest decline in bee colony loss to smallest, applied a rank, joined back all of the data, and then exported it as a .hyper file.

    definitely a quick & dirty workflow

    This workflow, developed in less than 10 minutes, eliminated the need for at least one table calculation and brought me closer to my overall vision quickly and painlessly.
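
    For the curious, here’s roughly what those steps look like in pandas – a sketch under assumed column and file names, not the actual workflow (which exported a .hyper from Alteryx).

    ```python
    # Rough pandas sketch of the ranking logic. Column names ("State",
    # "Year", "Pct_Loss") and file names are assumptions, not the actual
    # MakeoverMonday schema.
    import pandas as pd

    df = pd.read_csv("bee_colony_loss.csv")

    first = df[df["Year"] == "2010/11"].set_index("State")["Pct_Loss"]
    last = df[df["Year"] == "2016/17"].set_index("State")["Pct_Loss"]

    # Change in annual loss between the endpoint years; states missing an
    # endpoint drop out here and get patched by hand, as in the workflow.
    change = (last - first).dropna()

    # Rank 1 = the worst change in colony loss between the two seasons.
    rank = change.rank(ascending=False, method="first").astype(int).rename("Rank")

    # Attach the constant rank to every yearly record and export.
    df.join(rank, on="State").to_csv("bee_ranked.csv", index=False)
    ```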

    The final touches were adding a little descriptive text to eliminate the need for a color legend and to give a first-time reader areas to focus on, and picking the right color palette and title.  Color always leads my design – so I settled on the gold early on, but it took a few iterations for the color range to evoke the feeling of “dying out.”

    tones of brown to keep theme of loss, gold indicates more hope

    And here’s the final visualization again, with a link to the interactive version on Tableau Public.

    click to interact on Tableau Public
  • Azure + Tableau Server = Flex

    I’m affectionately calling this post Azure + Tableau Server = Flex, and it’s for two kinds of people.  First: are you a desktop user who has always wanted to extend your skills to Tableau as a platform?  Or perhaps you’re someone who is just inherently curious and gains confidence by learning and doing (I fall into this camp)?  Well then, this is the blog post for you.

    Let me back up a bit.  I am very fortunate to spend the majority of my working time (and a fair amount of my free time!) advocating for visual analytics and developing data visualizations to support the value it brings.  That means doing things like speaking with end users, developing requirements, partnering with application and database owners/administrators, identifying and documenting important metrics, and finally (admittedly one of the more enjoyable components) partnering with the end user on the build-out and functionality of what they’re looking for.  It’s a very iterative process, with a fair amount of communication and problem solving sprinkled into the pure development time – a lucky job.  The context here is this: as soon as you start enabling people to harness the power of data visualization and visual analytics, the immediate next conversation becomes “how can I share this with the world (or my organization)?”  Aha!  We’ve just stepped into the world of Tableau Server.

    Tableau Server and Tableau Online bring the capability to share the visualizations you’re making with everyone around you.  They do exactly what you want: share interactive, data-rich displays via a URL.  Just the thought of it gets me misty-eyed.  But, as with any excellent technology tool, it comes with the responsibility of implementation, maintenance, security, cost, and ultimately a lot of planning.  And this is where the desktop developer can hit a wall in taking things to that next level.  When you’re working with IT folks, or with someone who has done something like this in the past, you’ll be hit with a question wall that runs the entire length of every potential ‘trap’ or ‘gotcha’ moment you’re likely to experience with a sharing platform.  More than that – you’re tasked with knowing the answers immediately.  Just when you thought getting people to use terms like tooltip, box plot, and dot plot was exciting, they start using words like performance, permissions, and cluster.

    So what do you do?  You start reading through administration guides, beefing up your knowledge of the platform, and most likely extending your initial publisher’s perspective of Tableau Server into the world of server administrator or site administrator.  But along the way you may get this feeling – I certainly have: I know how to talk about it, but I’ve never touched it.  This is all theoretical – I’ve built out an imaginary instance in my mind a million times, but I’ve never clicked the buttons.  It’s the difference between talking through the process of baking and decorating a wedding cake and actually doing it.  And really, if we thought about it, you’d be much more likely to trust someone who can say “yeah, I’ve baked wedding cakes and served them” over someone who says “I’ve read every article and recipe and how-to in the world on baking wedding cakes.”

    Finally we’re getting to the point, and this is where Azure comes into play.  Instead of stopping your imaginary implementation process because you don’t have the hardware, authority, or money to test out an implementation and actually UNBOX the server – use Azure and finish it out.  Build the box.

    What is Azure?  It’s Microsoft’s extremely deep and rich platform for a wide variety of services in the cloud.  Why should you care?  It gives you the ability to deploy a Tableau Server test environment through a website – oh, and they give you money to get started.  Now I’ll say this right away: Azure isn’t the only option.  There’s also Amazon’s AWS.  I have accounts with both and have used them both; they are both rich and deep, and I don’t have a preference.  For the sake of this post, Azure was attractive because you get free credits and it’s the tool I used for my last sandbox adventure.

    It’s really easy to get started with Azure.  Head over to their website and sign up for a trial.  At the time of writing they were offering a 30-day free trial and $200 in credits – a combination that is more than enough to get started and build your box.  (BTW: nobody has told me to say these things or offered me money for this – I am writing about this purely out of personal interest.)

    Once you get started, there are roughly two paths you can take.  The first is to search the marketplace for Tableau Server.  When you do, there are literally step-by-step configuration settings that walk you from basic setup all the way to deployment.  It’s an easy path to get to the Server, but outside the scope of where I’m taking this.  Instead, we’re going to take the less defined path.

    Why not use the marketplace process?  I think the less defined path offers the true start-to-finish experience: hardware sizing through to software installation and configuration.  Building the machine from scratch (albeit a virtual machine) mimics the entire process more closely than using a wizard.  You have fewer guard rails, more opportunity for exploration, and the responsibility of getting to the finish line correctly rests completely in your hands.

    So here’s how I started: I created a new resource, a Windows Server 2012 R2 Datacenter box.  To do that, you go through the marketplace again and choose that as the box type – it’s probably a near-identical process to the marketplace Tableau Server setup.  Make a box, size the box, add optional features, and go.  To bring it closer to home, go through the exercise of comparing Tableau’s minimum requirements against its recommended requirements.  For a single-node box you’ll need to figure out the number of CPUs (cores), the amount of RAM (memory), and the disk space you’ll want.  When I did this originally I tried to start cheap: I looked through the billing costs of the different machines on Azure and started at the minimum.  In retrospect, go with something more heavily powered.  You’ll always have the option to resize or re-class the hardware, but starting off with a decent amount of power will prevent a slow install experience and degraded initial Server performance.

    Once you create the resource, you literally click a button to boot up the box and get started.  It took 15 to 20 minutes for my box to be built initially – more than I was expecting.

    Everything done up to this point is to get to a place where you have your own Tableau Server that you can do whatever you want with.  You can set up the type of security, configure different components – essentially get down to the nitty-gritty of what it would feel like to be a server administrator.

    Your virtual machine should have access to the internet, so the next step is to go to Tableau’s download page and grab the software.  Here’s a somewhat-pro tip: consider downloading a previous version of the server software so that you can upgrade later and test out what that feels like.  Consider the difference between major and minor releases and the nuance of what each upgrade process will be.  For this adventure I started with 10.0.11 and ended up upgrading to 10.3.1.

    The actual install process is on the level of “stupid easy.”  But you probably wouldn’t feel comfortable saying “stupid easy” unless you’ve actually done it.  There are a few click-through windows with clear instructions, but for the most part it installs start to finish without much input from the end user.

    You get to this window once you’ve finished the install process.

    This is literally the next step, and it shows the depth to which you can administer the platform from within the server (from a menu/GUI perspective).  Basic things can be tweaked and set up – the type of authentication, SMTP (email) for alerts and subscriptions, and the all-important Run As User account.  Reading through the Tableau Server: Everybody’s Install Guide is the best approach to get to this point, especially because of something I alluded to earlier: the majority of this is really in the planning of the implementation, not the unboxing or the build.

    Hopefully by this point the amount of confidence gained in going through this process is going to have you feeling invincible.  You can take your superhero complex to the next level by doing the following tasks:

    Start and stop the Server via tabadmin.  This is a great exercise because you’re using the command-line utility to interact with the Server (for example, running tabadmin stop and then tabadmin start from the Tableau Server bin directory).  If you’re not someone who spends a lot of time doing these kinds of tasks, it can feel weird.  Going through the act of starting and stopping the server will make you feel much more confident.  My personal experience was also interesting here: I like tabadmin better than interacting with the basic utilities – you know exactly what’s going on with tabadmin.  Here’s the difference between the visual status indicator and what you get from tabadmin.

    When you right-click and ask for server status, it takes some time for the status window to display.  When you do the same thing with tabadmin (tabadmin status -v), it’s easier to tell that the machine is ‘thinking.’

    Go to the Status section and see what it looks like.  Especially if you’re a power user from the front end (a publisher, maybe even a site administrator), seeing the full details of what is in Tableau Server is exciting.

    There are some good details in the Settings area as well.  This is where you can add another site if you want.

    Once you’ve gotten this far in the process, the future is yours.  You can start to publish workbooks and tinker with settings.  The possibilities are really limitless, and you will be working toward understanding and feeling what it means to go through each step.  And of course the best part of it all: if you ruin the box, just destroy it and start over!  You’ve officially detached yourself from the chains of responsibility and are freely developing in a sandbox.  It is your chance to get comfortable and do whatever you want.

    I’d even encourage you to interact with the API.  See what you can do with your site.  Even if you use an assisted API process (think the Alteryx Output to Tableau Server tool), you’ll find yourself getting much more savvy at speaking Server, and that much closer to owning a deployment in a professional environment.
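
    If you want a concrete first step with the API, here’s a minimal sketch using the tableauserverclient Python package to sign in and list the workbooks on your sandbox.  The server URL and credentials are placeholders, not anything from a real deployment.

    ```python
    # Minimal REST API sketch using the tableauserverclient package
    # (pip install tableauserverclient). The URL and credentials below
    # are placeholders for your own sandbox box.
    import tableauserverclient as TSC

    tableau_auth = TSC.TableauAuth("admin", "password")  # default site
    server = TSC.Server("http://my-azure-vm")            # hypothetical URL

    with server.auth.sign_in(tableau_auth):
        # Page through everything published to the site so far.
        for workbook in TSC.Pager(server.workbooks):
            print(workbook.name)
    ```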

  • Alteryx Inspire – Day 1

    When I went to the Tableau Conference last year, I felt it was important to spend some time documenting my experience.  Anytime I go to a conference related to my professional aspirations I’m always taken by the wealth of knowledge that’s uncovered.

    The Alteryx Inspire conference is a pared-down conference with about 2,000 attendees.  It is comfortably housed in the Aria hotel across 2 spacious and open floors, with escalators that split between level 3 and level 1 – there’s a nice flow to it and plenty of natural light.  Events take place over three days: Monday, Tuesday, and Wednesday.  Monday is mostly a product training day, and the bulk of the sessions fill the remainder of the week.  The opening keynote is Tuesday.

    This year – my first – I was extremely fortunate to be able to attend and to do the product training track.  It gave me a firsthand opportunity to see how the company sells and trains on its own tool.  Facilitators are typically great at selling the ‘why’ and ‘how’ behind something.

    Today I sat through a full day of the introduction to Alteryx Designer – not because it was my first time using the tool, but because I believe there’s something very powerful about origin stories.  There’s something you learn in the first 30 minutes that someone without the ‘formal training’ may never pick up.  That happened for me today, and it was great to see everything in action.

    As an advocate for data-informed decision making, I find the tool indispensable.  Just by listening to the 100+ people in my classroom, it’s sobering to witness firsthand how early many businesses still are in accessing their data.  Yes, there have been really great strides, but so many people are just at the beginning.  I chuckle when I hear the typical ‘Excel’ analogies, but the overwhelming majority nod at how much they relate to the joke.

    I’ve always seen Alteryx as a natural companion for a data analyst.  For anyone out there trying to manage data, it offers up a solution – if only for the single act of being able to see a visual output of the thought process and work that went into producing a data model.  A data model or report that can be shared, saved, printed (please don’t print), and most importantly: communicated.  For someone doing data prep, blending, and gathering, this is how you explain to your boss what you do.  This is the demonstration of what it takes to be the data wrangler.  This is how you share your critical thinking skills.

    I’ve just scratched the surface and have 2 more full days of Alteryx ahead – a first day already peppered with amazing collaboration opportunities and shared enthusiasm.  The vibe is chill, the people are great, and the mission is achievable.

    Tomorrow is another day and an opportunity to take the building blocks and dream of skyscrapers.