Tag: data

  • Aiming for data-driven?  Don’t forget the people.

    I’ve found myself in this situation a lot recently: I’m having a conversation with someone about the state of analytics and it suddenly turns into a product feature comparison.  What follows is a series of strengths-and-weaknesses bullet points.  The kicker?  The points are almost always focused on what the tool can do, or how the tool fits into a technology stack, or (inevitably) what the tool costs.

    What’s left out?  How it works with people.  How the tool you’re selecting is going to be affected by existing culture, or more importantly, what your future-state analytics culture will look like.

    Feature comparisons put machines before people.  Feature comparisons assume that the needs of the machines supersede the needs of your team.  Feature comparisons focus on a future state of “will these technologies play nice with each other” and not on “will my team be enabled to access and understand OUR data.”

    A data-driven culture is dependent on the people.  Culture is created or manifested by human behaviors and values.  A data-driven culture is one where people are empowered to use data to answer questions.  Curiosity, exploration, iteration, analytical reasoning, and continuous improvement are all critical.  Doing this all at speed is essential.

    The questions you should be asking are: how do I set up an environment where my team can be curious and motivated to find answers?  What support, education, and resources are needed to ensure they can model the behaviors we want to embody?  How do we build out a strong infrastructure and communication pipeline between hardware managers and data explorers?

    Starting with those questions ensures the vision you’re trying to achieve is the priority, not the minimization of the growing pains that must be experienced to reach your goal.  It also clarifies how you’ll measure success.  Success won’t be meeting a migration timeline or coming in under budget; success will be the transformation of how people engage with and react to data daily.  Your ultimate finish line becomes the day when relentless analytical reasoning, and action on its outputs, are the norm.  The day when new, deeper, more complex, and creative questions get asked daily.

  • Don’t be a Bridge, Instead be a Lock

    Lately I’ve spent a lot of time pondering my role in the world of data.  There’s this common phrase that we as data visualization and data analytics (BI) professionals hear all the time (and that I am guilty of saying):

    “I serve as the bridge between business and IT.”

    Well – I’m here to say it’s time to move on.  Why?  Because the bridge analogy is incomplete, and because it doesn’t accurately represent the way we function in this critical role.  At first glance the bridge analogy seems reasonable.  A connector, something that joins two disparate things.  In a very physical way it connects two things that otherwise have an impasse between them.  The business is an island.  IT is an island.  Only a bridge can connect them.  But is this really true?

    Instead of considering the two as separate entities that must be connected, what if we rethought them as bodies of water at different levels?  They touch each other; they are one.  They are the same type of thing.  The only difference is that they sit at different levels, so something like a boat can’t easily pass between them.  Isn’t that what is really happening?  “The business” and “IT” are really one large organization – not two separate, foreign entities.

    This is where the role of the Lock comes in.  A lock is the mechanism by which watercraft are raised or lowered between waterways.  And to a large extent it is a better analogy for our roles in data.  We must adapt to the different levels of business and IT.  And more importantly, it is our responsibility to perform that function – to get the boat (more specifically, “the data”) through from one canal to the other.

    Even exploring what Wikipedia says about a lock – it fits better.

    “Locks are used to make a river more easily navigable, or to allow a canal to cross land that is not level.”

    “Larger locks allow for a more direct route to be taken” [paraphrased]

    Is this not how we function in our daily roles?  How fitting is it to say this:

    “My role is to make your data more easily navigable.  My goal is to allow data to flow through on your level.  I’m here to allow a more direct route between you and your data.”

    It feels right.  I’m there to help you navigate your data through both IT and business waters.  And it is my privilege and honor to facilitate this.  Let’s drop the bridge analogy and move toward a new paradigm – the world where we are locks, adjusting our levels to fit the needs of both sides.

  • Star Trek The Next Generation: Every Episode (#IronViz 3)

    It’s that time again – Iron Viz feeder contest!  The third and final round for a chance to battle at the conference in a chef coat is upon us.  This round the focus was on anything ‘Silver Screen.’

    With a limitless topic, I was certain I would find myself in a creative rut and end up submitting something at the very end of the submission window (August 13th).  So I am as shocked as anyone that I have a fully formed submission well before the deadline.

    So what’s the topic and what got me unstuck?  Star Trek of course!  The backstory here is amazing – I went to a belated wedding shower for a few friends and they mentioned to me that they were going to the annual Star Trek convention.  And more specifically there was a special celebration occurring – the 30th anniversary of Star Trek: The Next Generation.  Not even up for debate – it just IS the best incarnation of the Star Trek universe.

    So I decided to take a moment to do some research on finding TNG data.  It didn’t take me long to unearth this fantastic data set on GitHub that includes each episode’s script parsed out by character.

    Really inspired by the thought of seeing every word of every episode visualized, I set forth on my mission.  As I got started, one component was mission critical: the bold colors present throughout the world of Star Trek.  Those bold, moody colors are fantastic – especially paired with a black background.  And working with individual scripts meant that I could use color to accentuate different characters – much like their uniforms do in the episodes.

    The next component I wanted to evoke (again – design focused here) was the electronics and computer interfaces.  I particularly like the rounded edges and strong geometric shapes on the computer screens across all iterations of Star Trek.  So that describes most of the design – the choice of colors and how some of the visualizations were set up.

    Now on to the next important component here: analysis.  When you see this visualization you may find yourself realizing that I don’t draw any conclusions.  For this collection of visualizations I am playing the role of curator.  I am developing a visual world for you to interact with, to go deep and wide in your understanding.  I am not attempting to summarize data for you or force conclusions upon you.  I am inviting you to come into the world of Star Trek, unearth who speaks during each episode, find out what that character is saying.  I want there to be an unending number of takeaways and perceptions generated from this.

    And the last part you need to understand is the storytelling.  This entire visualization has an untold number of stories in it by virtue of being a visualization of the entire series.  If you want a meta-story to tell, it’s simply this: Star Trek: The Next Generation is such a deep and rich world that you should go get lost in it.  And while you’re on the path of getting lost, do me a favor: retain some leadership tidbits from Picard and sprinkle in some logical takeaways from Data.

  • Alteryx Inspire – Day 1

    When I went to the Tableau Conference last year, I felt it was important to spend some time documenting my experience.  Anytime I go to a conference related to my professional aspirations I’m always taken by the wealth of knowledge that’s uncovered.

    The Alteryx Inspire conference is a pared-down conference with about 2,000 attendees.  It is comfortably housed in the Aria hotel across two spacious, open floors.  Escalators split between level 3 and level 1 – there’s a nice flow to it and plenty of natural light.  Events take place over three days: Monday, Tuesday, and Wednesday.  Monday is mostly a product training day, and the bulk of the sessions happen the remainder of the week.  The opening keynote is Tuesday.

    This year – my first – I was extremely fortunate to be able to attend and to do the product training track.  It gave me a firsthand opportunity to see how the company sells and trains on its own tool.  Facilitators are typically great at selling the ‘why’ and ‘how’ behind something.

    Today I sat for a full day going through the introduction to Alteryx Designer.  Not because it was my first time using the tool, but because I believe there’s something very powerful about origin stories.  There’s something you learn in the first 30 minutes that someone who doesn’t have the ‘formal training’ may never pick up.  That happened for me today and it was great to see everything in action.

    As an advocate for data-informed decision making, I find the tool indispensable.  Just listening to the 100+ people in my classroom, it’s scary to witness firsthand how young most businesses still are when it comes to accessing their data.  Yes, there have been really great strides, but so many people are just at the beginning.  I chuckle when I hear the typical ‘Excel’ analogies, but the overwhelming majority are nodding at how much they relate to the joke.

    I’ve always seen Alteryx as a natural companion for a data analyst.  For anyone out there trying to manage data, it offers up a solution – if only for the single act of being able to see a visual output of the thought process and work that went into producing a data model.  A data model or report that can be shared, saved, printed (please don’t print), and most importantly: communicated.  For someone doing data prep, blending, and gathering – this is how you explain to your boss what you do.  This is the demonstration of what it takes to be the data wrangler.  This is how you share your critical thinking skills.

    I’ve just scratched the surface and have two more full days of Alteryx ahead – a first day already peppered with amazing collaboration opportunities and shared enthusiasm.  The vibe is chill, the people are great, and the mission is achievable.

    Tomorrow is another day and an opportunity to take the building blocks and dream of skyscrapers.

  • #MakeoverMonday 11/22/16 – Advanced Logging Edition

    And it’s time – my first ever Makeover Monday.  I’ll admit, I’ve attempted to catch up in the past, but always lost steam.  I think the first data set might have been related to sports, and I struggled to focus on making something interesting.

    Despite my follies, I’m proud to say that I’ve participated in this week’s Makeover Monday in honor of the special advanced logging that is taking place.  Along with submitting work with the hashtag on Twitter, Tableau has asked us to upload a copy of our log files and workbook.  Contained within the advanced log files are .PNGs that show analysis iterations.

    I went into this Monday with the idea of doing a basic “best practices” version.  One that would mimic something I might create for ultimate exploration and zero data journalism.  I tried to stick with one element that I thought worked well and create the dashboard around it.

    Looking at the other participants, I’m already thinking my time heatmap could be improved.  My mind was stuck on the day numbers and quarters.  I should have switched to days of the week!  Regardless – here it is:


    And the GIF:

    makeover-monday-112116

  • Funnel Plots

    As I continue to read through Stephen Few’s “Signal: Understanding What Matters in a World of Noise,” I’ve come across some new charts and techniques.

    In an attempt to understand their purpose on a deeper level (and implement them in my professional life), I’m on a mission to recreate them in Tableau.

    First up is a funnel plot. Stephen explains that funnel plots are good when we may need to adjust something before an accurate comparison can be made. In the example video, I adjust how we’re looking at the average profit per item on a given order compared to all of the orders.

    What’s interesting is that in tandem with this exercise, I’m working on a quantitative analysis class for my MBA, so there was quite a bit of intersection. I actually pulled the confidence interval calculation (in particular the standard error equation) straight from the coursework.
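
    For anyone curious, here is a rough sketch of the kind of calculated fields involved.  The field names and the 1.96 multiplier (a 95% interval) are my shorthand, not necessarily the exact fields in the published workbook linked below:

        // Average profit per item on an order: total profit divided by item count
        SUM([Profit]) / SUM([Quantity])

        // Standard error for an order with n items: the overall standard deviation
        // of the measure divided by the square root of n
        WINDOW_STDEV(SUM([Profit]) / SUM([Quantity])) / SQRT(SUM([Quantity]))

        // Upper and lower 95% limits that form the funnel around the overall average
        WINDOW_AVG(SUM([Profit]) / SUM([Quantity])) + 1.96 * WINDOW_STDEV(SUM([Profit]) / SUM([Quantity])) / SQRT(SUM([Quantity]))
        WINDOW_AVG(SUM([Profit]) / SUM([Quantity])) - 1.96 * WINDOW_STDEV(SUM([Profit]) / SUM([Quantity])) / SQRT(SUM([Quantity]))

    Plot each order’s average against its item count, overlay the two limit lines, and the band that widens at small order sizes is the funnel.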

    I find that statistical jargon is generally sub-par at explaining what is going on, and all the resources I used left me oscillating between “oh I totally get this” and “I have no idea what this means.” To that end, I’m open to any comments or feedback on the wording used in the video, or any expert knowledge you’d like to share.

    Link to full workbook on Tableau public for calculated fields: https://public.tableau.com/views/FunnelPlot10_2_16/Results?:embed=y&:display_count=yes

  • Thoughts on sorting in Tableau

    Now with video 🙂

    Last week I ran into an interesting situation with Tableau.  I wanted to sort dimensions within larger dimensions by a measure.  After that sort, I wanted to encode an additional field on color.  Here’s what that would look like using Superstore:

    Sorting Figure 1

    In the view I am looking at sub-categories by each segment, hoping to rank them by the sum of Sales.  I’ve encoded an additional measure (discount) on color.

    This could be a great visualization for understanding demographics within hierarchical type dimensions.  Like say the gender breakdown of who has diabetes at hospital A.

    The issue is, getting to the view shown above is somewhat more complex than I had originally thought.  So let me walk you through what happened.

    1. Set up my view of Customer Segment and Sub-Category by sum of Sales.
    2. Created the initial rank calculation (INDEX()) and then did the typical sorting.
    3. Set the table calculation as follows (Custom Sort = sort by Sum of Sales, descending order):

    sorting-2
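
    For reference, the rank calculation from step 2 is just the row index – a single line, with the ordering coming from the custom sort in the table calculation dialog:

        // Rank within the partition; the order comes from the custom sort on SUM(Sales)
        INDEX()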

    4. Gets me here:

    sorting-3

    5. Now when I add Discount to color, my whole viz breaks:

    sorting-4

    6. To correct this, a few things have to be done.  The initial table calculation needs to be modified so that Discount is taken into consideration, but not used for the sorting:

    sorting-5

    It’s super important to notice here that Discount is at the lowest level, but we’re computing at the “Sub-Category” level.  We’re still restarting every “Segment.”

    That gives us this:

    sorting-6

    So now we have the sub-categories correct and we’re looking at them by segment.  But we’re back at the original problem: our sort isn’t computed correctly.  This is because we’re sorting by the highest sum of sales for a given discount in a segment.  The first sub-category is found and grouped together.  Then the next sub-category is found with the next highest (not 2nd, just ‘next’) sum of sales for a given discount.  Check it out in comparison to the crosstab; the pink highlights how the INDEX() is working:

    sorting-7

    To fix this last step, we need to let Tableau (the table calculation, the world!) know that we don’t care about Discount for the sum of sales.  We only care about sub-category and segment.  To resolve it, let’s pull in a simple LOD:

    sorting-8
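
    The LOD itself is a one-liner – something along these lines, assuming the standard Superstore field names:

        // Sum of Sales fixed to Segment and Sub-Category, ignoring Discount entirely
        { FIXED [Segment], [Sub-Category] : SUM([Sales]) }

    Because the value is fixed at Segment and Sub-Category, every Discount row within a sub-category carries the same number, so sorting on it keeps each sub-category’s rows grouped together.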

    Now finishing it all up, replace the measure used in the table calculation for sorting:

    sorting-9

    And we’re back at what we originally wanted:

    Sorting Figure 1

    Full workbook with story point walk-through here: https://public.tableau.com/views/LearningMoment-sortingwithtablecalculations/Learningmoment-tcalcs?:embed=y&:display_count=yes

  • Me vs. ETL

    I had a great moment today.  A moment where I felt like I conquered the beast known as “ETL.”  For those of you who don’t know (lucky you!), ETL stands for Extract, Transform, Load.  Like the name suggests, it is the term for moving data around – taking elements in their “raw” forms and making them into usable data.  And it has been a beast in my life because I feel like it gets used for “gotcha” moments in the data/IT/technical world.

    Let me side-track and say that I have a background in physics and mathematics, AKA problem solving.  I didn’t go to school for computer science, and I have no formal training in the proper terminology of data management, ETL, or data warehousing.  I appreciate those who did, because they have the benefit of having been taught the history and terminology that drives a lot of what goes on.  Unfortunately, I never got that.

    I was driven to a job in data because of my ability to problem solve and communicate.  Math and physics are very algorithmic; they require you to assess your surroundings, identify the problem, and work on a solution.  I started out in operations, moved to quality of operations, then process improvement of operations, and finally into what I would consider the “IT world.”  Each layer I dove into required more and more data access, getting me closer and closer to the source.  You can imagine that in my quality role, I received flat files from existing front-end systems to do analytical work.  Once I got to process improvement, I was actually interacting with reporting associates who would pull custom SQL queries for me from our transaction databases.  Today I work in that same type of “reporting” environment.

    What’s kind of funny is – now that I’m here I realize it’s all the same.  If you’re good at understanding canned flat files and deriving value (typically analytically, but also just in general), you’ll be just as good as you get closer to the data.

    So today I feel like I conquered ETL.  Once in an interview someone asked me how good I was at ETL.  I had to say that I didn’t have much ETL experience, and I think that’s what cost me the job.  I wish I had known that an ETL tool like SSIS leverages a GUI that hauntingly resembles Microsoft Visio.  I really wish I had known that.  Because you know what I am really good at?  Visualizing processes, making process maps, streamlining processes, and communicating and understanding what processes are happening.

    What makes me feel like I conquered ETL?  Today I had a meeting where I used my communication skills to bridge the gap between two groups of people knee-deep in the data weeds.  I moved past the barrier of never having developed an SSIS package, made a stored procedure, or developed in SSRS to solve the problem we were facing.  Does knowing SQL make it easier?  Slightly – only in the sense that I can find the fields and tables the data comes from.  But you want to know what the “problem” we were facing was?  It was “are we deriving the right fields to answer the business ask?”  Huh.  SSIS doesn’t know the answer to that.

    So if you’re out there and want to get more into data, and someone says to you “rate your ETL skills,” push back and let them know that you are an amazing communicator, and that you take the time and effort to fundamentally understand data (from both the system-capture end and how business users interpret it).  Today I give my ETL skills a 10, because I bridged the gap between everyone, leveraging the knowledge I’ve picked up along the way and refusing to be intimidated by ETL.

    Okay – my “yay” moment is done.  Those of you who do ETL, I thank you.  There are a lot of important components that go into making data systems that contain the information people need to do their jobs and businesses need to succeed.  By no means is my aim to discredit you.  I love your ERDs.

  • Dot Plots

    Today I was reading Stephen Few’s Information Dashboard Design aloud while Josh was doing some fall PC clean-up, and we were on the chapter “An Ideal Library of Graphs.” As Stephen describes it, there are several charts and graphs that should make their way onto dashboards, and he goes into detail on the reasoning behind each and how to properly apply them.

    We stopped to discuss the dot plot, because it is one that both of us underutilize.  From my perspective there’s a lot of underutilized space below each dot.  As we explored it further, I also felt uncomfortable with dots beyond a certain size, because at that point the actual data point gets lost.  We decided to open up Tableau and start playing around.  In my mind, using a Gantt bar as the point would better represent the data.  (Mind you, Stephen says to use dot plots when the scale doesn’t go to zero; otherwise stick to bar graphs.)  I thought Gantt bars presented the perfect “happy medium” between the dot plot and the bar chart without causing the reader to incorrectly read length.

    Below are the four different representations we came up with.  All things equal, I think I’m most partial to the ‘Plus Plot.’  The general shape tends to draw your eye directly to the center point, and even if the marks were larger you’d still know where the center was.  I can see how this might get cluttered quickly, or how things could go awry if more encoding were added.

    I also found that I liked the Gantt bar, but needed to shorten the bar significantly to keep from mentally drawing drop lines and reading the marks as bars.

  • Getting started in data

    Continuing on the path I started the other day, here’s another blog post that I first wrote internally back in March of this year.

    The question: How do you think you can encourage more women in data?

    I think the toughest part of getting started in data is understanding the field and feeling confident working within it. It took me some time to realize that working with data is a mindset. It isn’t a functional area reserved for someone who has amazing coding or technical skills. It is for people who are detail oriented and want to understand the “why” behind things.

    Finding the reason or purpose behind things is one of my favorite parts of working in data. Within this field, you get the opportunity to do that every day. I am also extremely passionate about taking data and creating a visual story with it.

    If you’re interested in data or want to know if it is the right field for you, ask yourself if you like knowing the “why.” I’d also be curious whether you are driven toward mastery. Within the data field there are daily opportunities to learn and to apply that newfound knowledge to your job. If that’s something you like doing, data and data analysis are the right place for you.

    I’ve been in several analyst roles and they have meant different things. The technical skill set and applications have always varied widely, but there have been a few universal constants. Those constants are extensions of your personality – being a great problem solver, a critical thinker, and someone who can both reach solid conclusions AND (more importantly) communicate them. These few skills are what make people great analysts. The tools they use are just there to help them broadcast their findings.

    If you’re on the fence, I would also encourage you to go find some data available in your daily life. There are literally mountains of data out there. Find something that you’re interested in knowing more about and analyze it. It is a great exercise to start thinking like a data analyst, and I guarantee you’ll find something new and interesting.

    Once you’ve analyzed one data set, find something totally different. See what commonalities in the structuring of the data you can find. After having received hundreds of different data sets, I can tell you there are several commonalities. The more you practice with any set of data, the better you’re going to get at overall data analysis.

    My last comment would be to remember there’s never one right answer in data analysis. It is a creative and interpretive field. Your interpretations should be based on the truth of the data, but realize once you’re in the analysis driver’s seat it’s up to you to showcase what’s important.