Animated GIF images are an inescapable part of our online experiences, and more and more tools make it easier and easier to include them in our written communication. Sometimes this can be a good thing.
Sometimes. Not all times.
Before you include a GIF image – especially one that flashes or blinks or strobes – in your next chat message, please pause to consider the impact that this may have on the recipients.
Before you include a GIF, ask yourself:
How many people will see this – is this a 1:1 chat, or is it a large group?
Does the GIF include flashing, strobing, blinking, or fast-moving images?
Do any of the people who will see the GIF have photosensitive conditions like epilepsy or migraines?
Are you sure?
Does the software tool you’re using allow users to disable GIF autoplay?
So if you are in a Teams meeting with 100 people and you post a GIF, everyone sees it. And odds are, that GIF you posted will mean that someone on the call will need to leave the call, or close the chat, or maybe end up in a dark room in pain for the rest of the day.
The last post was about the dangers inherent in measuring the wrong thing – choosing a metric that doesn’t truly represent the business outcome you think it does. This post is about different problems – the problems that come up when you don’t truly know the ins and outs of the data itself… but you think you do.
This is another “inspired by Twitter” post – it is specifically inspired by this tweet (and corresponding blog post) from Caitlin Hudon. It’s worth reading her blog post before continuing with this one – you go do that now, and I’ll wait.
The scariest ghost stories I know take place when the history of data — how it’s collected, how it’s used, and what it’s meant to represent — becomes an oral one, passed down like campfire stories from one generation of analysts to another. 👻https://t.co/nTQNSmk3oD
Caitlin’s ghost story reminded me of a scary story of my own, back from the days before I specialized in data and BI. Back in the days when I was a werewolf hunter. True story.
Around 15 years ago I was a consultant, working on a project with a company that made point-of-sale hardware and software for the food service industry. I was helping them build a hosted solution for above-store reporting, so customers who had 20 Burger Hut or 100 McTaco restaurants could get insights and analytics from all of them, all in one place. This sounds pretty simple in 2020, but in 2005 it was an exciting first-to-market offering, and a lot of the underlying platform technologies that we can take for granted today simply didn’t exist. In the end, we built a data movement service that took files produced by the in-store back-of-house system and uploaded them over a shared dial-up connection from each restaurant to the data center where they could get processed and warehoused.
The analytics system supported a range of different POS systems, each of which produced files in different formats. This was a fun technical challenge for the team, but it was a challenge we expected. What we didn’t expect was the undocumented failure behavior of one of these systems. Without going into too much detail, this POS system would occasionally produce output files that were incomplete, but which did not indicate failure or violate any documented success criteria.
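As an illustration of the kind of defense that would have helped, here is a minimal sketch of a file-completeness check. The trailer-record format and function name are hypothetical – the post doesn’t describe the actual file formats – but the idea is the same: require the file to prove it is complete, rather than assuming that “a file arrived” means “the file is good.”

```python
# Hypothetical sketch: validate that a delimited POS export file is complete.
# Assumes a format where the last line is a trailer like "TRAILER|<row_count>";
# the real file formats in the story are not documented here.

def validate_pos_export(lines: list[str]) -> bool:
    """Return True only if the file ends with a trailer whose declared
    row count matches the number of data rows actually present."""
    if not lines:
        return False
    trailer = lines[-1].strip()
    if not trailer.startswith("TRAILER|"):
        return False  # no trailer at all -- file was truncated mid-write
    try:
        declared = int(trailer.split("|")[1])
    except (IndexError, ValueError):
        return False  # malformed trailer record
    return declared == len(lines) - 1  # data rows must match the declaration

# A truncated upload passes a naive "did we get a file?" check but fails here:
complete = ["row1", "row2", "TRAILER|2"]
truncated = ["row1", "row2"]  # upload cut off before the trailer was written
```

A check like this turns a silent partial load into a loud, loggable failure – which is exactly the difference between wolves and werewolves.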
To make a long story short, because we learned about the complexities of this system very late in the game, we had some very unhappy customers and some very long nights. During a retrospective we engaged with one of the project sponsors for the analytics solution because he had – years earlier – worked with the development group that built this POS system. (For the purposes of this story I will call the system “Steve” because I need a proper noun for his quote.)
The project sponsor reviewed all we’d done from a reliability perspective – all the validation, all the error handling, all the logging. He looked at this, then he looked at the project team and he said:
You guys planned for wolves. ‘Steve’ is werewolves.
Even after all these years, I still remember the deadpan delivery of this line. And it was so true.
We’d gone in thinking we were prepared for all of the usual problems – and we were. But we weren’t prepared for the horrifying reality of the data problems that were lying in wait. We weren’t prepared for werewolves.
Digging through my email from those days, I found a document I’d sent to this project sponsor, planning for some follow-up efforts, and was reminded that for the rest of the projects I did for this client, “werewolves” became part of the team vocabulary.
What’s the moral of this story? Back in 2008 I thought the moral was to test early and often. Although this is still true, I now believe that what Past Matthew really needed was a data catalog or data dictionary with information that clearly said DANGER: WEREWOLVES in big red letters.
This line from Caitlin’s blog post could not be more wise, or more true:
The best defense I’ve found against relying on an oral history is creating a written one.
The thing that ended up saving us back in 2005 was knowing someone who knew something – we happened to have a project stakeholder who had insider knowledge about a key data source and its undocumented behavior. What could have been better? Some actual <<expletive>> documentation.
Even in 2020, and even in mature enterprise organizations, having a reliable data catalog or data dictionary that is available to the people who could get value from it is still the exception, not the rule. Business-critical data sources and processes rely on tribal knowledge, time after time and team after team.
I won’t try to supplement or repeat the best practices in Caitlin’s post – they’re all important and they’re all good and I could not agree more with her guidance. (If you skipped reading her post earlier, this is the perfect time for you to go read it.) I will, however, supplement her wisdom with one of my favorite posts from the Informatica blog, from back in 2017.
I’m sharing this second link because some people will read Caitlin’s story and dismiss it because she talks about using Google Sheets. Some people will say “that’s not an enterprise data catalog.” Don’t be those people.
Regardless of the tools you’re using, and regardless of the scope of the data you’re documenting, some things remain universally true:
Tribal knowledge can’t be relied upon at any meaningful scale or across any meaningful timeline
Not all data is created equal – catalog and document the important things first, and don’t try to boil the ocean
The catalog needs to be known by and accessible to the people who need to use the data it describes
Someone needs to own the catalog and keep it current – if its content is outdated or inaccurate, people won’t trust it, and if they don’t trust it they won’t use it
Sooner or later you’ll run into werewolves of your own, and unless you’re prepared in advance the werewolves will eat you
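None of this requires enterprise tooling. As a minimal sketch – the field names and source name here are my invention, not any particular catalog product or the actual system from the story – even an entry this simple captures the warnings that tribal knowledge would otherwise carry, and makes staleness visible:

```python
# Minimal sketch of a data catalog entry; fields are illustrative only.
from datetime import date, timedelta

catalog_entry = {
    "source": "StevePOS nightly export",   # hypothetical source name
    "owner": "analytics-team@example.com",
    "description": "Per-store sales detail, uploaded nightly via dial-up.",
    "known_issues": [
        "DANGER: WEREWOLVES -- files can be incomplete yet still look valid; "
        "always verify completeness before loading.",
    ],
    "last_reviewed": date(2020, 1, 15),
}

def is_stale(entry: dict, max_age_days: int = 180) -> bool:
    """Flag entries nobody has reviewed recently -- if the catalog's content
    is outdated, people won't trust it, and if they don't trust it they
    won't use it."""
    return date.today() - entry["last_reviewed"] > timedelta(days=max_age_days)
```

Whether the entry lives in a dedicated catalog tool or a Google Sheet matters far less than the fact that it exists, has an owner, and gets reviewed.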
When I started to share this story I figured I would find a place to fit in a “unless you’re careful, your data will turn into a house when the moon is full” joke without forcing it too much, but sadly this was not the case. Still – who doesn’t love a good data werehouse joke?
Maybe next time…
 Or whatever it is you’re tracking. You do you.
 Apparently I started this post last Halloween. Have I mentioned that the past months have been busy?
 Or Pizza Bell… you get the idea.
 Each restaurant typically had a single “data” phone line that used the same modem for processing credit card transactions. I swear I’m not making this up.
 Or at least short-ish. Brevity is not my forte.
At this point almost 12 years later, the problem itself is no longer relevant. While digging around on an unrelated task today I found this chart, which is. You should look at it now.
The scope of the problem is measured by the blue series on this chart. You should look at it again. Just look at it!
Both the blue series and the yellow series are net satisfaction (NSAT) scores. There’s a lot of context behind the numbers, but for the purposes of this post let’s say that on this scale anything over 150 is “time for a team party and a big round of bonuses” and anything under 100 is “you probably won’t include this job on your resume, and you’re thinking about this a lot because you’ve been sending your resume out a lot this week.”
There are two stories that leap out from this chart.
The first story is pretty obvious: something changed in FY06. That change had a dramatically negative impact on the blue series, and a small (and probably acceptable) negative impact on the yellow series.
The second story may not be as obvious, but it’s vitally important: the yellow series was being used to track the impact of the change. Something changed in FY06, and the people that made the change were measuring its impact.
They were tracking the wrong thing.
Until I joined the team, no one had a chart like this. It wasn’t that the blue series wasn’t being tracked – it was. It just wasn’t recognized as the true success metric until things were well into resume-polishing territory.
The lesson here isn’t that someone made a bad decision and didn’t realize it. The lesson is that sometimes the metric you’re tracking doesn’t mean what you think it means.
As is the case in my personal story, the problem is usually quite obvious in retrospect, but it’s also usually quite opaque in the moment. Although most large companies have a culture of measurement, it’s more rare to see a culture that consistently questions those measurements. Although this approach may not work for everyone, I recommend using this three-year-old approach to defining your most important metrics.
I don’t mean that the approach is three years old. I mean that you should approach the problem like a three-year-old would: by repeatedly asking “why?”
When someone suggests measuring using a given metric, ask why. “Why do you think this is the right way to measure this thing?” When you get an answer, ask why again. “Why do you believe that?” Keep asking why – the more important the metric, the more times you should ask why and expect to get a well-considered answer. And if the answers aren’t forthcoming or aren’t credible… that is an important point to recognize before you’ve invested too much in a project or solution, isn’t it?
 Which is why I’m not going to talk about the problem or the solution here, except in the most general, hand-wavey terms.
 I should also point out that I wasn’t the person who figured out that we’d been measuring the wrong thing. The person who hired me had figured it out, which was why I was hired. Credit where credit is due.
 This someone may or may not be you. But definitely question yourself in the same way, because it’s always hardest to see your own biases.
 The person who introduced me to this idea called it “five whys” but I wouldn’t read too much into that specific number. He also never explained what he meant by this, and for months I thought he was referring to some five word phrase where each word started with the letter Y. True story.
I live 2.6 miles (4.2 km) from the epicenter of the coronavirus outbreak in Washington state. You know, the nursing home that’s been in the news, where over 10 people have died, and dozens more are infected.
As you can imagine, this has started me thinking about self-service BI.
When the news started to come out covering the US outbreak, there was something I immediately noticed: authoritative information was very difficult to find. Here’s a quote from that last link.
This escalation “raises our level of concern about the immediate threat of COVID-19 for certain communities,” Dr. Nancy Messonnier, director of the CDC’s National Center for Immunization and Respiratory Diseases, said in the briefing. Still, the risk to the general public not in these areas is considered to be low, she said.
That’s great, but what about the general public in these areas?
What about me and my family?
When most of what I saw on Twitter was people making jokes about Jira tickets, I was trying to figure out what was going on, and what I needed to do. What actions should I take to stay safe? What actions were unnecessary or unhelpful?
Before I could answer these questions, I needed to find sources of information. This was surprisingly difficult.
Specifically, I needed to find sources of information that I could trust. There was already a surge in misinformation, some of it presumably well-intentioned, and some from deliberately malicious actors. I needed to explore, validate, confirm, cross-check, act, and repeat. And I was doing this while everyone around me seemed to be treating the emerging pandemic as a joke or a curiosity.
I did this work and made my decisions because I was a highly-motivated stakeholder, while others in otherwise similar positions were farther away from the problem, and were naturally less motivated at the time.
And this is what got me thinking about self-service BI.
In many organizations, self-service BI tools like Power BI will spread virally. A highly-motivated business user will find a tool, find some data, explore, iterate, refine, and repeat. They will work with untrusted – and sometimes untrustworthy – data sources to find the information they need to use, and to make the decisions they need to make. And they do it before people in similar positions are motivated enough to act.
But before long, scraping together whatever data is available isn’t enough anymore. As the number of users relying on the insights being produced increases – even if the insights are being produced by a self-service BI solution – the need for trusted data increases as well.
Where an individual might successfully use disparate unmanaged sources, a population needs a trusted source of truth.
At some point a central authority needs to step up, to make available the data that can serve as that single source of truth. This is easier said than done, but it must be done. And this isn’t even the hard part.
The hard part is getting everyone to stop using the unofficial and untrusted sources that they’ve been using to make decisions, and to use the trusted source instead. This is difficult because these users are invested in their current sources, and believe that they are good enough. They may not be ideal, but they work, right? They got me this far, so why should I have to stop using them just because someone says so?
This brings me back to those malicious actors mentioned earlier. Why would someone deliberately share false information about public health issues when lies could potentially cost people their lives? They would do it when the lies would help forward an agenda they value more than they value other people’s lives.
In most business situations, lives aren’t at stake, but people still have their own agendas. I’ve often seen situations where the lack of a single source of truth allows stakeholders to present their own numbers, skewed to make their efforts look more successful than they actually are. Some people don’t want to have to rebuild their reports – but some people want to use falsified numbers so they can get a promotion, or a bonus, or a raise.
Regardless of the reason for using untrusted sources, their use is damaging and should be reduced and eliminated. This is true of business data and analytics, and it is true of the current global health crisis. In both arenas, let’s all be part of the solution, not part of the problem.
 Before you ask, yes, my family and I are healthy and well. I’ve been working from home for over a week now, which is a nice silver lining; I have a small but comfortable home office, and can avoid the obnoxious Seattle-area commute.
 This article is the best single source I know of. It’s not an authoritative source for the subject, but it is aggregating and citing authoritative sources and presenting their information in a form closer to the solution domain than to the problem domain.
 This is why I’ve been practicing social media distancing.
 This is where the “personal pandemic parable” part of the blog post ends. From here on it’s all about SSBI. If you’re actually curious, I erred on the side of caution and started working from home and avoiding crowds before it was recommended or mandated. I still don’t know if all of the actions I’ve taken were necessary, but I’m glad I took them and I hope you all stay safe as well.
 As anyone who has ever implemented a single source of truth for any non-trivial data domain can attest.
 You can enjoy the lyrics even if Kreator’s awesome music isn’t to your taste.
This is my personal blog – I try to be consistently explicit in reminding all y’all about this when I post about topics that are related to my day job as a program manager on the Power BI CAT team. This is one of those posts.
If I had to oversimplify what I do at work, I’d say that I represent the voice of enterprise Power BI customers. I work with key stakeholders from some of the largest companies in the world, and ensure that their needs are well-represented in the Power BI planning and prioritization process, and that we deliver the capabilities that these enterprise customers need.
Looking behind this somewhat grandiose summary, a lot of what I do is tell stories. Not my own stories, mind you – I tell the customers’ stories.
On an ongoing basis, I ask customers to tell me their stories, and I help them along by asking these questions:
What goals are you working to achieve?
How are you using Power BI to achieve these goals?
Where does Power BI make it hard for you to do what you need to do?
When they’re done, I have a pretty good idea what’s going on, and do a bunch of work to make sure that all of these stories are heard by the folks responsible for shipping the features that will make these customers more successful.
Most of the time these stories are never shared outside the Power BI team, but on occasion there are customers who want to share their stories more broadly. My amazing teammate Lauren has been doing the heavy lifting in getting them ready to publish for the world to see, and yesterday the fourth story from her efforts was published.
Update: Apparently the Cerner story was getting published while I was writing this post. Added to the list above.
I know that some people will look at these stories and discount them as marketing – there’s not a lot I can do to change that – but these are real stories that showcase how real customers are overcoming real challenges using Power BI and Azure. Being able to share these stories with the world is very exciting for me, because it’s an insight into the amazing work that these customers are doing, and how they’re using Power BI and Azure services to improve their businesses and to make people’s lives better. They’re demonstrating the art of the possible in a way that is concrete and real.
And for each public story, there are scores of stories that you’ll probably never hear. But the Power BI team is listening, and as long as they keep listening, I’ll keep helping the customers tell their stories…
 This makes me sound much more important than I actually am. I should ask for a raise.
 Seriously, if I do this, shouldn’t I be a VP or Partner or something?
 Mainly boring work that is not otherwise mentioned here.
 This is just one more reason why having a diverse team is so important – this is work that would be brutally difficult for me, and she makes it look so easy!
You’ve probably seen it more than once. But you’ve only seen it an order of magnitude less than I’ve thought it, because if I posted it multiple times each day I would be part of the problem. Typically when I tweet a variation on this theme, it’s because someone has been lazy, and has stolen my time, and the time of others.
Consider these scenarios.
Have you ever forwarded a lengthy email thread to a group, with “FYI” or “this is interesting” as your only addition, without adding a summary of the thread? If you have, then each person who receives your mail needs to read through the thread to understand what is important for them.
Have you ever sent an email with a meaningless and non-descriptive subject line that’s unrelated to the message content? If you have, then each person who receives your mail needs to read through the message to understand your intent and to prioritize any follow-up actions.
Have you ever sent an email that includes a document or link to a valuable resource, but you don’t include any relevant search terms in the subject or body? If you have, then when your recipients need to find and use that link or document they will not be able to easily search to locate it. You’ve forced each recipient to implement their own discovery process.
Have you ever sent an email that references a shared resource like a web site or an Excel workbook on a SharePoint site, and didn’t include a link to that resource? If you have, then each recipient has needed to manually locate the shared resource – you have wasted the time of every person who received the mail. And to make matters worse, your laziness has introduced ambiguity, and increased the likelihood that people will end up using the wrong resources.
Have you ever sent an email that includes a general description of a specific problem for which you are requesting assistance? If you have, then you are offloading the responsibility for identifying the problem cause to the recipients – and this often means that multiple people are duplicating the effort that you should have put in proactively.
Have you ever sent an email that includes an acronym that you have not explicitly defined? If you have, then you’re again forcing the recipients to do the heavy lifting to figure out what you mean, when you could have saved them this effort by putting in a little effort on your own…
Your periodic reminder that lazy communication is theft.
If you use an acronym without defining it, you are part of the problem, not part of the solution.
Have you ever sent an email related to an event – a technical conference “call for content” announcement, for example – and you haven’t bothered to include the event dates in the mail? If you have, then you have forced every recipient to look up this information before they can act on your mail.
Have you ever asked someone for help solving a technical problem or error, but you haven’t clearly articulated the scope of the problem? Maybe you couldn’t even be bothered to include key details like error messages? If this is the case, you’ve very clearly told the people who could be helping you that you do not value their time, and that you are choosing to make your problem their problem.
Of course, the impact of this laziness isn’t limited to email – email just happens to be where I personally experience it the most. My most recent periodic reminder came when someone on Twitter asked for help, and included an undefined acronym. By the time I noticed the conversation, three or four members of the Power BI team had replied, either asking for clarification or proposing possible answers if the acronym meant what they thought it meant. (I did not join that conversation.)
The common theme of these scenarios – and many more like them – is that a small effort to be mindful in your communication can help reduce the cost on the people with whom you are communicating. If you choose not to put in that effort, your lazy communication is stealing time and productivity from your teammates, peers, and colleagues.
Is that what you want?
Each of these bad habits is easily and simply corrected. In most situations it only takes a moment to clarify the meaning and context of your message, to add a subject, or summary, or link. A moment of your time can save many minutes wasted by every person who receives your communication.
Will you choose to spend that time, and to respect the time of others?
A few weeks back MVP Paul Turley blogged on Power Query performance and diagnostics. It was a good, useful post, but I wasn’t really the target audience and I probably would have forgotten about it if it weren’t for one thing.
Look at it.
Look at it again, and pause to thoughtfully consider its elegance and beauty.
In the time since Paul shared this post, I’ve been involved in any number of conversations where customer stakeholders had questions about Power BI application performance. This type of conversation isn’t particularly new, but now I’ve started using this diagram as a point of reference.
The results have been very positive. Although nothing in the diagram is new or particularly interesting on its own, having this simple visual reference for the components that make up the canonical end-to-end flow in a Power BI application has made my conversations more useful and productive. Less time is required to get all stakeholders to a point of shared understanding – more time can be devoted to identifying and solving the problem.
I don’t know if Paul truly appreciates the beauty of what he’s created. But I do. And you should too.
 In case you’ve been wondering why my blog and YouTube output has dried up this month, it’s because real life has been kicking my ass. I think I can finally see the light at the end of the tunnel, so hopefully we’ll be back with regular content before too long. Hopefully.