You can’t avoid problems you can’t see

The last post was about the dangers inherent in measuring the wrong thing – choosing a metric that doesn’t truly represent the business outcome[1] you think it does. This post is about different problems – the problems that come up when you don’t truly know the ins and outs of the the data itself… but you think you do.

This is another “inspired by Twitter” post – it is specifically inspired by this tweet (and corresponding blog post) from Caitlin Hudon[2]. It’s worth reading her blog post before continuing with this one – you go do that now, and I’ll wait.

Caitlin’s ghost story reminded me of a scary story of my own, back from the days before I specialized in data and BI. Back in the days when I was a werewolf hunter. True story.

Around 15 years ago I was a consultant, working on a project with a company that made point-of-sale hardware and software for the food service industry. I was helping them build a hosted solution for above-store reporting, so customers who had 20 Burger Hut or 100 McTaco restaurants[3] could get insights and analytics from all of them, all in one place. This sounds pretty simple in 2020, but in 2005 it was an exciting first-to-market offering, and a lot of the underlying platform technologies that we can take for granted today simply didn’t exist. In the end, we built a data movement service that took files produced by the in-store back-of-house system and uploaded them over a shared dial-up connection[4] from each restaurant to the data center where they could get processed and warehoused.

The analytics system supported a range of different POS systems, each of which produced files in different formats. This was a fun technical challenge for the team, but it was a challenge we expected. What we didn’t expect was the undocumented failure behavior of one of these systems. Without going into too much detail, this POS system would occasionally produce output files that were incomplete, but which did not indicate failure or violate any documented success criteria.

To make a long story short[5], because we learned about the complexities of this system very late in the game, we had some very unhappy customers and some very long nights. During a retrospective we engaged with of the project sponsors for the analytics solution because he had – years earlier – worked with the development group that built this POS system. (For the purposes of this story I will call the system “Steve” because I need a proper noun for his quote.)

The project sponsor reviewed all we’d done from a reliability perspective – all the validation, all the error handling, all the logging. He looked at this, then he looked at the project team and he said:

You guys planned for wolves. ‘Steve’ is werewolves.

Even after all these years, I still remember the deadpan delivery for this line. And it was so true.

We’d gone in thinking we were prepared for all of the usual problems – and we were. But we weren’t prepared for the horrifying reality of the data problems that were lying in wait. We weren’t prepared for werewolves.

Digging through my email from those days, I found a document I’d sent to this project sponsor, planning for some follow-up efforts, and was reminded that for the rest of the projects I did for this client, “werewolves” became part of the team vocabulary.

2019-10-30-12-37-25-597--WINWORD

What’s the moral of this story? Back in 2008 I thought the moral was to test early and often. Although this is still true, I now believe that what Past Matthew really needed was a data catalog or data dictionary with information that clearly said DANGER: WEREWOLVES in big red letters.

This line from Caitlin’s blog post could not be more wise, or more true:

The best defense I’ve found against relying on an oral history is creating a written one.

The thing that ended up saving us back in 2005 was knowing someone who knew something – we happened to have a project stakeholder who had insider knowledge about a key data source and its undocumented behavior. What could have better? Some actual <<expletive>> documentation.

Even in 2020, and even in mature enterprise organizations, having a reliable data catalog or data dictionary that is available to the people who could get value from it is still the exception, not the rule. Business-critical data sources and processes rely on tribal knowledge, time after time and team after team.

I won’t try to supplement or repeat the best practices in Caitlin’s post – they’re all important and they’re all good and I could not agree more with her guidance. (If you skipped reading her post earlier, this is the perfect time for you to go read it.) I will, however, supplement her wisdom with one of my favorite posts from the Informatica blog, from back in 2017.

I’m sharing this second link because some people will read Caitlin’s story and dismiss it because she talks about using Google Sheets. Some people will say “that’s not an enterprise data catalog.” Don’t be those people.

Regardless of the tools you’re using, and regardless of the scope of the data you’re documenting, some things remain universally true:

  • Tribal knowledge can’t be relied upon at any meaningful scale or across any meaningful timeline
  • Not all data is created equal – catalog and document the important things first, and don’t try to boil the ocean
  • The catalog needs to be known by and accessible to the people who need to use the data it described
  • Someone needs to own the catalog and keep it current – if its content is outdated or inaccurate, people won’t trust it, and if they don’t trust it they won’t use it
  • Sooner or later you’ll run into werewolves of your own, and unless you’re prepared in advance the werewolves will eat you

When I started to share this story I figured I would find a place to fit in a “unless you’re careful, your data will turn into a house when the moon is full” joke without forcing it too much, but sadly this was not the case. Still – who doesn’t love a good data werehouse joke?[6]

Maybe next time…


[1] Or whatever it is you’re tracking. You do you.

[2] Apparently I started this post last Halloween. Have I mentioned that the past months have been busy?

[3] Or Pizza Bell… you get the idea.

[4] Each restaurant typically had a single “data” phone line that used the same modem for processing credit card transactions. I swear I’m not making this up.

[5] Or at least short-ish. Brevity is not my forte.

[6] Or this data werehouse joke, for that matter?

3 thoughts on “You can’t avoid problems you can’t see

  1. Pingback: Metadata is not a “nice to have” – BI Polar

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s