I love to cook, and over the past few days I’ve made a few of my favorite Indian recipes that I haven’t made in a while. So, of course, this has me thinking about metadata.
Going from Indian cooking to metadata isn’t as big a leap as you might think. The bridge is one of my favorite cookbooks: Julie Sahni’s Classic Indian Vegetarian and Grain Cooking.
If you’re just here for the food, you should immediately make this spectacularly delicious Bengal red lentil recipe taken from this book, because it is absolutely phenomenal. If you’re here for the metadata, remember the link but don’t click on it yet.
Every recipe I’ve made from this cookbook has produced fantastic results. It’s one of those go-to cookbooks where I know that anything I try will be good. And yet, I almost never seek it out when I want to cook, except for the recipes I already know. The reason is metadata.
It doesn’t matter how good your data is – without effective and available metadata, your investment in quality data will be undermined.
Let’s look at the recipe for saag paneer. Say those words out loud (“saag paneer”) and images of that rich, vibrant green sauce will start running through your mind.
I found this recipe easily because I have a bookmark. But let’s say I didn’t – it should still be easy to find, because cookbooks have indexes, and indexes are the perfect tool for finding recipes. Let’s find the recipe for saag paneer.
Literally the only place the phrase “saag paneer” exists in this book is below the recipe header. This means that the only way to find the saag paneer recipe is to flip through the book page by page, or to know the specific and arbitrary phrase the author uses to describe the recipe for Western readers. This is why my copy of the book looks like this:
This systemic problem is exacerbated by the book’s complete lack of photos; there’s also no way to skim through the book and quickly visually identify recipes of relevant interest. The reader is forced to carefully evaluate each recipe in turn, looking at ingredients and processes to decide if the recipe is worth making.
At this point you may be asking what this has to do with metadata or you may see the connection already.
The reason I immediately thought of metadata may be related to a BI effort I’m working on. Without going into too much detail, I have built a small Power BI app that presents information from a program I run and makes that information available to other members of my extended team.
I’m currently at the point where my app needs to include data from other sources in order to increase its value. Fortunately, that data already exists, and to make it even easier to work with, it is available as a set of Power BI dataflows. I was able to email the owner to get access and to learn which dataflows to look in, and I was off. But not for very far, or for very long.
Very quickly I was back where this post started: I was faced with the high-quality data I needed, and I lacked the metadata to efficiently use it. I needed to manually evaluate each dataflow and each entity to understand its contents and context and to decide if it was right for me. I made some early progress, but because of the lack of metadata the effort will likely take days not hours, and this means it probably won’t get done this month or next.
Let that sink in: because of a lack of effective metadata, quality curated data is going unused, and business insights are being delayed by weeks or months.
Just like these fantastic recipes sitting on my shelf, largely unused and unmade because a fantastic cookbook lacks a usable index, these fantastic dataflows are going largely unused, at least by me. All because metadata was treated as a “nice to have” rather than as a fundamental high-priority requirement.
Does your data have the metadata it needs, in a format and location that serves the needs of your users? How do you know? Remember that last picture of all the bookmarks?
These bookmarks are a symptom of the underlying metadata problem. Bookmarks aren’t a problem themselves, but if you’re paying attention you can see that they’ve been implemented as a workaround to a problem that might not otherwise be apparent. If you’re familiar with the concept of “code smells”, you probably see where I’m going.
When your data lacks useful metadata to enable its effective use, people will start to take actions because of this lack. Things like emailing you to ask questions. Things like building their own ad hoc data dictionaries. Things like using alternate or derivative sources instead of using your authoritative data source – like the recipe link I shared above.
The more of these actions you identify, the more urgency you should feel about closing the metadata gap. Not every data source is a werewolf, but every data source requires metadata to be effectively and efficiently used.
 Remember this picture. There will be a quiz later.
 You may also be asking if there’s anything in life that doesn’t make me think about metadata. This is a fair question.
 I knew the owner’s email because I had bookmarked it earlier.
 To be fair, my full schedule is also contributing to this delay – I’m not trying to say that the lack of metadata is independently costing months. But it is a key factor: my schedule could accommodate two or three hours for this work, but it doesn’t have room for two or three days until the end of April.
 I told you there would be a quiz.