I love to cook, and over the past few days I’ve made a few of my favorite Indian recipes that I haven’t made in a while. So, of course, this has me thinking about metadata.
Going from Indian cooking to metadata isn’t as big a leap as you might think. The bridge is one of my favorite cookbooks: Julie Sahni’s Classic Indian Vegetarian and Grain Cooking.
If you’re just here for the food, you should immediately make this spectacularly delicious Bengal red lentil recipe taken from this book, because it is absolutely phenomenal. If you’re here for the metadata, remember the link but don’t click on it yet.
Every recipe I’ve made from this cookbook has produced fantastic results. It’s one of those go-to cookbooks where I know that anything I try will be good. And yet, I almost never seek it out when I want to cook, except for the recipes I already know. The reason is metadata.
It doesn’t matter how good your data is – without effective and available metadata, your investment in quality data will be undermined.
Let’s look at the recipe for saag paneer. Say those words out loud (“saag paneer”) and images of that rich, vibrant green sauce will start running through your mind.
I found this recipe easily because I have a bookmark. But let’s say I didn’t – it should still be easy to find, because cookbooks have indexes, and indexes are the perfect tool for finding recipes. Let’s find the recipe for saag paneer.
Literally the only place the phrase “saag paneer” exists in this book is below the recipe header. This means that the only way to find the saag paneer recipe is to flip through the book page by page, or to know the specific and arbitrary phrase the author uses to describe the recipe for Western readers. This is why my copy of the book looks like this:
This systemic problem is exacerbated by the book’s complete lack of photos, so there’s also no way to skim through the book and visually pick out recipes of interest. The reader is forced to carefully evaluate each recipe in turn, looking at ingredients and processes to decide if the recipe is worth making.
At this point you may be asking what this has to do with metadata, or you may see the connection already.
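The index problem can be sketched in a few lines of Python. This is only an illustration: the recipe title and page number below are made-up stand-ins (not the book’s actual heading or pagination), and `find_page` is a hypothetical helper. The point is that a lookup succeeds only when the index carries the name the reader actually searches for.

```python
# Hypothetical index: the title and page number are made-up stand-ins,
# not the book's actual heading or pagination.
index_by_title = {"fresh cheese in spinach sauce": 123}

# Aliases are the missing metadata: the names readers actually search for.
aliases = {"saag paneer": "fresh cheese in spinach sauce"}


def find_page(query, index, aliases=None):
    """Return the page for a query, or None if the index cannot resolve it."""
    key = query.strip().lower()
    if aliases:
        # Translate a familiar name into the title the index is keyed on.
        key = aliases.get(key, key)
    return index.get(key)


# Without alias metadata the reader is back to flipping pages.
without_metadata = find_page("Saag Paneer", index_by_title)
# With it, the same index answers the question directly.
with_metadata = find_page("Saag Paneer", index_by_title, aliases)
print(without_metadata, with_metadata)
```

The recipe data is identical in both calls. Only the presence of the alias mapping changes whether it can be found.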
The reason I immediately thought of metadata may be related to a BI effort I’m working on. Without going into too much detail, I have built a small Power BI app that presents information from a program I run and makes that information available to other members of my extended team.
I’m currently at the point where my app needs to include data from other sources in order to increase its value. Fortunately, that data already exists, and to make it even easier to work with, it is available as a set of Power BI dataflows. I was able to email the owner to get access and to learn which dataflows to look in, and I was off. But not for very far, or for very long.
Very quickly I was back where this post started: I was faced with the high-quality data I needed, and I lacked the metadata to efficiently use it. I needed to manually evaluate each dataflow and each entity to understand its contents and context and to decide if it was right for me. I made some early progress, but because of the lack of metadata the effort will likely take days, not hours, and this means it probably won’t get done this month or next.
Let that sink in: because of a lack of effective metadata, quality curated data is going unused, and business insights are being delayed by weeks or months.
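As a rough sketch of what would have saved me those days, imagine each entity shipping with a small descriptive record that can be searched. Everything here is an assumption made for illustration: `EntityMetadata`, `search`, and the sample entries are hypothetical, and none of this is a Power BI API.

```python
from dataclasses import dataclass


@dataclass
class EntityMetadata:
    """Descriptive metadata for one entity in a dataflow (hypothetical shape)."""
    dataflow: str
    entity: str
    description: str
    owner: str
    tags: tuple = ()


def search(catalog, keyword):
    """Return entries whose description or tags mention the keyword."""
    needle = keyword.lower()
    return [
        m for m in catalog
        if needle in m.description.lower()
        or any(needle in t.lower() for t in m.tags)
    ]


# Made-up sample entries, for illustration only.
catalog = [
    EntityMetadata("Sales", "Orders", "One row per customer order",
                   "owner-a@example.com", ("orders", "revenue")),
    EntityMetadata("Support", "Tickets", "One row per support ticket",
                   "owner-b@example.com", ("support",)),
]

matches = search(catalog, "revenue")
print([f"{m.dataflow}.{m.entity}" for m in matches])
```

With even this much metadata in place, deciding which entity to use becomes a keyword search instead of opening every dataflow in turn.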
Just like these fantastic recipes sitting on my shelf, largely unused and unmade because a fantastic cookbook lacks a usable index, these fantastic dataflows are going largely unused, at least by me. All because metadata was treated as a “nice to have” rather than as a fundamental high-priority requirement.
Does your data have the metadata it needs, in a format and location that serves the needs of your users? How do you know? Remember that last picture of all the bookmarks?
These bookmarks are a symptom of the underlying metadata problem. Bookmarks aren’t a problem themselves, but if you’re paying attention you can see that they’ve been implemented as a workaround to a problem that might not otherwise be apparent. If you’re familiar with the concept of “code smells”, you probably see where I’m going.
When your data lacks useful metadata to enable its effective use, people will start to take actions because of this lack. Things like emailing you to ask questions. Things like building their own ad hoc data dictionaries. Things like using alternate or derivative sources instead of using your authoritative data source – like the recipe link I shared above.
The more of these actions you identify, the more urgency you should feel about closing the metadata gap. Not every data source is a werewolf, but every data source requires metadata to be effectively and efficiently used.
 Remember this picture. There will be a quiz later.
 You may also be asking if there’s anything in life that doesn’t make me think about metadata. This is a fair question.
 I knew the owner’s email because I had bookmarked it earlier.
 To be fair, my full schedule is also contributing to this delay – I’m not trying to say that the lack of metadata is independently costing months. But it is a key factor: my schedule could accommodate two or three hours for this work, but it doesn’t have room for two or three days until the end of April.
 I told you there would be a quiz.
15 thoughts on “Metadata is not a “nice to have””
Interesting for me, Matthew, as I am from the Bengal part of India and have red lentils almost every day 🙂 Also, as a Power BI professional, I enjoyed the way you put forward the importance of metadata. Thanks!
Love the analogy. My personal favorite analogy regards old family photos. If no one takes the time to write on the back of the photo the who/what/where/when/why (i.e. the metadata), that photo will get thrown away.
As regards a dataset, where would you suggest that the metadata live? Is there a way to annotate it within the source or would it have to be a separate place (like an index is) where it lives?
Unfortunately, there is no great general solution today. Enterprise data catalogs are often heavyweight and expensive. Ad hoc catalogs (and enterprise catalogs too, for that matter) are often disconnected from the experiences where people work with the data itself.
In the ideal world, any data consumption or production experience will have capabilities for using and managing metadata as well, and those experiences would make it simple and easy for people to use and curate metadata as part of their core data tasks. But we both know we don’t live in that world today… ;-(
Bad analogy. No one throws away old family photos because they lack who/what/where/when/why. In fact, I would argue that with family photos the metadata lives in the minds of the people in the photograph or some family member you haven’t yet spoken to.
Some things have value way beyond their metadata. I
Jessica’s analogy is not a bad analogy – it’s just a different one, but Khurt’s response highlights a key part of the challenge.
In far too many business contexts the metadata lives in the minds of the people who create and work with the data. It’s tribal knowledge, just like unlabeled family photographs. But as people move on to new jobs and the business changes over time, that tribal knowledge is lost, even though the data may still be the same and may still be valuable. At this point it will either be organically rediscovered and recreated, or the data will stagnate because no one remembers anymore why it was important.
I have personally seen this happen on multiple occasions with family photos, in the dark days following a funeral or a divorce.
I think this will probably turn into a new blog post…
New blog post please. 😃
We lost my father-in-law to COVID-19 last week. It’s typical for Indians to collect and share photos of the deceased. Because of metadata on location/date, tagging of people and events, I was able to find and provide digital photographs from my personal catalogue. However, early film photographs of my father-in-law provided no context to my wife. But thank goodness we didn’t throw them out as the elders in the family were able to provide context and I was able to digitize those images.
I’m very sorry for your loss. We’ll have that next post as soon as my schedule permits. I know that there are a lot of bored folks out there, but my team is busier than ever, and blogging has been moved to the back burner…
My kind of post – metadata and a daal recipe. I’m thinking about Persistent Identifiers at the moment, and the same sort of principles apply. A name is very important of course, but without a persistent unique identifier it is very hard to make connections between that name and the things that name represents – publications, place of birth, occupations, aliases etc. However, like metadata, there are a whole host of options for what to use. Still, whatever you use, good structured metadata will help unite the saag paneers of the world!
So saag we all!
(It sounded better in my head.)
As for cookbooks, someone else thought the metadata was an issue, too. Check out: https://www.eatyourbooks.com/
I absolutely love this idea, but it represents another anti-pattern for metadata: separating the data and the metadata. It’s undeniably a helpful workaround for a problem caused by the format in which the data was delivered, but because the metadata lives apart from the data, the audience for the metadata will be reduced significantly.
In the 80s I had a card catalog for my music collection. For each LP or cassette I would add a 3×5 index card with album and song details, and I would use this to manage the music and make it easier to find things… sometimes, sort of.
But compare that today to a music experience like Spotify or iTunes (or whatever the cool kids use these days) where the data and metadata are delivered together, and the difference is like night and day.
Although there are lots of things that we can do to work around a poor initial metadata implementation, they’re always just band-aids on a deeper problem. Treating metadata as an equal priority to data from day one yields a better integrated experience… but despite this, most software still treats metadata as an afterthought. ;-(
Matthew, the importance of metadata is very well explained by you in lay terms.
Pingback: When does memory die? – BI Polar