I think about metadata a lot. I probably think about metadata more than I think about swords, and that’s saying something.
I believe my love affair with metadata may have its roots in my college years when I took several anthropology courses from Dr. Ivan Brady. Dr. Brady changed the way I looked at the world, and I will never forget his most frequently used saying:
“Context is practically everything when it comes to determining meaning.”
— Dr. Ivan Brady
Dr. Brady wasn’t talking about metadata, but the statement still applies. Metadata provides context that is lacking from data. Metadata allows a user to understand the meaning of the data – its source, its purpose, its scope, its intended uses – without needing to explore the data itself in exhaustive detail.
In the context of enterprise data, metadata is absolutely vital. But not all metadata is created equal. Some metadata is swords, and some metadata is WiFi.
Please bear with me for a moment – I promise I’m going somewhere with this.
Consider if you will the Oakeshott typology of swords. On the off chance that you’re not already familiar with this vital classification system, here’s a good introduction from the Wikipedia article on Ewart Oakeshott:
Oakeshott’s typology of medieval and early renaissance swords is among his most influential and most lasting works. Though his work was not entirely original, it was certainly groundbreaking. Dr. Jan Peterson had previously developed a typology for Viking swords consisting of twenty-six categories. Peterson’s typology was simplified by Dr. R. E. M. Wheeler in short order to only seven categories (Types I–VII). This simplified typology was then slightly expanded by Oakeshott by the addition of two transitional types into its current nine categories (Types I–IX). From this basis, Oakeshott began work on his own thirteen-category typology of the medieval sword ranging from Type X to Type XXII.
What made Oakeshott’s typology unique was that he was one of the first people either within or outside of academia to seriously and systematically consider the shape and function of the blades of European Medieval swords as well as the hilt, which had been the primary criteria of previous scholars. His typology traced the functional evolution of European swords over a period of five centuries, starting with the late Iron Age Type X, and took into consideration many factors: the shape of blades in cross section, profile taper, fullering, whether blades were stiff and pointed for thrusting or broad and flexible for cutting, etc. This was a breakthrough. Oakeshott’s books also dispelled many popular cliches about Western swords being heavy and clumsy. He listed the weights and measurements of many swords in his collection which have become the basis for further academic work as well as templates for the creation of high quality modern replicas.
And although the quote above doesn’t mention it, in addition to the primary types X through XXII, there are multiple subtypes as well, denoted by a lower-case letter following the roman numeral of the primary type.
Let’s summarize:
- Oakeshott was working from a sample of data that wasn’t necessarily representative, and for which no meaningful metadata existed. He needed to reverse engineer the metadata from the available data, and to manually assign structure and consistency to it.
- Earlier efforts to provide metadata for this data domain had focused on structural characteristics of the data, rather than the functional characteristics in which Oakeshott was interested.
- Oakeshott was building on the efforts of earlier data stewards and expanded the work that they had done in one data domain, while also defining more comprehensive metadata for a new, larger, data domain.
- Oakeshott’s work revealed significant discrepancies between the actual data and users’ perceptions of the data, and in doing so it enabled significant new opportunities to work with that data at scale.
- Each metadata category is defined using an arcane and obtuse combination of letters and numbers to describe its members, such as Xa, XIIIb, and XVIIIb.
Even if you’ve never held a sword, this probably sounds familiar.
A lot of the data used in enterprise analytics wasn’t created with any metadata in mind. Other than table names, object names, and data types, there often isn’t much to go on. In order to understand the data, you need to look at and work with the data, at length. Efforts to develop structured metadata for these existing sources are more data archaeology than data science, and it is often difficult to know if you have all of the data, if you have taken into consideration every possible permutation of values… You get the idea. It’s hard, and it’s often very difficult to have strong confidence in the results you reach. Reverse-engineered metadata is better than no metadata, but…
But it’s better to take metadata into account right from the beginning, and to build it at the same time you’re building the data. Like WiFi.
OK, on to WiFi, in particular the IEEE 802.11 standard, also from Wikipedia:
The standard and amendments provide the basis for wireless network products using the Wi-Fi brand. While each amendment is officially revoked when it is incorporated in the latest version of the standard, the corporate world tends to market to the revisions because they concisely denote capabilities of their products. As a result, in the marketplace, each revision tends to become its own standard.
Let’s summarize this as well:
- The metadata was defined before the data was created, rather than being inferred from existing data.
- The metadata includes functional and structural characteristics, based on agreed-upon requirements.
- All data is validated against the metadata in a consistent and standard manner as it is created.
- Each metadata category is defined using an arcane and obtuse combination of letters and numbers to describe its members, such as 802.11ax, 802.11b, and 802.11n.
Each approach to metadata adds value, but it should be obvious that prioritizing metadata in your data architecture is key to data consistency, interoperability, and reuse.
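If it helps to see the two approaches side by side, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the field names, the `FieldSpec` class, and the `infer_schema` and `validate` helpers do not come from any real catalog or library. The "swords" half infers metadata from whatever rows already exist; the "WiFi" half declares the metadata first and checks each record against it as the record is created.

```python
from dataclasses import dataclass
from typing import Any


# "Swords": metadata reverse-engineered from data that already exists.
def infer_schema(rows: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Record every type name observed for each field in the sample."""
    schema: dict[str, set[str]] = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return schema


# "WiFi": metadata declared before any data is created.
@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical field definition: expected type and whether it is required."""
    type_: type
    required: bool = True


def validate(record: dict[str, Any], spec: dict[str, FieldSpec]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors: list[str] = []
    for name, field in spec.items():
        if name not in record:
            if field.required:
                errors.append(f"missing required field: {name}")
        elif not isinstance(record[name], field.type_):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {field.type_.__name__}, got {actual}")
    for name in record:
        if name not in spec:
            errors.append(f"unexpected field: {name}")
    return errors


# Illustrative legacy rows with no agreed definition (invented sample data).
LEGACY_ROWS = [
    {"id": 1, "length_cm": 91.5},
    {"id": "2", "length_cm": None},
]

# Illustrative up-front definition (invented field names).
SWORD_SPEC = {
    "id": FieldSpec(int),
    "length_cm": FieldSpec(float),
    "oakeshott_type": FieldSpec(str, required=False),
}

if __name__ == "__main__":
    # The inferred schema only describes the sample we happened to see.
    print(infer_schema(LEGACY_ROWS))
    # The declared schema rejects a nonconforming record at creation time.
    print(validate({"id": "3"}, SWORD_SPEC))
```

The inferred schema can only tell you what turned up in the sample, ambiguity included (here `id` is sometimes an `int` and sometimes a `str`). The declared schema tells you what every record must look like, and reports the ones that don't measure up.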
When I buy a sword, I can use the Oakeshott type as a concrete way to describe and discuss the sword with its maker, or with my sword-loving friends. This is inherently valuable. But there are many swords that don’t fall neatly into this classification, which reduces that value.
When I buy wireless networking equipment, all I need to do is to look at the standards it implements. From this metadata I can immediately and authoritatively know what other networking equipment it will work with, and what functional characteristics it will implement.
Is your metadata swords, or is it WiFi? Would you rather have swords, or WiFi?
I really think about metadata a lot…
 I never metadata I didn’t like.
 If you’ve been watching Forged in Fire: Knife or Death, you’ve heard this name before. And if you know anything about swords and their classification, you cringed and cried out in pain when you heard this term misused by the hosts of the show.
 If this is the case, you should probably visit http://www.albion-swords.com/ right now. Go on, we’ll wait.
 My favorite arming sword is a type XIIIb. Its name is Joy.
 And if you’re using a data lake, you’ll be lucky to have this much.
 It will be an Angus Trim type XVII longsword, the younger twin of this one. It will be ready in January. I know this because I ordered it already. No, I haven’t told my wife yet, but she will understand.