BI Polar

Business Intelligence, Data Governance, Mental Health, Diversity, Martial Arts, and Heavy Metal.

Category: Metadata

Mini Metadata Metaphors: Food (Part 1)

On November 30, 2018 By Matthew Roche In Metadata, Metaphors

What are these?

How do you know?

How much have they been processed since they were produced?

How much metadata is needed to understand what they are?

Carrots, probably
Image from https://upload.wikimedia.org/wikipedia/commons/9/94/Mrkva.JPG

What are these?

How do you know?

How much have they been processed since they were produced?

How much metadata is needed to understand what they are?

IMG_20181124_181922.jpg

What are these?

How do you know?

How much have they been processed since they were produced?

How much metadata is needed to understand what they are?

Why do this payload and the one before it have a standard metadata package, even though the payloads are from different sources? What is the scope of the standard? Under what authority is the standard defined, and enforced?

 

IMG_20181124_181950
IMG_20181124_182001

What are these?

How do you know?

How much have they been processed since they were produced?

Without metadata, how do you evaluate the contents?

Without metadata, would you bother to evaluate the contents, or would you pass them by and instead look for payloads with complete metadata?

IMG_20181124_182138

What are these?

How much have they been processed since they were produced?

Can you infer important details from the payload container format, even though the primary metadata is missing? Is this enough metadata for you to evaluate the payload for use?

IMG_20181124_182153

How does the complexity of a payload relate to the complexity of the metadata? How does it relate to your requirements for considering the metadata to be complete enough?

Do you need more metadata to understand a payload that has been highly processed?

Is it easier or harder to use a simpler payload? How does the complexity of the desired application factor into your answer?

Mini Metadata Metaphors: Music

On November 27, 2018 By Matthew Roche In Heavy Metal, Metadata, Metaphors

How is your metadata delivered?

Is your metadata delivered in the same package as the data it describes?

Does it come from the same source?

How rich and complete is your metadata? How consistent is it?

Vinyl_and_pulp_(5289126242)
Image from https://commons.wikimedia.org/wiki/File:Vinyl_and_pulp_(5289126242).jpg

Is your data distributed from a central source, or are there meaningful capabilities for self-service?

Has this changed over time?

Does the answer depend on the data?

How does the richness of the metadata depend on the source from which you obtain the data?

Compact_audio_cassette_1
Image from https://commons.wikimedia.org/wiki/File:Compact_audio_cassette_1.jpg

How does the metadata for your self-service data compare to the metadata for centrally delivered data?

Can you contribute your own metadata for data from either source, or both?

Image from https://commons.wikimedia.org/wiki/File:Bunbury_(23388463359).jpg

Does your approach to self-service data introduce new challenges for data quality, and for consistent metadata?

Is the trade-off worth it?

Have you chosen a specific approach to self-service data because your central IT data team hasn’t acknowledged the changing needs of the business?

Image from https://upload.wikimedia.org/wikipedia/it/6/62/Napster_2.0_Beta_7_screenshot.png

Has your central data team adopted new approaches to delivering data?

Is it as easy to get data through official and supported channels as it is to get data from a peer, circumventing those official channels?

ITunes_Store_Songs_Sales
Image from https://upload.wikimedia.org/wikipedia/commons/2/25/ITunes_Store_Songs_Sales.jpg

When was the last time you needed to add your own metadata to the data you got from a central IT source?

How has the way you discover and obtain data changed over time? Has it made your life easier, harder, or both?

When was the last time you needed to break rules to get the data you needed?

When was the last time you chose to not use a data source because of the lack of available and trustworthy metadata?

Power BI Dataflows and Data Profiling

On October 28, 2018 (updated February 16, 2023) By Matthew Roche In Data Governance, Dataflows, Metadata, Patterns, Power BI

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

One of the exciting new preview capabilities in the October 2018 release of Power BI Desktop is support for data profiling in the Power Query editor. Having per-column data profile information available in the query editor is very useful to help understand the data you’re working with…

…but what about understanding data in a broader context?

The Power Query formula language, “M”, includes a Table.Profile function that accepts a table as input and returns a table containing the data profile for the input table.[1] You can use this function in Power BI Desktop, but now that there is a data profiling UI, its value there is limited.
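If you have not seen a data profile before, here is a rough Python analogue of the kind of per-column summary that Table.Profile returns. This is a sketch for illustration only, not how Power Query implements the function: the `profile` helper and the sample rows are hypothetical, the result keys are modeled on the columns I would expect in the profile table, and treating a null as one distinct value is an assumption.

```python
from statistics import mean


def profile(rows):
    """Return one profile record per column for a list of dict rows.

    Assumes every non-null value in a column has the same type,
    so that min() and max() are well defined.
    """
    columns = list(rows[0].keys()) if rows else []
    result = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        all_numeric = bool(present) and len(numeric) == len(present)
        result.append({
            "Column": col,
            "Min": min(present) if present else None,
            "Max": max(present) if present else None,
            # Average is only meaningful for numeric columns
            "Average": mean(numeric) if all_numeric else None,
            "Count": len(values),
            "NullCount": len(values) - len(present),
            # Assumption: a null counts as one distinct value
            "DistinctCount": len(set(values)),
        })
    return result


# Hypothetical sample data
rows = [
    {"Site": "Seattle", "Visits": 120},
    {"Site": "Boston", "Visits": None},
    {"Site": "Seattle", "Visits": 80},
]

for record in profile(rows):
    print(record)
```

The point is the shape of the output: one row per column of the input, which is what makes a profile easy to store and compare over time.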

This is where dataflows can help.

Remember the Excel-like, automatically-updating capabilities of linked and computed dataflow entities?[2] The most common use case for linked entities is for data transformation, but with Table.Profile you can also use linked entities to collect, consolidate, and maintain data profile information for the data stored in dataflow entities.

And it’s surprisingly simple.

Start with a workspace[3] and a dataflow, and add linked entities to it for each of the entities you want to profile.

01 - linked entities

For each linked entity in the dataflow, perform the following steps:

  1. Right-click on the entity in the query editor and select “Reference” from the context menu to create a computed entity
  2. Rename the new computed entity to include the word “profile”
  3. Right-click on the renamed entity and select “Advanced editor” from the context menu
  4. In the advanced editor, add a new query step that uses the Table.Profile function

Like this:

02 - reference

03 - Rename

04 - advanced

05 - profile

The edited query is very simple, and because all of the edits made to one query will apply without modification to each of the other queries, once the first one is done it’s just a copy and paste for each new profile entity. You can make this easier by putting the comma at the beginning of the profile line, rather than at the end of the source line, but it will work either way.

let
   Source = Site
  ,Profile = Table.Profile(Source) // note the placement of the comma
in
   Profile
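If you have many entities to profile, you could generate the query text rather than copy and paste it. The Python sketch below is one hypothetical way to do that: the `profile_query` and `quote_identifier` helpers are my own names, "Sales Orders" is an invented entity name, and the script only produces text that you would still paste into the advanced editor yourself. It wraps every entity name as an M quoted identifier so that names containing spaces also work.

```python
def quote_identifier(name):
    """Wrap a name as an M quoted identifier, doubling any embedded quotes."""
    return '#"' + name.replace('"', '""') + '"'


def profile_query(entity_name):
    """Build the text of a profile query that references the named entity."""
    return "\n".join([
        "let",
        f"   Source = {quote_identifier(entity_name)}",
        "  ,Profile = Table.Profile(Source)",
        "in",
        "   Profile",
    ])


# "Site" is the entity from the example above; "Sales Orders" is hypothetical
for entity in ["Site", "Sales Orders"]:
    print(f"// {entity} profile")
    print(profile_query(entity))
    print()
```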

When you’re done, you’ll have a dataflow that contains data profiles for each linked entity, regardless of the workspace in which the linked entity originated.

06 - profiles

Best of all, because the data profiles are stored in Power BI dataflow entities, which are in turn persisted in CDM Folders in Azure Data Lake Storage gen2, they can be consumed and processed in any tool for further analysis.
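As a rough idea of what "any tool" could look like, here is a minimal Python sketch that reads the metadata file of a CDM Folder and lists each entity with its columns and data files. It assumes the folder's model.json exposes an `entities` list whose items carry `name`, `attributes`, and `partitions`. The sample document is invented and heavily trimmed, and the `describe_entities` helper is hypothetical, so check the structure against a real folder before relying on it.

```python
import json

# Hypothetical, heavily trimmed model.json content for illustration
SAMPLE_MODEL_JSON = """
{
  "name": "Profiles",
  "entities": [
    {
      "name": "Site profile",
      "attributes": [
        {"name": "Column", "dataType": "string"},
        {"name": "NullCount", "dataType": "int64"}
      ],
      "partitions": [
        {"name": "Part001", "location": "https://example.invalid/Site%20profile.csv"}
      ]
    }
  ]
}
"""


def describe_entities(model_text):
    """Map each entity name to its column names and data file locations."""
    model = json.loads(model_text)
    summary = {}
    for entity in model.get("entities", []):
        summary[entity["name"]] = {
            "columns": [a["name"] for a in entity.get("attributes", [])],
            "files": [p["location"] for p in entity.get("partitions", [])],
        }
    return summary


for name, details in describe_entities(SAMPLE_MODEL_JSON).items():
    print(name, details["columns"], details["files"])
```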

07 - Save

One of the biggest challenges for data governance is having current and accurate metadata available for enterprise data assets. Data profiles are only one part of this, but they’re a significant part. Because of the nature of linked entities in Power BI, we can now have up-to-date column-level profiles for our data, without a major engineering effort and without any complex orchestration or management.

Life is good.


[1] If I understand correctly, the new feature in Power BI Desktop uses this function.

[2] If you don’t, you should probably read this post before you continue.

[3] Remember: to use linked entities this needs to be a new “v2” workspace, and it needs to be backed by Power BI Premium dedicated capacity.

 

Metadata Metaphors: Swords and WiFi

On October 22, 2018 (updated November 30, 2018) By Matthew Roche In Metadata, Metaphors, Swords

I think about metadata a lot.[1] I probably think about metadata more than I think about swords, and that’s saying something.

I believe my love affair with metadata may have its roots in my college years when I took several anthropology courses from Dr. Ivan Brady. Dr. Brady changed the way I looked at the world, and I will never forget his most frequently used saying:

“Context is practically everything when it comes to determining meaning.”

— Dr. Ivan Brady

Dr. Brady wasn’t talking about metadata, but the statement still applies. Metadata provides context that is lacking from data. Metadata allows a user to understand the meaning of the data – its source, its purpose, its scope, its intended uses – without needing to explore the data itself in exhaustive detail.

In the context of enterprise data, metadata is absolutely vital. But not all metadata is created equal. Some metadata is swords, and some metadata is WiFi.

Please bear with me for a moment – I promise I’m going somewhere with this.

Image from https://en.wikipedia.org/wiki/Oakeshott_typology#/media/File:Oakeshott_types.png

Consider if you will the Oakeshott[2] typology of swords. On the off chance that you’re not already familiar with this vital classification system, here’s a good introduction from the Wikipedia article on Ewart Oakeshott:

Oakeshott’s typology of medieval and early renaissance swords is among his most influential and most lasting works. Though his work was not entirely original, it was certainly groundbreaking. Dr. Jan Petersen had previously developed a typology for Viking swords consisting of twenty-six categories. Petersen’s typology was simplified by Dr. R. E. M. Wheeler in short order to only seven categories (Types I–VII). This simplified typology was then slightly expanded by Oakeshott by the addition of two transitional types into its current nine categories (Types I–IX). From this basis, Oakeshott began work on his own thirteen-category typology of the medieval sword ranging from Type X to Type XXII.

What made Oakeshott’s typology unique was that he was one of the first people either within or outside of academia to seriously and systematically consider the shape and function of the blades of European Medieval swords as well as the hilt, which had been the primary criteria of previous scholars. His typology traced the functional evolution of European swords over a period of five centuries, starting with the late Iron Age Type X, and took into consideration many factors: the shape of blades in cross section, profile taper, fullering, whether blades were stiff and pointed for thrusting or broad and flexible for cutting, etc. This was a breakthrough. Oakeshott’s books also dispelled many popular cliches about Western swords being heavy and clumsy. He listed the weights and measurements of many swords in his collection which have become the basis for further academic work as well as templates for the creation of high quality modern replicas.

And although the quote above doesn’t mention it, in addition to the primary types X through XXII, there are multiple subtypes as well, denoted by a lower-case letter following the Roman numeral of the primary type.[4]

To summarize:

  • Oakeshott was working from a sample of data that wasn’t necessarily representative, and for which no meaningful metadata existed. He needed to reverse engineer the metadata from the available data, and to manually assign structure and consistency to it.
  • Earlier efforts to provide metadata for this data domain had focused on structural characteristics of the data, rather than the functional characteristics in which Oakeshott was interested.
  • Oakeshott was building on the efforts of earlier data stewards and expanded the work that they had done in one data domain, while also defining more comprehensive metadata for a new, larger, data domain.
  • Oakeshott’s work revealed significant discrepancies between the actual data and users’ perceptions of the data, and in doing so it enabled significant new opportunities to work with that data at scale.
  • Each metadata category is defined using an arcane and obtuse combination of letters and numbers to describe its members, such as Xa, XIIIb, and XVIIIb.

Even if you’ve never held a sword[3], this probably sounds familiar.

A lot of the data used in enterprise analytics wasn’t created with any metadata in mind. Other than table names, object names, and data types[5], there often isn’t much to go on. In order to understand the data, you need to look at and work with the data, at length. Efforts to develop structured metadata for these existing sources are more data archaeology than data science, and it is often difficult to know if you have all of the data, if you have taken into consideration every possible permutation of values… You get the idea. It’s hard, and it’s often very difficult to have strong confidence in the results you reach. Reverse-engineered metadata is better than no metadata, but…

But it’s better to take metadata into account right from the beginning, and to build it at the same time you’re building the data. Like WiFi.

Really.

OK, on to WiFi, in particular the IEEE 802.11 standard, also from Wikipedia:

The standard and amendments provide the basis for wireless network products using the Wi-Fi brand. While each amendment is officially revoked when it is incorporated in the latest version of the standard, the corporate world tends to market to the revisions because they concisely denote capabilities of their products. As a result, in the marketplace, each revision tends to become its own standard.

Let’s summarize this as well:

  • The metadata was defined before the data was created, rather than being inferred from existing data.
  • The metadata includes functional and structural characteristics, based on agreed-upon requirements.
  • All data is validated against the metadata in a consistent and standard manner as it is created.
  • Each metadata category is defined using an arcane and obtuse combination of letters and numbers to describe its members, such as 802.11ax, 802.11b, and 802.11n.
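To make the "WiFi" approach concrete, here is a small Python sketch of metadata-first validation: the schema is written down before any data exists, and every record is checked against it as it is created. Everything in it is hypothetical (the `ColumnSpec` type, the column names, the `validate` helper), and it is only one simple way to express the idea, not a recommendation of a specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    dtype: type
    required: bool = True


# Hypothetical schema, agreed on before any data is produced
SCHEMA = [
    ColumnSpec("site_id", int),
    ColumnSpec("site_name", str),
    ColumnSpec("region", str, required=False),
]


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = {spec.name for spec in schema}
    for spec in schema:
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                problems.append(f"missing required column: {spec.name}")
        elif not isinstance(value, spec.dtype):
            problems.append(
                f"{spec.name}: expected {spec.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    for name in record:
        if name not in known:
            problems.append(f"unexpected column: {name}")
    return problems


print(validate({"site_id": 1, "site_name": "Seattle"}))
print(validate({"site_id": "1", "extra": True}))
```

Contrast this with the "swords" approach, where the schema would have to be inferred after the fact by studying whatever records happen to exist.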

Each approach to metadata adds value, but it should be obvious that prioritizing metadata in your data architecture is key to data consistency, interoperability, and reuse.

When I buy a sword[6], I can use the Oakeshott type as a concrete way to describe and discuss the sword with its maker, or with my sword-loving friends. This is inherently valuable. But there are many swords that don’t fall neatly into this classification, which reduces that value.

When I buy wireless networking equipment, all I need to do is to look at the standards it implements. From this metadata I can immediately and authoritatively know what other networking equipment it will work with, and what functional characteristics it will implement.

Is your metadata swords, or is it WiFi? Would you rather have swords, or WiFi?

I really think about metadata a lot…


[1] I never metadata I didn’t like.

[2] If you’ve been watching Forged in Fire: Knife or Death, you’ve heard this name before. And if you know anything about swords and their classification, you cringed and cried out in pain when you heard this term misused by the hosts of the show.

[3] If this is the case, you should probably visit http://www.albion-swords.com/ right now. Go on, we’ll wait.

[4] My favorite arming sword is a type XIIIb. Its name is Joy.

[5] And if you’re using a data lake, you’ll be lucky to have this much.

[6] It will be an Angus Trim type XVII longsword, the younger twin of this one. It will be ready in January. I know this because I ordered it already. No, I haven’t told my wife yet, but she will understand.
