BI Polar

Business Intelligence, Data Governance, Mental Health, Diversity, Martial Arts, and Heavy Metal.

Category: Metadata

When does memory die?

On May 11, 2020 | By Matthew Roche | In Data Culture, Data Governance, Metadata, Metaphors

My recent post on metadata and Indian food has gotten more traffic than most posts on this blog, and it has also received more comments and discussion. One comment in particular, from Jessica Jolly, really resonated with me:

Love the analogy. My personal favorite analogy regards old family photos. If no one takes the time to write on the back of the photo the who/what/where/when/why (i.e. the metadata), that photo will get thrown away.

Not everyone agreed that this was a great analogy. Khürt Williams in particular called out the inherent value of some data independent of any metadata to give it context.

No one throws away old family photos because they lack who/what/where/when/why. In fact, I would argue that with family photos the metadata lives in the minds of the people in the photograph or some family member you haven’t yet spoken to.

Somethings have value way beyond their metadata.

These comments got me thinking, and made me ask myself: when does memory die?

I’ve seen many variations on this quote[1], but I don’t know who said it first:

You only live as long as the last person who remembers you.

This may be a Russian proverb or it may be a quote from Westworld, but I believe the principle applies as much to business data as it does to family photos, despite the obvious differences between the two.

Looking at the family photos context first, I can clearly recall times in my life, in those dark days following a funeral or a divorce, when family photos were discarded and the lack of metadata was a contributing factor. The photos of close relatives were kept, but those of more distant relatives were at risk.

When you’re asking “should I keep this photo?” and the next question is “who are these people?” the answer to the second question is going to influence the answer to the first.

As a specific example, I’d like to share a photo that hangs above the one-handed swords[2] in my hallway.

[Image]
I don’t know who this is.

This photo was in the home of my wife’s grandmother, who passed away almost 20 years ago. We found it when we were cleaning out her house after her funeral; it was in the attic, not on display, and no one knew who this young man might be. A few relatives thought that he was a cousin or second cousin of my wife’s late grandmother who went to the great war and never returned – but no one was certain. There was writing on the paper backing the frame, but it was faded and smudged by the years, and by the time we discovered the photo the words were illegible.

By the time we discovered the data, the metadata was no longer usable, and any subject matter expert who could have shared the deeper context of the data had long since moved on.

And once you phrase it like that, it starts to sound familiar again.

In far too many business contexts the metadata lives only in the minds of the people who create and work with the data. It’s tribal knowledge – just like unlabeled family photographs. But as people move on to new jobs and the business changes over time, that tribal knowledge is lost. Even though the data may still be the same, and may still be valuable, when the people move on the tribal knowledge leaves with them. At this point it will either be organically rediscovered and recreated, or the data will stagnate because no one remembers anymore why it was important. Or, as is the case with the photo above, the data may be used and applied to a different purpose.

Tribal knowledge is a lousy metadata solution, no matter the context. Because tribal knowledge is inherently transitory and lossy, we should strive to capture metadata in a more systematic way, and to keep the metadata as close to the data as possible.

Because eventually memory will die. And some things are too important to forget.


[1] My favorite variation may be from Manowar, who remind us that only courage and heroism linger after death… but it would be a stretch even for me to incorporate this into the body of the post. This is why we have footnotes.

[2] I call out the fact that these are the one-handed swords because the two-handed swords hang in a different hallway, and there isn’t enough room for a photo above them.

Metadata is not a “nice to have”

On March 21, 2020 | June 14, 2020 | By Matthew Roche | In Cooking and Baking, Data Culture, Data Governance, Metadata, Metaphors, Power BI | 15 Comments

I love to cook, and over the past few days I’ve made a few of my favorite Indian recipes that I haven’t made in a while. So, of course, this has me thinking about metadata.

[Image]
Homemade naan and chicken tikka masala

Going from Indian cooking to metadata isn’t as big a leap as you might think. The bridge is one of my favorite cookbooks: Julie Sahni’s Classic Indian Vegetarian and Grain Cooking.

If you’re just here for the food, you should immediately make this spectacularly delicious Bengal red lentil recipe taken from this book, because it is absolutely phenomenal. If you’re here for the metadata, remember the link but don’t click on it yet.

[Image]

Every recipe I’ve made from this cookbook has produced fantastic results. It’s one of those go-to cookbooks where I know that anything I try will be good. And yet, I almost never seek it out when I want to cook, except for the recipes I already know. The reason is metadata.

It doesn’t matter how good your data is – without effective and available metadata, your investment in quality data will be undermined.

Let’s look at the recipe for saag paneer. Say those words out loud (“saag paneer”) and images of that rich, vibrant green sauce will start running through your mind.

[Image]

I found this recipe easily because I have a bookmark. But let’s say I didn’t – it should still be easy to find, because cookbooks have indexes, and indexes are the perfect tool for finding recipes. Let’s find the recipe for saag paneer.

[Image]
Oh, there’s no entry for saag paneer, or for saag?

[Image]
There’s no entry for paneer? There are so many paneer recipes in this book!

[Image]
Ok, we’re making progress… I think.

[Image]
There it is. Maybe? It still doesn’t say saag paneer anywhere.

Literally the only place the phrase “saag paneer” exists in this book is below the recipe header. This means that the only way to find the saag paneer recipe is to flip through the book page by page, or to know the specific and arbitrary phrase the author uses to describe the recipe for Western readers. This is why my copy of the book looks like this[1]:

[Image]

This systemic problem is exacerbated by the book’s complete lack of photos; there’s also no way to skim through the book and quickly visually identify recipes of relevant interest. The reader is forced to carefully evaluate each recipe in turn, looking at ingredients and processes to decide if the recipe is worth making.

At this point you may be asking what this has to do with metadata[2] or you may see the connection already.

The reason I immediately thought of metadata may be related to a BI effort I’m working on. Without going into too much detail, I have built a small Power BI app that presents information from a program I run and makes that information available to other members of my extended team.

I’m currently at the point where my app needs to include data from other sources in order to increase its value. Fortunately, that data already exists, and to make it even easier to work with, it is available as a set of Power BI dataflows. I was able to email the owner to get access[3] and to learn which dataflows to look in, and I was off. But not for very far, or for very long.

Very quickly I was back where this post started: I was faced with the high-quality data I needed, and I lacked the metadata to efficiently use it. I needed to manually evaluate each dataflow and each entity to understand its contents and context and to decide if it was right for me. I made some early progress, but because of the lack of metadata the effort will likely take days, not hours, and this means it probably won’t get done this month or next.

Let that sink in: because of a lack of effective metadata, quality curated data is going unused, and business insights are being delayed by weeks or months[4].

Just like these fantastic recipes sitting on my shelf, largely unused and unmade because a fantastic cookbook lacks a usable index, these fantastic dataflows are going largely unused, at least by me. All because metadata was treated as a “nice to have” rather than as a fundamental high-priority requirement.

Does your data have the metadata it needs, in a format and location that serves the needs of your users? How do you know? Remember that last picture of all the bookmarks[5]?

These bookmarks are a symptom of the underlying metadata problem. Bookmarks aren’t a problem themselves, but if you’re paying attention you can see that they’ve been implemented as a workaround to a problem that might not otherwise be apparent. If you’re familiar with the concept of “code smells”, you probably see where I’m going.

When your data lacks useful metadata to enable its effective use, people will start to take actions because of this lack. Things like emailing you to ask questions. Things like building their own ad hoc data dictionaries. Things like using alternate or derivative sources instead of using your authoritative data source – like the recipe link I shared above.

The more of these actions you identify, the more urgency you should feel about closing the metadata gap. Not every data source is a werewolf, but every data source requires metadata to be effectively and efficiently used.

Requires.


[1] Remember this picture. There will be a quiz later.

[2] You may also be asking if there’s anything in life that doesn’t make me think about metadata. This is a fair question.

[3] I knew the owner’s email because I had bookmarked it earlier.

[4] To be fair, my full schedule is also contributing to this delay – I’m not trying to say that the lack of metadata is independently costing months. But it is a key factor: my schedule could accommodate two or three hours for this work, but it doesn’t have room for two or three days until the end of April.

[5] I told you there would be a quiz.

Dataflows, CDM folders and the Common Data Model

On October 16, 2019 | October 15, 2019 | By Matthew Roche | In Azure, Dataflows, Metadata, Power BI | 8 Comments

You probably already know that Power BI dataflows store their data in CDM folders. But what does this actually mean?

[Image]
Matthew apparently thinks this is what CDM looks like inside of computers.

This is a quick post to share information that I hope will answer some of the most common questions that I hear from time to time, and which I discuss when I present on Power BI dataflows integration with Azure. I don’t believe any of the information in this post is new or unique[1], but I do believe it is delivered in a more targeted manner that might help.

Point #1: CDM is a metadata system

The Common Data Model is a metadata system that simplifies data management and application development by unifying data into a known form and applying structural and semantic consistency across multiple apps and deployments. If you’re coming from a SQL Server background, it may help to think about CDM as the “system tables” for data that’s stored in multiple locations and formats. This analogy doesn’t hold up to particularly close inspection, but it’s a decent place to start.

Point #2: CDM includes standard entity schemas

In addition to being a metadata system, the Common Data Model includes a set of standardized, extensible data schemas that Microsoft and its partners have published. This collection of predefined schemas includes entities, attributes, semantic metadata, and relationships. The schemas represent commonly used concepts and activities, such as Account and Campaign, to simplify the creation, aggregation, and analysis of data.

Point #3: CDM folders are data storage that use CDM metadata

A CDM folder is a folder in a data lake that conforms to specific, well-defined, and standardized metadata structures and self-describing data. These folders facilitate metadata discovery and interoperability between data producers and data consumers.

CDM folders store metadata in a model.json file; this is what makes them self-describing. This metadata conforms to the CDM metadata format, and can be read by any client application or code that knows how to work with CDM.
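As a rough illustration of what "self-describing" means in practice, the sketch below reads a model.json document and lists each entity with its attributes. The sample content is hypothetical and heavily trimmed (the names SalesDataflow, Customer, and the example.invalid location are invented for this example); real model.json files carry many more properties.

```python
import json

# Hypothetical, trimmed-down model.json content for illustration only.
SAMPLE_MODEL_JSON = """
{
  "name": "SalesDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customer",
      "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "CustomerName", "dataType": "string"}
      ],
      "partitions": [
        {"name": "Part001", "location": "https://example.invalid/Customer/Part001.csv"}
      ]
    }
  ]
}
"""


def describe_entities(model_text):
    """Return {entity name: [(attribute name, data type), ...]}."""
    model = json.loads(model_text)
    summary = {}
    for entity in model.get("entities", []):
        summary[entity["name"]] = [
            (attr["name"], attr.get("dataType", "unspecified"))
            for attr in entity.get("attributes", [])
        ]
    return summary


if __name__ == "__main__":
    for name, attributes in describe_entities(SAMPLE_MODEL_JSON).items():
        print(name, attributes)
```

The point is that a consumer needs nothing but the folder itself: the schema travels with the data files rather than living in someone's head or in a separate catalog.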

Point #4: You don’t need to use any standard entities 

The most common misconception I hear about CDM and CDM folders is that you only use them when you’re storing “standard data.” This is not correct. The data in a CDM entity may map to a standard entity schema, but for 99% of the entities I have built or used, this is not the case. There is nothing in CDM or CDM folders that requires you to use a standard schema.

I hope this helps – please let me know if you have questions!


[1] Check out the documentation for CDM and CDM folders here and here, and here for more detail. You’ll probably notice that some chunks of text in this post were simply copied from that documentation.

Power BI workspace lineage

On October 4, 2019 | October 28, 2019 | By Matthew Roche | In Data Governance, Dataflows, Metadata, Power BI, Power BItes, Video | 7 Comments

Power BI dataflows have included capabilities for data lineage since they were introduced in preview way back in 2018. The design of dataflows, where each entity is defined by the Power Query that provides its data, enables a simple and easy view into its data lineage. The query is the authoritative statement on where the entity’s data comes from, and how it is transformed.

In early 2019 dataflows also introduced a graphical lineage view. Users could now easily see and understand the relationships between all dataflows in a workspace, which made it easier than ever to work with linked and computed entities and to take advantage of the Lego-like capabilities of dataflows as building blocks for BI.

But what about everything else in a workspace? What about datasets, and reports, and dashboards? What about them?

Power BI has your back.

Image by Karolina Grabowska from Pixabay
Lineage – it’s not just for dataflows anymore!

Late last month the Power BI team released a new preview capability that lets users view workspace content in a single end-to-end lineage view, in addition to the familiar list view.

[Image]

Once the lineage view is selected, all workspace contents – data sources, dataflows, datasets, dashboards, and reports – are displayed, along with the relationships between them. Here’s a big-picture view of a workspace I’ve been working in lately[1]:

[Image]

There’s a lot to unpack here, so I’ll break down what feels to me like the important parts:

  1. The primary data source is a set of text files in folders. The text files are produced by various web scraping processes, and each has a different[2] format and contents.
  2. The secondary data source is a set of reference and lookup data stored in Excel workbooks in SharePoint Online. These workbooks contain manually curated data that is used to cleanse, standardize and/or enrich the data from the primary data source[3].
  3. The primary data is staged with minimal transformation in a “raw” dataflow. This data is then progressively processed by a series of downstream dataflows, including mashing up with the secondary data from Excel, and reshaped into facts and dimensions.
  4. There is one dataset based on the fact and dimension entities, and a report based on this dataset. There’s a second dataset that includes data quality metrics from entities in multiple dataflows, and a report based on this dataset. And there are two dashboards, one that includes only visuals for data quality metrics, and one that presents the main data along with a few tiles from the quality report.

That overview is so simplified as to be worthless from a technical understanding perspective, but it’s still a wall of text. Who wants to read that?

For a real-world workspace that implements a production BI application, there is likely to be more complexity, and less well-defined boundaries between objects. How do you document the contents of a complex workspace, and the relationships between those components? How do you understand them well enough to identify and solve problems?

That’s where the lineage view comes in.

Let’s begin by looking at the data sources.

[Image]

For data sources that use a gateway, I can easily see the gateway name. For other data sources I can see the data source location. We’re off to a good start, because I have a single place to look to see where my data is coming from.

Next, let’s look at the dataflows.

[Image]

In addition to being able to see the dataflows and the dependencies between them, you can click on any dataflow to see the entities it contains, and can jump directly to edit the dataflow from this view.

This part of workspace lineage isn’t completely new – this is essentially what you could do with dataflows already. But now you can do it with datasets, reports, and dashboards as well.

[Image]

Selecting a dataset shows me the tables it contains, and selecting a dashboard or report takes me directly to the visualization. But the real power of this view comes from the relationships between objects. The relationships are where data lineage comes to the fore.

The two primary questions asked in the context of data lineage are around upstream “where does this data come from?” and downstream “where is this data used?” lineage scenarios.

The first question is often asked in the context of “why am I not seeing what I expect to see?” and the resulting investigation looks at upstream logic and data sources to identify the root cause of problems.

The second question is often asked in the context of “what might break if I change this?” and the resulting investigation looks at downstream objects and processes.

The lineage view has a simple way to answer both questions: just click on the “double arrow” icon and the view will change to highlight all upstream and downstream objects. In a single click you can see where the data comes from, and where the data is used. Click again, and the view toggles back to the default view.

[Image: animated GIF][4]
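For readers who think better in code, the upstream and downstream questions are both walks over a dependency graph. The sketch below is not how Power BI implements the lineage view; it is a minimal illustration using a hypothetical workspace whose object names are invented, loosely modeled on the one described earlier.

```python
from collections import deque

# Hypothetical workspace: each key feeds the objects listed as its value.
FEEDS = {
    "Text files": ["Raw dataflow"],
    "Excel workbooks": ["Cleansing dataflow"],
    "Raw dataflow": ["Cleansing dataflow", "Quality dataset"],
    "Cleansing dataflow": ["Facts and dimensions dataflow", "Quality dataset"],
    "Facts and dimensions dataflow": ["Main dataset"],
    "Main dataset": ["Main report"],
    "Quality dataset": ["Quality report"],
    "Main report": ["Main dashboard"],
    "Quality report": ["Quality dashboard", "Main dashboard"],
}


def reachable(start, edges):
    """Breadth-first walk returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(edges):
    """Reverse every edge so that consumers point back at their sources."""
    reversed_edges = {}
    for source, targets in edges.items():
        for target in targets:
            reversed_edges.setdefault(target, []).append(source)
    return reversed_edges


def downstream(node):
    """Answer: where is this data used? What might break if I change it?"""
    return reachable(node, FEEDS)


def upstream(node):
    """Answer: where does this data come from?"""
    return reachable(node, invert(FEEDS))


if __name__ == "__main__":
    print(sorted(upstream("Main report")))
    print(sorted(downstream("Excel workbooks")))
```

Both questions use the same traversal; only the direction of the edges changes, which is why a single toggle can highlight both at once.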

There’s more to lineage view than this, including support for shared and certified datasets, but this should be enough to get you excited. Be sure to check out the preview documentation as you check out the feature!

Update: We now have a video to supplement the blog post. Check it out!

Update: The Power BI blog now has the official announcement for this exciting feature. The blog post includes a look at where the lineage team is planning to invest to make this feature even better, and notes[5] that all of the information in the lineage view is now available using the Power BI API. If you want to integrate lineage and impact analysis into your own tools, or if you want to build a cross-workspace lineage view, you now have the APIs you need to be successful!


[1] This is a pet project that may one day turn into a viable demo, assuming work and life let me devote a little more time to it…

[2] Different, annoying, and difficult to clean up.

[3] For example, the source web site allows any user to contribute, and although the contribution process is moderated there is no enforcement of content or quality. One artist may be credited for “guitar” on one album, “guitars” on another, “lead guitar” on a third. This sounds pretty simple until you take into account there were close to 50,000 different “artist roles” in the raw source data, that needed to be standardized down to a few hundred values in the final data model.

[4] I sure hope this gif works!

[5] This is the most exciting part for me.

Governing Power BI just got a little easier

On September 24, 2019 | By Matthew Roche | In Data Governance, Metadata, Power BI | 3 Comments

The reaction to this recent post on lineage and Power BI dataflows highlighted how important lineage is for Power BI[1]. As this post shows, there are lineage user experiences in place for dataflows today, and experiences coming for all artifacts in a workspace.

Even with these new experiences, there will still be times when you want or need to use the Power BI API to get insight into all the workspaces in the Power BI tenant for which you are an administrator. This ability isn’t new, but some recent updates to the Power BI admin API have made it easier.

I don’t do a lot of development these days, so when a post titled Avoiding workspace loops by expanding navigation properties in the GetGroupsAsAdmin API showed up on the Power BI blog, I didn’t pay much attention. I should have.

GetGroupsAsAdmin is an API available to Power BI administrators that returns the workspaces for the Power BI tenant. With the information it returns, an admin can then call additional APIs like GetDatasetsInGroupAsAdmin and GetReportsInGroupAsAdmin to list their contents – and better understand and manage the tenant. This is a relatively straightforward pattern… but you do need to call up to four APIs for each workspace.

Now that GetGroupsAsAdmin supports $expand, you can get the full list of users, reports, dashboards, datasets, and dataflows in the workspace[2] without needing to call any additional APIs. Pretty sweet.
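To make the single-call pattern concrete, here is a minimal sketch that only builds the request URL. It assumes the REST endpoint and OData options as I understand them from the public Power BI REST documentation (the admin/groups path, a required $top, and a comma-separated $expand); the function name and constants are invented for this example, and authentication and the HTTP call itself are deliberately left out.

```python
from urllib.parse import urlencode

# Assumed endpoint for GetGroupsAsAdmin; verify against the current REST docs.
ADMIN_GROUPS_URL = "https://api.powerbi.com/v1.0/myorg/admin/groups"

# The workspace contents named in the post.
EXPANDABLE = ("users", "reports", "dashboards", "datasets", "dataflows")


def build_admin_groups_url(top=100, expand=EXPANDABLE):
    """Build one URL that asks for workspaces plus their expanded contents."""
    params = {"$expand": ",".join(expand), "$top": top}
    # Keep "$" and "," unescaped so the OData options stay readable.
    return ADMIN_GROUPS_URL + "?" + urlencode(params, safe="$,")


if __name__ == "__main__":
    print(build_admin_groups_url())
```

A caller would send this URL with an administrator's bearer token; one request then replaces the loop of per-workspace calls described above.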

With the information that’s returned, you can get a view of the contents of your Power BI tenant and start examining the relationships between the various objects, and now it’s simpler than ever. The API returns the workspace contents as JSON, which is easy enough to ingest and visualize using Power BI Desktop.
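Power BI Desktop is the tool used here, but the same JSON is just as easy to flatten in code. The sketch below is an alternative illustration in Python, not the approach taken in this post: it turns a response into one row of counts per workspace. The sample response is hypothetical and trimmed, and it assumes the workspaces arrive in a top-level "value" array with one nested array per expanded property.

```python
import json

# Hypothetical, trimmed response; real responses carry many more properties.
SAMPLE_RESPONSE = """
{
  "value": [
    {
      "id": "ws-1",
      "name": "Sales",
      "users": [{"emailAddress": "someone@example.invalid"}],
      "reports": [{"name": "Revenue"}],
      "dashboards": [],
      "datasets": [{"name": "Sales model"}],
      "dataflows": [{"name": "Staging"}, {"name": "Facts"}]
    },
    {
      "id": "ws-2",
      "name": "Archive"
    }
  ]
}
"""

CONTENT_KEYS = ("users", "reports", "dashboards", "datasets", "dataflows")


def summarize_workspaces(response_text):
    """Return one dict per workspace with a count for each content type."""
    workspaces = json.loads(response_text).get("value", [])
    rows = []
    for workspace in workspaces:
        row = {"workspace": workspace.get("name", "")}
        for key in CONTENT_KEYS:
            # A missing key is treated as an empty collection.
            row[key] = len(workspace.get(key, []))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in summarize_workspaces(SAMPLE_RESPONSE):
        print(row)
```

Rows like these are a reasonable starting point for a tenant inventory, and they load into a table visual as readily as the raw JSON does.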

[Image]

The Power BI team is continuing to add more features and experiences focused on governance and lineage[3], but the nature of oversight and governance is that most companies have specific tools and processes that require customization to one degree or another. Having a simple programmatic way to get workspace contents from your Power BI tenant will continue to be valuable even as these new experiences are delivered.


[1] As of when I’m writing, this lineage post has received more “first week views” than any other dataflows post I’ve made this year.

[2] When I wrote the first draft of this post in early July, dataflows were not yet included in the results of this API; they were added a few weeks later. I decided to wait to complete the post, and here it is, almost October. I really should know better by now.

[3] Take a look at the Power BI roadmap session from MBAS for some big examples, including a preview of the upcoming “lineage view” at around the 23:00 mark.

Dataflows in Power BI: Overview Part 9 – Lineage and impact analysis

On July 1, 2019 | November 14, 2020 | By Matthew Roche | In Data Governance, Dataflows, Metadata, Power BI | 11 Comments

When you build a Power BI dataflow, lineage is built in. It’s not an added feature – it’s a fundamental aspect of how dataflows work. Instead of having an ETL process that performs the data movement and compute in one place and the data storage in another[1], dataflow entities are defined by a Power Query “M” query. The ETL logic in the query and the data storage are defined as a single unit[2]. The result is that not only does a dataflow contain the data, it contains the full lineage about where the data came from and how it was transformed.

Update: The user experience shown in this post is from July 2019, and is no longer current. For a more current view of lineage in Power BI workspaces, please see this post and video from October 2019, and the Power BI documentation.

[Image: lineage today]

This lineage information is what makes possible the diagram view that’s available today. As shown above, the diagram view provides a workspace-level lineage view of all dataflows in the workspace, their relationships, and the data sources from which they extract data – including data sources that are dataflows in other workspaces.

It’s worth emphasizing that this diagram view isn’t an editor per se – it’s not where you define lineage relationships, it’s where you can view, explore, and understand the lineage relationships that are automatically created when you build your dataflows. At the same time, the diagram view does let you edit and manage the dataflows in your workspace in the same way as the list view.

The dataflows API lets developers build similar experiences and automated processes. Specifically, the Get Dataflow API returns the model.json file that contains all the dataflow metadata, including the M scripts for the queries that define the dataflow entities.
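As a sketch of what an automated process might do with that file, the code below pulls the M script out of a model.json document and lists the queries it declares. Two things here are assumptions rather than documented facts: that the script sits under a "pbi:mashup" object in a "document" property (this matches exported dataflow files I have seen, but check your own), and that a simple regular expression is good enough to find "shared" declarations. The sample dataflow and its query names are invented.

```python
import json
import re

# Hypothetical M section document with two shared queries.
M_DOCUMENT = (
    'section Section1;\n'
    'shared Customers = let\n'
    '    Source = Csv.Document(Web.Contents("https://example.invalid/customers.csv"))\n'
    'in\n'
    '    Source;\n'
    'shared #"Order Lines" = let\n'
    '    Source = Customers\n'
    'in\n'
    '    Source;\n'
)

# Hypothetical, trimmed model.json; the "pbi:mashup" key is an assumption.
SAMPLE_MODEL_JSON = json.dumps(
    {
        "name": "Sample dataflow",
        "pbi:mashup": {"document": M_DOCUMENT},
        "entities": [{"name": "Customers"}, {"name": "Order Lines"}],
    }
)

# Matches lines such as: shared Name =   or   shared #"Quoted Name" =
SHARED_QUERY = re.compile(r'^shared\s+(?:#"([^"]+)"|(\w+))\s*=', re.MULTILINE)


def extract_m_document(model_text):
    """Return the M script stored in the model, or an empty string."""
    model = json.loads(model_text)
    return model.get("pbi:mashup", {}).get("document", "")


def shared_query_names(m_document):
    """List query names declared with 'shared'; a heuristic, not an M parser."""
    return [quoted or bare for quoted, bare in SHARED_QUERY.findall(m_document)]


if __name__ == "__main__":
    script = extract_m_document(SAMPLE_MODEL_JSON)
    print(shared_query_names(script))
```

Because the query text is the authoritative statement of where each entity's data comes from, even a crude scan like this can feed a documentation or impact-analysis job.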

Of course, dataflows are just one part of a complete lineage and impact analysis story in Power BI. Ideally a Power BI workspace administrator would be able to see all data sources, dataflows, datasets, reports, and dashboards in a single view, and to easily navigate and manage the relationships and dependencies between them. Something like this:

[Image: lineage tomorrow]

At the Microsoft Business Applications Summit in June, Microsoft shared plans for lineage across all artifacts in a workspace. This represents a continuation of the current functionality available in dataflows and of the recently-announced shared and certified datasets.

With this upcoming functionality, users working with reports and dashboards can easily see where the data comes from, and how and where it has been transformed. I’m hesitant to call this the “holy grail” of business intelligence, but I’ve heard it described this way enough times that I wouldn’t argue too loudly if someone did. If I had a dollar for every meeting I’ve been in where the agenda was derailed by an argument about whether the data was right[3]… In any event, this is a very common problem, with few good solutions.

How are you using the lineage capabilities in dataflows today? Do you have processes that rely on the dataflows lineage UI or API? I’d love to hear about what you’re doing now and what you’re planning – let me know.


[1] Such as having a batch job, stored procedure, or SSIS Data Flow that loads into a SQL Server table or HDFS folder.

[2] If you’re not familiar with this aspect of Power BI dataflows, I’d recommend reading through these posts to catch up: Dataflows in Power BI: Overview Part 6 – Linked and Computed Entities, Lego Bricks and the Spectrum of Data Enrichment and Reuse, Dataflows in Power BI: Resources, and maybe Dataflows in Power BI.

[3] Then I could afford the upgraded hosting plans that WordPress keeps pushing at me…

Quick Tip: Making self-documenting Power BI apps

On June 23, 2019 | June 22, 2019 | By Matthew Roche | In Data Governance, Metadata, Power BI | 8 Comments

Power BI is constantly evolving – there’s a new version of Power BI Desktop every month, and the Power BI service is updated every week. Many of the new capabilities in Power BI represent gradual refinements, but some are significant enough to make you rethink how your organization uses Power BI[1].

The new app navigation capabilities introduced last month to Power BI probably fall into the former category. But even though they’re a refinement of what the Power BI service has always had, they can still make your apps significantly better. Specifically, these new capabilities can be used to add documentation and training materials directly to the app experience, while keeping that content in its current location.

It’s surprisingly simple and easy to do. When publishing your app, select the Navigation tab.

[Image]

On the Navigation tab, select New and then Link.

[Image]

And finally, provide the URL of the content, the name for the tab in the app navigation, and select where you want the link to open.

[Image]

When you publish the app, there will be a tab that contains the content from the link you specified in addition to tabs for the reports and dashboards from the workspace.

This approach is very simple, but it’s also very important for at least two reasons:

  1. Having self-documenting apps will help ensure that the people who need to use them will be able to do so. End-user training for Power BI apps, reports, and dashboards is easy for BI developers and authors to overlook, but its importance cannot be overstated[2]. To reach the users who need it the most, your training and documentation need to be discoverable where those users need them – and that is often in the app itself.
  2. Your content can stay in the system or application where it belongs. Power BI is great at many things, but it’s not a content management system. If you create training content in your Power BI reports, you’re probably not going to be as happy as you could be. And you deserve to be happy.

If you’re interested in a different take on the same feature, check out this post from the fine cows[3] over at FourMoo. I think this new capability is exciting more for what you can do with it and less for how it works, so I skimmed over a lot of the details. FourMoo goes into more of the how-to detail, so you may want to check it out.

How do you plan to use these new capabilities? Do you have existing content that you will add to your Power BI apps? I’d love to hear what you think.


[1] Yes, I reused this opening paragraph from my last blog post. Yes, I do think I’m more clever than I actually am.

[2] Watch this session recording for a more in-depth exploration of this statement.

[3] I’m pretty sure they’re cows. They say that on the internet no one knows you’re a cow, but honestly these bovines aren’t even trying.

Mini Metadata Metaphors: Road Signs

On December 7, 2018 | By Matthew Roche | In Data Governance, Metadata, Metaphors

What happens if you ignore this metadata?

Are all metadata attributes equally important?

Are all metadata attributes equally applicable to a given usage scenario?

What are the risks of using the data in ways that are not approved by the policy defined in the metadata?

What mechanisms exist for the enforcement of that policy?

night_speed_limit.jpg
Image from https://upload.wikimedia.org/wikipedia/commons/6/65/Night_speed_limit.jpg

What happens if you ignore this metadata?

Are all metadata attributes equally important?

Are all metadata attributes equally applicable to a given usage scenario?

What are the risks of using the data in ways that are not approved by the policy defined in the metadata?

What mechanisms exist for the enforcement of that policy?

US62_West_-_Uphill_Sharp_Left_Curve_Sign_(43429239950)

Image from https://upload.wikimedia.org/wikipedia/commons/4/47/US62_West_-_Uphill_Sharp_Left_Curve_Sign_%2843429239950%29.jpg

What happens if you ignore this metadata?

Are all metadata attributes equally important?

Are all metadata attributes equally applicable to a given usage scenario?

What are the risks of using the data in ways that are not approved by the policy defined in the metadata?

What mechanisms exist for the enforcement of that policy? How are these mechanisms different from the mechanisms in earlier examples? What do these differences imply about the risks inherent in misuse?

Bridge_Closed_(1060634359)
Image from https://upload.wikimedia.org/wikipedia/commons/c/c2/Bridge_Closed_%281060634359%29.jpg

In each example, how does the metadata format relate to the importance of the metadata, and the data usage scenarios covered by the metadata attributes?

Does a difference in metadata format convey information about the metadata itself?

Is this significant?

Mini Metadata Metaphors: Food (Part 3)

On December 4, 2018 By Matthew Roche In Metadata, Metaphors

Does your fridge look like this?

fridge 1.jpg

Or does your fridge look like this?

fridge 2.jpg

What is the difference between the two?

Which fridge would you assume is used by a professional chef, and which would you assume is not?

What value does the metadata in the second fridge add, and to what scenarios?

What effort goes into creating and maintaining the metadata in the second fridge?

At what point does that effort become justified?

Is there any specific metadata in the second fridge that seems unnecessary? What metadata? What assumptions did you make to reach this conclusion?

Is there any specific metadata in the fridge that seems particularly valuable, to prevent misuse of data or undesired consequences of use?

What experiences might cause someone to adopt the practices to maintain the metadata in the second fridge?

What experiences would cause you to adopt the practices to maintain the metadata in the second fridge?

Mini Metadata Metaphors: Food (Part 2)

On December 2, 2018 (December 1, 2018) By Matthew Roche In Metadata, Metaphors

What are the differences between these three varieties of peppers?

Yes please
Image from https://upload.wikimedia.org/wikipedia/commons/e/e1/Jalapeno_scotch_bonnet_bird%27s_eye_chilis.jpg

Which ones are the hottest?

Which are the mildest?

How will you know without consuming one of each?

Which ones are appropriate for which purposes?

Does this metadata help?

scoville
Image taken from https://en.wikipedia.org/wiki/Scoville_scale

Are the numbers meaningful to you?

If the numbers are not meaningful, how can you get meaning from the chart?

Are you familiar with the scale being used?

If you are not already familiar, how would you learn enough to use the scale?

What metadata would be more appropriate for your use?

How are you similar to, or different from, the target audience of this metadata?
