In addition to collaboration and partnership between business and IT, successful data cultures have something else in common: they recognize the need for both discipline and flexibility, and have clear, consistent criteria and responsibilities that let all stakeholders know what controls apply to what data and applications.
Today’s video looks at this key fact, and emphasizes this important point: you need to pick your battles.
If you try to lock everything down and manage all data and applications rigorously, business users who need more agility will not be able to do their jobs – or more likely they will simply work around your controls. This approach puts you back into the bad old days before there were robust and flexible self-service BI tools – you don’t want this.
If you try to let every user do whatever they want with any data, you’ll quickly find yourself in the “wild west” days – you don’t want that either.
Instead, work with your executive sponsor and key stakeholders from business and IT to understand what requires discipline and control, and what supports flexibility and agility.
One approach will never work for all data – don’t try to make it fit.
 The original title of this post and video was “discipline and flexibility” but when the phrase “pick your battles” came out unscripted as I was recording the video, I realized that no other title would be so on-brand for me. And here we are.
 In case you were wondering, it’s all unscripted. Every time I edit and watch a recording, I’m surprised. True story.
Love the analogy. My personal favorite analogy regards old family photos. If no one takes the time to write on the back of the photo the who/what/where/when/why (i.e. the metadata), that photo will get thrown away.
Not everyone agreed that this was a great analogy. Khürt Williams in particular called out the inherent value of some data independent of any metadata to give it context.
No one throws away old family photos because they lack who/what/where/when/why. In fact, I would argue that with family photos the metadata lives in the minds of the people in the photograph or some family member you haven’t yet spoken to.
Some things have value way beyond their metadata.
These comments got me thinking, and made me ask myself: when does memory die?
I’ve seen many variations on this quote, but I don’t know who said it first:
You only live as long as the last person who remembers you.
This may be a Russian proverb or it may be a quote from Westworld, but I believe the principle applies as much to business data as it does to family photos, despite the obvious differences between the two.
Looking at the family photos context first, I can clearly recall times in my life, in those dark days following a funeral or a divorce, when family photos were discarded and the lack of metadata was a contributing factor. The photos of close relatives were kept, but those of more distant relatives were at risk.
When you’re asking “should I keep this photo?” and the next question is “who are these people?” the answer to the second question is going to influence the answer to the first.
As a specific example, I’d like to share a photo that hangs above the one-handed swords in my hallway.
This photo was in the home of my wife’s grandmother, who passed away almost 20 years ago. We found it when we were cleaning out her house after her funeral; it was in the attic, not on display, and no one knew who this young man might be. A few relatives thought that he was a cousin or second cousin of my wife’s late grandmother who went to the Great War and never returned – but no one was certain. There was writing on the paper backing the frame, but it was faded and smudged by the years, and by the time we discovered the photo the words were illegible.
By the time we discovered the data, the metadata was no longer usable, and any subject matter expert who could have shared the deeper context of the data had long since moved on.
And once you phrase it like that, it starts to sound familiar again.
In far too many business contexts the metadata lives only in the minds of the people who create and work with the data. It’s tribal knowledge – just like unlabeled family photographs. But as people move on to new jobs and the business changes over time, that tribal knowledge is lost. Even though the data may still be the same, and may still be valuable, when the people move on the tribal knowledge leaves with them. At this point it will either be organically rediscovered and recreated, or the data will stagnate because no one remembers anymore why it was important. Or, as is the case with the photo above, the data may be used and applied to a different purpose.
Tribal knowledge is a lousy metadata solution, no matter the context. Because tribal knowledge is inherently transitory and lossy, we should strive to capture metadata in a more systematic way, and to keep the metadata as close to the data as possible.
Because eventually memory will die. And some things are too important to forget.
 My favorite variation may be from Manowar, who remind us that only courage and heroism linger after death… but it would be a stretch even for me to incorporate this into the body of the post. This is why we have footnotes.
 I call out the fact that these are the one-handed swords because the two-handed swords hang in a different hallway, and there isn’t enough room for a photo above them.
Every recipe I’ve made from this cookbook has produced fantastic results. It’s one of those go-to cookbooks where I know that anything I try will be good. And yet, I almost never seek it out when I want to cook, except for the recipes I already know. The reason is metadata.
It doesn’t matter how good your data is – without effective and available metadata, your investment in quality data will be undermined.
Let’s look at the recipe for saag paneer. Say those words out loud (“saag paneer”) and images of that rich, vibrant green sauce will start running through your mind.
I found this recipe easily because I have a bookmark. But let’s say I didn’t – it should still be easy to find, because cookbooks have indexes, and indexes are the perfect tool for finding recipes. Let’s find the recipe for saag paneer.
Literally the only place the phrase “saag paneer” exists in this book is below the recipe header. This means that the only way to find the saag paneer recipe is to flip through the book page by page, or to know the specific and arbitrary phrase the author uses to describe the recipe for Western readers. This is why my copy of the book looks like this:
This systemic problem is exacerbated by the book’s complete lack of photos; there’s also no way to skim through the book and quickly identify recipes of interest visually. The reader is forced to carefully evaluate each recipe in turn, looking at ingredients and processes to decide if the recipe is worth making.
At this point you may be asking what this has to do with metadata, or you may see the connection already.
The reason I immediately thought of metadata may be related to a BI effort I’m working on. Without going into too much detail, I have built a small Power BI app that presents information from a program I run and makes that information available to other members of my extended team.
I’m currently at the point where my app needs to include data from other sources in order to increase its value. Fortunately, that data already exists, and to make it even easier to work with, it is available as a set of Power BI dataflows. I was able to email the owner to get access and to learn which dataflows to look in, and I was off. But not for very far, or for very long.
Very quickly I was back where this post started: I was faced with the high-quality data I needed, and I lacked the metadata to efficiently use it. I needed to manually evaluate each dataflow and each entity to understand its contents and context and to decide if it was right for me. I made some early progress, but because of the lack of metadata the effort will likely take days not hours, and this means it probably won’t get done this month or next.
Let that sink in: because of a lack of effective metadata, quality curated data is going unused, and business insights are being delayed by weeks or months.
Just like these fantastic recipes sitting on my shelf, largely unused and unmade because a fantastic cookbook lacks a usable index, these fantastic dataflows are going largely unused, at least by me. All because metadata was treated as a “nice to have” rather than as a fundamental high-priority requirement.
Does your data have the metadata it needs, in a format and location that serves the needs of your users? How do you know? Remember that last picture of all the bookmarks?
These bookmarks are a symptom of the underlying metadata problem. Bookmarks aren’t a problem themselves, but if you’re paying attention you can see that they’ve been implemented as a workaround to a problem that might not otherwise be apparent. If you’re familiar with the concept of “code smells”, you probably see where I’m going.
When your data lacks useful metadata to enable its effective use, people will start to take actions because of this lack. Things like emailing you to ask questions. Things like building their own ad hoc data dictionaries. Things like using alternate or derivative sources instead of using your authoritative data source – like the recipe link I shared above.
The more of these actions you identify, the more urgency you should feel about closing the metadata gap. Not every data source is a werewolf, but every data source requires metadata to be effectively and efficiently used.
 Remember this picture. There will be a quiz later.
 You may also be asking if there’s anything in life that doesn’t make me think about metadata. This is a fair question.
 I knew the owner’s email because I had bookmarked it earlier.
 To be fair, my full schedule is also contributing to this delay – I’m not trying to say that the lack of metadata is independently costing months. But it is a key factor: my schedule could accommodate two or three hours for this work, but it doesn’t have room for two or three days until the end of April.
The last post was about the dangers inherent in measuring the wrong thing – choosing a metric that doesn’t truly represent the business outcome you think it does. This post is about different problems – the problems that come up when you don’t truly know the ins and outs of the data itself… but you think you do.
This is another “inspired by Twitter” post – it is specifically inspired by this tweet (and corresponding blog post) from Caitlin Hudon. It’s worth reading her blog post before continuing with this one – you go do that now, and I’ll wait.
The scariest ghost stories I know take place when the history of data — how it’s collected, how it’s used, and what it’s meant to represent — becomes an oral one, passed down like campfire stories from one generation of analysts to another. 👻https://t.co/nTQNSmk3oD
Caitlin’s ghost story reminded me of a scary story of my own, back from the days before I specialized in data and BI. Back in the days when I was a werewolf hunter. True story.
Around 15 years ago I was a consultant, working on a project with a company that made point-of-sale hardware and software for the food service industry. I was helping them build a hosted solution for above-store reporting, so customers who had 20 Burger Hut or 100 McTaco restaurants could get insights and analytics from all of them, all in one place. This sounds pretty simple in 2020, but in 2005 it was an exciting first-to-market offering, and a lot of the underlying platform technologies that we can take for granted today simply didn’t exist. In the end, we built a data movement service that took files produced by the in-store back-of-house system and uploaded them over a shared dial-up connection from each restaurant to the data center where they could get processed and warehoused.
The analytics system supported a range of different POS systems, each of which produced files in different formats. This was a fun technical challenge for the team, but it was a challenge we expected. What we didn’t expect was the undocumented failure behavior of one of these systems. Without going into too much detail, this POS system would occasionally produce output files that were incomplete, but which did not indicate failure or violate any documented success criteria.
To make a long story short, because we learned about the complexities of this system very late in the game, we had some very unhappy customers and some very long nights. During a retrospective we engaged with one of the project sponsors for the analytics solution because he had – years earlier – worked with the development group that built this POS system. (For the purposes of this story I will call the system “Steve” because I need a proper noun for his quote.)
The project sponsor reviewed all we’d done from a reliability perspective – all the validation, all the error handling, all the logging. He looked at this, then he looked at the project team and he said:
You guys planned for wolves. ‘Steve’ is werewolves.
Even after all these years, I still remember the deadpan delivery of this line. And it was so true.
We’d gone in thinking we were prepared for all of the usual problems – and we were. But we weren’t prepared for the horrifying reality of the data problems that were lying in wait. We weren’t prepared for werewolves.
Digging through my email from those days, I found a document I’d sent to this project sponsor, planning for some follow-up efforts, and was reminded that for the rest of the projects I did for this client, “werewolves” became part of the team vocabulary.
What’s the moral of this story? Back in 2008 I thought the moral was to test early and often. Although this is still true, I now believe that what Past Matthew really needed was a data catalog or data dictionary with information that clearly said DANGER: WEREWOLVES in big red letters.
This line from Caitlin’s blog post could not be more wise, or more true:
The best defense I’ve found against relying on an oral history is creating a written one.
The thing that ended up saving us back in 2005 was knowing someone who knew something – we happened to have a project stakeholder who had insider knowledge about a key data source and its undocumented behavior. What could have been better? Some actual <<expletive>> documentation.
Even in 2020, and even in mature enterprise organizations, having a reliable data catalog or data dictionary that is available to the people who could get value from it is still the exception, not the rule. Business-critical data sources and processes rely on tribal knowledge, time after time and team after team.
I won’t try to supplement or repeat the best practices in Caitlin’s post – they’re all important and they’re all good and I could not agree more with her guidance. (If you skipped reading her post earlier, this is the perfect time for you to go read it.) I will, however, supplement her wisdom with one of my favorite posts from the Informatica blog, from back in 2017.
I’m sharing this second link because some people will read Caitlin’s story and dismiss it because she talks about using Google Sheets. Some people will say “that’s not an enterprise data catalog.” Don’t be those people.
Regardless of the tools you’re using, and regardless of the scope of the data you’re documenting, some things remain universally true:
Tribal knowledge can’t be relied upon at any meaningful scale or across any meaningful timeline
Not all data is created equal – catalog and document the important things first, and don’t try to boil the ocean
The catalog needs to be known by and accessible to the people who need to use the data it describes
Someone needs to own the catalog and keep it current – if its content is outdated or inaccurate, people won’t trust it, and if they don’t trust it they won’t use it
Sooner or later you’ll run into werewolves of your own, and unless you’re prepared in advance the werewolves will eat you
When I started to share this story I figured I would find a place to fit in a “unless you’re careful, your data will turn into a house when the moon is full” joke without forcing it too much, but sadly this was not the case. Still – who doesn’t love a good data werehouse joke?
Maybe next time…
 Or whatever it is you’re tracking. You do you.
 Apparently I started this post last Halloween. Have I mentioned that the past months have been busy?
 Or Pizza Bell… you get the idea.
 Each restaurant typically had a single “data” phone line that used the same modem for processing credit card transactions. I swear I’m not making this up.
 Or at least short-ish. Brevity is not my forte.
I live 2.6 miles (4.2 km) from the epicenter of the coronavirus outbreak in Washington state. You know, the nursing home that’s been in the news, where over 10 people have died, and dozens more are infected.
As you can imagine, this has started me thinking about self-service BI.
When the news started to come out covering the US outbreak, there was something I immediately noticed: authoritative information was very difficult to find. Here’s a quote from that last link.
This escalation “raises our level of concern about the immediate threat of COVID-19 for certain communities,” Dr. Nancy Messonnier, director of the CDC’s National Center for Immunization and Respiratory Diseases, said in the briefing. Still, the risk to the general public not in these areas is considered to be low, she said.
That’s great, but what about the general public in these areas?
What about me and my family?
When most of what I saw on Twitter was people making jokes about Jira tickets, I was trying to figure out what was going on, and what I needed to do. What actions should I take to stay safe? What actions were unnecessary or unhelpful?
Before I could answer these questions, I needed to find sources of information. This was surprisingly difficult.
Specifically, I needed to find sources of information that I could trust. There was already a surge in misinformation, some of it presumably well-intentioned, and some from deliberately malicious actors. I needed to explore, validate, confirm, cross-check, act, and repeat. And I was doing this while everyone around me seemed to be treating the emerging pandemic as a joke or a curiosity.
I did this work and made my decisions because I was a highly-motivated stakeholder, while others in otherwise similar positions were farther away from the problem, and were naturally less motivated at the time.
And this is what got me thinking about self-service BI.
In many organizations, self-service BI tools like Power BI will spread virally. A highly-motivated business user will find a tool, find some data, explore, iterate, refine, and repeat. They will work with untrusted – and sometimes untrustworthy – data sources to find the information they need to use, and to make the decisions they need to make. And they do it before people in similar positions are motivated enough to act.
But before long, scraping together whatever data is available isn’t enough anymore. As the number of users relying on the insights being produced increases – even if the insights are being produced by a self-service BI solution – the need for trusted data increases as well.
Where an individual might successfully use disparate unmanaged sources, a population needs a trusted source of truth.
At some point a central authority needs to step up, to make available the data that can serve as that single source of truth. This is easier said than done, but it must be done. And this isn’t even the hard part.
The hard part is getting everyone to stop using the unofficial and untrusted sources that they’ve been using to make decisions, and to use the trusted source instead. This is difficult because these users are invested in their current sources, and believe that they are good enough. They may not be ideal, but they work, right? They got me this far, so why should I have to stop using them just because someone says so?
This brings me back to those malicious actors mentioned earlier. Why would someone deliberately share false information about public health issues when lies could potentially cost people their lives? They would do it when the lies would help forward an agenda they value more than they value other people’s lives.
In most business situations, lives aren’t at stake, but people still have their own agendas. I’ve often seen situations where the lack of a single source of truth allows stakeholders to present their own numbers, skewed to make their efforts look more successful than they actually are. Some people don’t want to have to rebuild their reports – but some people want to use falsified numbers so they can get a promotion, or a bonus, or a raise.
Regardless of the reason for using untrusted sources, their use is damaging and should be reduced and eliminated. This is true of business data and analytics, and it is true of the current global health crisis. In both arenas, let’s all be part of the solution, not part of the problem.
 Before you ask, yes, my family and I are healthy and well. I’ve been working from home for over a week now, which is a nice silver lining; I have a small but comfortable home office, and can avoid the obnoxious Seattle-area commute.
 This article is the best single source I know of. It’s not an authoritative source for the subject, but it is aggregating and citing authoritative sources and presenting their information in a form closer to the solution domain than to the problem domain.
 This is why I’ve been practicing social media distancing.
 This is where the “personal pandemic parable” part of the blog post ends. From here on it’s all about SSBI. If you’re actually curious, I erred on the side of caution and started working from home and avoiding crowds before it was recommended or mandated. I still don’t know if all of the actions I’ve taken were necessary, but I’m glad I took them and I hope you all stay safe as well.
 As anyone who has ever implemented a single source of truth for any non-trivial data domain can attest.
 You can enjoy the lyrics even if Kreator’s awesome music isn’t to your taste.
I’m running behind on my own YouTube publishing duties, but that doesn’t keep me from watching the occasional data culture YouTube video produced by others.
Like this one:
Ok… you may be confused. You may believe this video is not actually about data culture. This is an easy mistake to make, and you can be forgiven for making it, but the content of the video makes its true subject very clear:
A new technology is introduced that changes the way people work and live. This new technology replaces existing and established technologies; it lets people do what they used to do in a new way – easier, faster, and further. It also lets people do things they couldn’t do before, and opens up new horizons of possibility.
The technology also brings risk and challenge. Some of this is because of the new capabilities, and some is because of the collision between the new way and the old way of doing things. The old way and the new way aren’t completely compatible, but they use shared resources and sometimes things go wrong.
At the root of these challenges are users moving faster than any relevant authorities. Increasing numbers of people are seeing the value of the new technology, assuming the inherent risk, and embracing its capabilities while hoping for the best.
Different groups see the rising costs and devise solutions for these challenges. Some solutions are tactical, some are strategic. And eventually some champions emerge to push for the creation of standard solutions. Or standards plural, because there always seems to be more than one of those darned things.
Not everyone buys into the standards at first, but over time the standards are refined and… actually standardized.
This process doesn’t slow down the technology adoption. The process and the standards instead provide the necessary shape and structure for adoption to take place as safely as possible.
With the passage of time, users take the safety standards for granted as much as they take the capabilities of the technology for granted… and can’t imagine using one without the other.
For the life of me I can’t imagine why they kept doubling down on the “lane markings” analogy, but I’m actually happy they did. This approach may get more people paying attention – I can’t find any other data culture videos on YouTube with 488K views…
 Part of this is because my wife has been out of town, and my increased parental responsibilities have reduced the free time I would normally spend filming and editing… but it’s mainly because I’m finding that talking coherently about data culture is harder for me than writing about data culture. I’ll get better, I assume. I hope.
 In this case, I watched while I was folding laundry. As one does.
 Yes, pun intended. No, I’m not sorry.
 Either through knowledge or through ignorance.
You may have seen things that make you say “that’s Power BI AF” but none of them have come close to this. It’s literally the Power BI AF.
That’s right – this week Microsoft published the Power BI Adoption Framework on GitHub and YouTube. If you’re impatient, here’s the first video – you can jump right in. It serves as an introduction to the framework, its content, and its goals.
Without attempting to summarize the entire framework, this content provides a set of guidance, practices, and resources to help organizations build a data culture, establish a Power BI center of excellence, and manage Power BI at any scale.
Even though I blog a lot about Power BI dataflows, most of my job involves working with enterprise Power BI customers – global organizations with thousands of users across the business who are building, deploying, and consuming BI solutions built using Power BI.
Each of these large customers takes their own approach to adopting Power BI, at least when it comes to the details. But with very few exceptions, each successful customer will align with the patterns and practices presented in the Power BI Adoption Framework – and when I work with a customer that is struggling with their global Power BI rollout, their challenges are often rooted in a failure to adopt these practices.
There’s no single “right way” to be successful with Power BI, so don’t expect a silver bullet. Instead, the Power BI Adoption Framework presents a set of roles, responsibilities, and behaviors that have been developed after working with customers in real-world Power BI deployments.
If you look on GitHub today, you’ll find a set of PowerPoint decks broken down into five topics, plus a few templates.
These slide decks are still a little rough. They were originally built for use by partners who could customize and deliver them as training content for their customers, rather than for direct use by the general public, and as of today they’re still a work in progress. But if you can get past the rough edges, there’s definitely gold to be found. This is the same content I used when I put together my “Is self-service business intelligence a two-edged sword?” presentation earlier this year, and for the most part I just tweaked the slide template and added a bunch of sword pictures.
And if the slides aren’t quite ready for you today, you can head over to the official Power BI YouTube channel where this growing playlist contains bite-size training content to supplement the slides. As of today there are two videos published – expect much more to come in the days and weeks ahead.
The real heroes of this story are Manu Kanwarpal and Paul Henwood. They’re both cloud solution architects working for Microsoft in the UK. They’ve put the Power BI AF together, delivered its content to partners around the world, and are now working to make it available to everyone.
What do you think?
To me, this is one of the biggest announcements of the year, but I really want to hear from you after you’ve checked out the Power BI AF. What questions are still unanswered? What does the AF not do today that you want or need it to do tomorrow?
Please let me know in the comments below – this is just a starting point, and there’s a lot that we can do with it from here…
 If you had any idea how long I’ve been waiting to make this joke…
 I can’t think of a single exception at the moment, but I’m sure there must be one or two. Maybe.
 Partners can still do this, of course.
 Other than you, of course. You’re always a hero too – never stop doing what you do.
Power BI dataflows have included capabilities for data lineage since they were introduced in preview way back in 2018. The design of dataflows, where each entity is defined by the Power Query that provides its data, enables a simple and easy view into its data lineage. The query is the authoritative statement on where the entity’s data comes from, and how it is transformed.
But what about everything else in a workspace? What about datasets, and reports, and dashboards? What about them?
Power BI has your back.
Late last month the Power BI team released a new preview capability that lets users view workspace content in a single end-to-end lineage view, in addition to the familiar list view.
Once the lineage view is selected, all workspace contents – data sources, dataflows, datasets, dashboards, and reports – are displayed, along with the relationships between them. Here’s a big-picture view of a workspace I’ve been working in lately:
There’s a lot to unpack here, so I’ll break down what feels to me like the important parts:
The primary data source is a set of text files in folders. The text files are produced by various web scraping processes, and each has a different format and contents.
The secondary data source is a set of reference and lookup data stored in Excel workbooks in SharePoint Online. These workbooks contain manually curated data that is used to cleanse, standardize and/or enrich the data from the primary source.
The primary data is staged with minimal transformation in a “raw” dataflow. This data is then progressively processed by a series of downstream dataflows, including mashing up with the secondary data from Excel, and reshaped into facts and dimensions.
There is one dataset based on the fact and dimension entities, and a report based on this dataset. There’s a second dataset that includes data quality metrics from entities in multiple dataflows, and a report based on this dataset. And there are two dashboards, one that includes only visuals for data quality metrics, and one that presents the main data along with a few tiles from the quality report.
That overview is so simplified as to be worthless from a technical understanding perspective, and yet it’s still a wall of text. Who wants to read that?
For a real-world workspace that implements a production BI application, there is likely to be more complexity, and less well defined boundaries between objects. How do you document the contents of a complex workspace, and the relationships between those components? How do you understand them well enough to identify and solve problems?
That’s where the lineage view comes in.
Let’s begin by looking at the data sources.
For data sources that use a gateway, I can easily see the gateway name. For other data sources I can see the data source location. We’re off to a good start, because I have a single place to look to see where my data is coming from.
Next, let’s look at the dataflows.
In addition to being able to see the dataflows and the dependencies between them, you can click on any dataflow to see the entities it contains, and can jump directly to edit the dataflow from this view.
This part of workspace lineage isn’t completely new – this is essentially what you could do with dataflows already. But now you can do it with datasets, reports, and dashboards as well.
Selecting a dataset shows me the tables it contains, and selecting a dashboard or report takes me directly to the visualization. But the real power of this view comes from the relationships between objects. The relationships are where data lineage comes to the fore.
The two primary questions asked in the context of data lineage are the upstream question (“where does this data come from?”) and the downstream question (“where is this data used?”).
The first question is often asked in the context of “why am I not seeing what I expect to see?” and the resulting investigation looks at upstream logic and data sources to identify the root cause of problems.
The second question is often asked in the context of “what might break if I change this?” and the resulting investigation looks at downstream objects and processes.
The lineage view has a simple way to answer both questions: just click on the “double arrow” icon and the view will change to highlight all upstream and downstream objects. In a single click you can see where the data comes from, and where the data is used. Click again, and the view toggles back to the default view.
There’s more to lineage view than this, including support for shared and certified datasets, but this should be enough to get you excited. Be sure to check out the preview documentation as you check out the feature!
Update: We now have a video to supplement the blog post. Check it out!
Update: The Power BI blog now has the official announcement for this exciting feature. The blog post includes a look at where the lineage team is planning to invest to make this feature even better, and notes that all of the information in the lineage view is now available using the Power BI API. If you want to integrate lineage and impact analysis into your own tools, or if you want to build a cross-workspace lineage view, you now have the APIs you need to be successful!
 This is a pet project that may one day turn into a viable demo, assuming work and life let me devote a little more time to it…
 Different, annoying, and difficult to clean up.
 For example, the source web site allows any user to contribute, and although the contribution process is moderated there is no enforcement of content or quality. One artist may be credited for “guitar” on one album, “guitars” on another, “lead guitar” on a third. This sounds pretty simple until you take into account that there were close to 50,000 different “artist roles” in the raw source data that needed to be standardized down to a few hundred values in the final data model.
The reaction to this recent post on lineage and Power BI dataflows highlighted how important lineage is for Power BI. As that post shows, there are lineage user experiences in place for dataflows today, and more experiences coming for all artifacts in a workspace.
Even with these new experiences, there will still be times when you want or need to use the Power BI API to get insight into all the workspaces in the Power BI tenant for which you are an administrator. This ability isn’t new, but some recent updates to the Power BI admin API have made it easier.
GetGroupsAsAdmin is an API available to Power BI administrators that returns the workspaces for the Power BI tenant. With the information it returns, an admin can then call additional APIs like GetDatasetsInGroupAsAdmin and GetReportsInGroupAsAdmin to list their contents – and better understand and manage the tenant. This is a relatively straightforward pattern… but you do need to call up to four APIs for each workspace.
Now that GetGroupsAsAdmin supports $expand, you can get the full list of users, reports, dashboards, datasets, and dataflows in the workspace without needing to call any additional APIs. Pretty sweet.
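To make this concrete, here’s a minimal sketch (in Python, using the requests library) of what a single expanded call might look like. Treat it as an illustration rather than a recipe: the access token placeholder and the page size are assumptions, and acquiring an admin token (via MSAL, the Azure CLI, a service principal, and so on) is left out entirely.

```python
import requests

# Sketch only: assumes you already have an Azure AD access token for a user
# (or service principal) with Power BI administrator rights.
ACCESS_TOKEN = "<admin-access-token>"  # placeholder, not a real token

# GetGroupsAsAdmin with $expand returns each workspace together with its
# users, reports, dashboards, datasets, and dataflows in a single call.
url = "https://api.powerbi.com/v1.0/myorg/admin/groups"
params = {
    "$top": 100,  # page through larger tenants with $skip
    "$expand": "users,reports,dashboards,datasets,dataflows",
}
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
workspaces = response.json()["value"]

for ws in workspaces:
    print(ws["name"], "-",
          len(ws.get("datasets", [])), "datasets,",
          len(ws.get("dataflows", [])), "dataflows")
```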
With the information that’s returned, you can get a view of the contents of your Power BI tenant and start examining the relationships between the various objects, and now it’s simpler than ever. The API returns the workspace contents as JSON, which is easy enough to ingest and visualize using Power BI Desktop.
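If it helps, here’s a small follow-up sketch that flattens the expanded response from the previous example into one row per artifact; the column names are my own, and writing a CSV is just one convenient way to get the inventory into Power BI Desktop or any other tool.

```python
import pandas as pd

# Flatten the expanded workspaces (from the previous sketch) into one row per
# artifact. Dashboards expose "displayName" rather than "name", hence the fallback.
rows = []
for ws in workspaces:
    for artifact_type in ("reports", "dashboards", "datasets", "dataflows"):
        for artifact in ws.get(artifact_type, []):
            rows.append({
                "workspaceId": ws["id"],
                "workspaceName": ws["name"],
                "artifactType": artifact_type,
                "artifactName": artifact.get("name") or artifact.get("displayName"),
            })

inventory = pd.DataFrame(rows)
inventory.to_csv("power-bi-tenant-inventory.csv", index=False)
```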
The Power BI team is continuing to add more features and experiences focused on governance and lineage, but the nature of oversight and governance is that most companies have specific tools and processes that require customization to one degree or another. Having a simple programmatic way to get workspace contents from your Power BI tenant will continue to be valuable even as these new experiences are delivered.
 As of when I’m writing, this lineage post has received more “first week views” than any other dataflows post I’ve made this year.
 When I wrote the first draft of this post in early July, dataflows were not yet included in the results of this API; they were added a few weeks later. I decided to wait to complete the post, and here it is, almost October. I really should know better by now.
There’s a lot to unpack here, and I don’t expect to do it all justice in this post, but Eric’s thought-provoking tweet made me want to reply, and I knew it wouldn’t fit into 280 characters… but I can tackle some of the more important and interesting elements.
First and foremost, Eric tags me before he tags Marco, Chris, or Curbal. I am officially number one, and I will never let Marco or Chris forget it.
With that massive ego boost out of the way, let’s get to the BI, which is definitely dead. And also definitely not dead.
Eric’s post starts off with a bold and simple assertion: If you have the reactive/historical insights you need today, you have enough business intelligence and should focus on other things instead. I’m paraphrasing, but I believe this effectively captures the essence of his claim. Let me pick apart some of the assumptions I believe underlie this assertion.
First, this claim seems to assume that all organizations are “good w/ BI.” Although this may be true of an increasing number of mature companies, in my experience it is definitely not something that can be taken for granted. The alignment of business and technology, and the cultural changes required to initiate and maintain this alignment, are not yet ubiquitous.
Should they be? Should we be able to take for granted that in 2019 companies have all the BI they need? 
The second major assumption behind Eric’s first point seems to be that “good w/ BI” today translates to “good w/ BI” tomorrow… as if BI capabilities are a blanket solution rather than something scoped and constrained to a specific set of business and data domains. In reality, BI capabilities are developed and deployed incrementally based on priorities and constraints, and are then maintained and extended as the priorities and constraints evolve over time.
My job gives me the opportunity to work with large enterprise companies to help them succeed in their efforts related to data, business intelligence, and analytics. Many of these companies have built successful BI architectures and are reaping the benefits of their work. These companies may well be characterized as being “good w/ BI” but none of them are resting on their laurels – they are instead looking for ways to extend the scope of their BI investments, and to optimize what they have.
I don’t believe BI is going anywhere in the near future. Not only are most companies not “good w/ BI” today, the concept of being “good w/ BI” simply doesn’t make sense in the context in which BI exists. So long as business requirements and environments change over time, and so long as businesses need to understand and react, there will be a continuing need for BI. Being “good w/ BI” isn’t a meaningful concept beyond a specific point in time… and time never slows down.
If your refrigerator is stocked with what your family likes to eat, are you “good w/ food”? This may be the case today, but what about when your children become teenagers and eat more? What about when someone in the family develops food allergies? What about when one of your children goes vegan? What about when the kids go off to college? Although this analogy won’t hold up to close inspection it hopefully shows how difficult it is to be “good” over the long term, even for a well-understood problem domain, when faced with easily foreseeable changes over time.
Does any of this mean that BI represents the full set of capabilities that successful organizations need? Definitely not. More and more, BI is becoming “table stakes” for businesses. Without BI it’s becoming more difficult for companies to simply survive, and BI is no longer a true differentiator that assures a competitive advantage. For that advantage, companies need to look at other ways to get value from their data, including predictive and prescriptive analytics, and the development of a data culture that empowers and encourages more people to do more things with more data in the execution of their duties.
And of course, this may well have been Eric’s point from the beginning…
 I’ve been serving on the jury for a moderately complex civil trial for most of August, and because the trial is in downtown Seattle during business hours I have been working early mornings and evenings in the office, and taking the bus to the courthouse to avoid the traffic and parking woes that plague Seattle. I am very, very tired.
 Please remind me to add “thought leader” to my LinkedIn profile. Also maybe something about blockchain.
 I’ll leave this as an exercise for the reader.
 At least in my reality. Your mileage may vary.
 Did this analogy hold up to even distant observation?