Power BI dataflows – New features

The Power BI dataflows team has just posted a blog update on new dataflows capabilities added this month[1]. You should check it out here.

Once you’re done reading the blog post and are asking “where can I learn more about that new compute engine?” you should head over to this post and watch the Microsoft Power BI: Enterprise-grade BI with Power BI dataflows session recording from this month’s Microsoft Business Applications Summit.

Go. Go now!


[1] And thank goodness they did, because I didn’t know what I was going to blog about today!

Quick Tip: Creating “data workspaces” for dataflows and shared datasets

Power BI is constantly evolving – there’s a new version of Power BI Desktop every month, and the Power BI service is updated every week. Many of the new capabilities in Power BI represent gradual refinements, but some are significant enough to make you rethink how you and your organization use Power BI.

Power BI dataflows and the new shared and certified datasets[1] fall into the latter category. Both of these capabilities enable sharing data across workspace boundaries. When building a data model in Power BI Desktop you can connect to entities from dataflows in multiple workspaces, and publish the dataset you create into a different workspace altogether. With shared datasets you can create reports and dashboards in one workspace using a dataset in another[2].
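
To make this concrete, here is a minimal sketch of the kind of navigation query Power BI Desktop generates when you connect to a dataflow entity. The workspace and dataflow GUIDs and the “Customers” entity name are placeholders, and the exact steps Desktop generates may vary:

    let
        // Enumerate the workspaces and dataflows the current user can access
        Source = PowerBI.Dataflows(null),
        // Drill into a specific workspace and dataflow (placeholder IDs)
        Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
        Dataflow = Workspace{[dataflowId = "11111111-1111-1111-1111-111111111111"]}[Data],
        // Select a single entity to load into the data model
        Entity = Dataflow{[entity = "Customers"]}[Data]
    in
        Entity

Because the connector enumerates every workspace you can access, nothing ties the resulting dataset to the workspace where it will eventually be published.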

The ability to have a single data resource – dataflow or dataset – shared across workspaces is a significant change in how the Power BI service has traditionally worked. Before these new capabilities, each workspace was largely self-contained. Dashboards could only get data from a dataset in the same workspace, and the tables in the dataset each contained the queries that extracted, transformed, and loaded their data. This workspace-centric design encouraged[3] approaches where assets were grouped into workspaces because of the platform, and not because it was the best way to meet the business requirements.

Now that we’re no longer bound by these constraints, it’s time to start thinking about having workspaces in Power BI whose function is to contain data artifacts (dataflows and/or datasets) that are used by visualization artifacts (dashboards and reports) in other workspaces. It’s time to start thinking about approaches that may look something like this:

data centric workspaces

Please keep in mind these two things when looking at the diagram:

  1. This is an arbitrary collection of boxes and arrows that illustrate a concept, and not a reference architecture.
  2. I do not have any formal art training.

Partitioning workspaces in this way encourages reuse and can reduce redundancy. It can also help enable greater separation of duties during development and maintenance of Power BI solutions. If you have one team that is responsible for making data available, and another team that is responsible for visualizing and presenting that data to solve business problems[4], this approach can give each team a natural space for its work. Work space. Workspace. Yeah.

Many of the large enterprise customers I work with are already evaluating or adopting this approach. Like any big change, it’s safer to approach this effort incrementally. The customers I’ve spoken to are planning to apply this pattern to new solutions before they think about retrofitting any existing solutions.

Once you’ve had a chance to see how these new capabilities can change how your teams work with Power BI, I’d love to hear what you think.

Edit 2019-06-26: Adam Saxton from Guy In A Cube has published a video on Shared and Certified datasets. If you want another perspective on how this works, you should watch it.


[1] Currently in preview: blog | docs.

[2] If you’re wondering how these capabilities for data reuse relate to each other, you may want to check out this older post, as the post you’re currently reading won’t go into this topic: Lego Bricks and the Spectrum of Data Enrichment and Reuse.

[3] And in some cases, required.

[4] If you don’t, you probably want to think about it. This isn’t the only pattern for successful adoption of Power BI at scale, but it is a very common and tested pattern.

Quick Tip: Restricting access to linked entities in Power BI dataflows

If you use dataflows with Power BI Premium, you probably use linked and computed entities. There’s an overview post here, and an example of how to use these tools for data profiling here, but in case you don’t want to click through[1], here’s a quick summary:

  • When adding entities to a dataflow, you use another dataflow as a data source
  • This adds linked entities to your new dataflow, which are basically pointers to the entities in the source dataflow
  • You then use these linked entities as building blocks for new entities, using union or merge or similar approaches – see the sketch below
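
For example, a computed entity that unions two linked entities might be defined with a short Power Query M query like this minimal sketch, where “Customers US” and “Customers EU” are hypothetical linked entities:

    let
        // Union the rows of two linked entities (pointers back to the source dataflow)
        Combined = Table.Combine({#"Customers US", #"Customers EU"}),
        // Drop any duplicate rows the union may have introduced
        Result = Table.Distinct(Combined)
    in
        Result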

This approach is simple and powerful, but[2] it may not always give you exactly what you want. For example, what if you don’t want the users who have access to your new computed entities to also have access to the linked entities your new dataflow references?

Let’s take a look at what this looks like. I’m using the dataflow I built for that older post on data profiling as the starting point[3], so if you’re a regular reader this may look familiar.

01 dataflow before

This is a simple dataflow that contains three linked entities and three computed entities. The computed entities use Table.Profile to generate profiles for the data in the linked entities. When you connect to the dataflow using Power BI Desktop, it looks like this:

02 - consumption before

As you can see, all six entities are available to load into Power BI Desktop.
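
Under the hood, each of the three computed entities is just a short Power Query M query that profiles one linked entity. A minimal sketch, assuming a linked entity named “Product”:

    let
        // Reference the linked entity from the source dataflow
        Source = Product,
        // Table.Profile returns one row per column, with statistics like
        // min, max, average, distinct count, and null count
        Profile = Table.Profile(Source)
    in
        Profile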

What if you only wanted users to be able to read the profiles, without also granting them access to the entities being profiled? Why do dataflows give access to both?

The answer is equally simple, and obvious once you see it:

03 - load is enabled by default

As with other dataflow entities[4], the linked entities are enabled for load by default. Removing these entities from the dataflow is as simple as clearing this setting.

04 - load disabled

Once this option is cleared for the linked entities, the dataflow will look like this, with only the three computed entities being included:

05 - dataflow after

And as desired, only these entities are accessible to users in Power BI Desktop:

06 - Consumption after

Hopefully this quick tip is helpful. If this is something that has been making you wonder, please realize you’re in excellent company – you’re not the only one. And if you have other questions about using dataflows in Power BI, please don’t hesitate to ask!


[1] Don’t feel bad – I didn’t want to click through either, and wrote this summary mainly so I didn’t need to read through those older posts to see what I said last year.

[2] As I’ve recently learned by having multiple people ask me about this behavior.

[3] Because I’m lazy.

[4] And Power Query queries in general.

Self-Service BI: Asleep at the wheel?

I’ve long been a fan of the tech news site Ars Technica. They have consistently good writing, and they cover interesting topics that sit at the intersection of technology and life, including art, politics[1], and more.

When Ars published this article earlier this week, it caught my eye – but not necessarily for the reason you might think.

sleeping tesla

This story immediately got me thinking about how falling asleep at the wheel is a surprisingly good analogy[2] for self-service BI, and for shadow data in general. The parallels are highlighted in the screen shot above.

  1. Initial reaction: People are using a specific tool in a way we do not want them to use it, and this is definitely not ideal.
  2. Upon deeper inspection: People are already using many tools in this bad way, and were it not for the capabilities of this particular tool the consequences would likely be much worse.

If you’re falling asleep at the wheel, it’s good to have a car that will prevent you from injuring or killing yourself or others. It’s best to simply not fall asleep at the wheel at all, but that has been sadly shown to be an unattainable goal.

If you’re building a business intelligence solution without involvement from your central analytics or data team, it’s good to have a tool[3] that will help prevent you from misusing organizational data assets and harming your business. It’s best to simply not “go rogue” and build data solutions without the awareness of your central team at all, but that has been sadly shown to be an unattainable goal.

Although this analogy probably doesn’t hold up to close inspection as well as the two-edged sword analogy, it’s still worth emphasizing. I talk with a lot of enterprise Power BI customers, and I’ve had many conversations where someone from IT talks about their desire to “lock down” some key self-service feature or set of features, not fully realizing the unintended consequences that this approach might have.

I don’t want to suggest that this is inherently bad – administrative controls are necessary, and each organization needs to choose the balance that works best for their goals, priorities, and resources. But turning off self-service features can be like turning off Autopilot in a Tesla. Keeping users from using a feature is not going to prevent them from achieving the goal that the feature enables. Instead, it will drive[4] users into using other features and other tools, often with even more damaging consequences.

Here’s a key quote from that Ars Technica article:

We should be crystal clear about one point here: the problem of drivers falling asleep isn’t limited to Tesla vehicles. To the contrary, government statistics show that drowsy driving leads to hundreds—perhaps even thousands—of deaths every year. Indeed, this kind of thing is so common that it isn’t considered national news—which is why most of us seldom hear about these incidents.

In an ideal world, everyone will always be awake and alert when driving, but that isn’t the world we live in. In an ideal world, every organization will have all of the data professionals necessary to engage with every business user in need. We don’t live in that world either.

There’s always room for improvement. Tools like Power BI[5] are getting better with each release. Organizations keep maturing and building more successful data cultures to use those tools. But until we live in an ideal world, we each need to understand the direct and indirect consequences of our choices…


[1] For example, any time I see stories in the non-technical press related to hacking or electronic voting, I visit Ars Technica for a deeper and more informed perspective. Like this one.

[2] Please let me explicitly state that I am in no way minimizing or downplaying the risks of distracted, intoxicated, or impaired driving. I have zero tolerance for these behaviors, and recognize the very real dangers they present. But I also couldn’t let this keep me from sharing the analogy…

[3] As well as the processes and culture that enable the tool to be used to greatest effect, as covered in a recent post: Is self-service business intelligence a two-edged sword?

[4] Pun not intended, believe it or not.

[5] As a member of the Power BI CAT team I would obviously be delighted if everyone used Power BI, but we also don’t live in that world. No matter what self-service BI tool you’ve chosen, these lessons will still apply – only the details will differ.

Customer Stories sessions from MBAS 2019

In addition to many excellent technical sessions, last week’s Microsoft Business Applications Summit (MBAS) event also included a series of “Customer Stories” sessions that may be even more valuable.

My recent “Is self-service business intelligence a two-edged sword?” post has gotten more buzz than any of my recent technical posts. Some of this might be due to the awesome use of swords[1] and the inclusion of a presentation recording and slides, but most of it is due to how the post presents guidance for successfully implementing managed self-service BI at the scale needed by a global enterprise.

Well, if you liked that post, you’re going to love these session recordings from MBAS. These “Customer Stories” sessions are hosted by the Power BI CAT team’s leader Marc Reguera, and are presented by key technical and business stakeholders from Power BI customers around the world. Unlike my presentation that focused on general patterns and success factors, these presentations each tell a real-world story about a specific enterprise-scale Power BI implementation.

Why should you care about these customer stories? I think Sun Tzu summed it up best[2]:

Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.

Understanding how to use technology and features is a tactical necessity, but unless you have a strategic plan for using them, it’s very likely you won’t succeed in the long term. And just as any military leader will study past battles, today’s technical leaders can get great value from those who have gone before them.

If you’re part of your organization’s Power BI team, add these sessions to your schedule. You’ll thank me later.


[1] Just humor me here, please.

[2] I was also considering going with Julius Caesar’s famous quote about his empire-wide adoption of Power BI, “Veni, vidi, vizi,” but I think the Sun Tzu quote works a little better. Do you agree?

Power BI dataflows and CDM Sessions from MBAS 2019

Last week Microsoft held its annual Microsoft Business Applications Summit (MBAS) event in Atlanta. This two-day technical conference covers the whole Business Applications platform – including Dynamics, PowerApps, and Flow – and not just Power BI, but there was a ton of great Power BI content to be had. Now that the event is over, the session recordings and resources are available to everyone.

MBAS 2019 Banner

There’s a dedicated page on the Power BI community site with all of the sessions, but I wanted to call out a few sessions on dataflows and the Common Data Model that readers of this blog should probably watch[1].

Power BI dataflows sessions

Microsoft Power BI: Democratizing self-service data prep with dataflows

This session is something of a “deep technical introduction” to dataflows in Power BI. If you’re already familiar with dataflows a lot of this will be a repeat, but there are some gems as well.

Microsoft Power BI: Enterprise-grade BI with Power BI dataflows

This session is probably my favorite dataflows session from any conference. This is a deep dive into the dataflows architecture, including the brand-new-in-preview compute engine for performance and scale.

Common Data Model sessions

As you know, Power BI dataflows build on CDM and CDM folders. As you probably know, CDM isn’t just about Power BI – it’s a major area of investment across Azure data services as well. The session lineup at MBAS reflected this importance with three dedicated CDM sessions.

Common Data Model: All you need to know

This ironically-named session[2] provides a comprehensive overview of CDM. It’s not really everything you need, but it’s the right place to begin if you’re new to CDM and want the big-picture view.

Microsoft Power BI: Common Data Model and Azure Data Services

This session covers how CDM and CDM folders are used in Power BI and Azure data services. If you’ve been following dataflows and CDM closely over the past six months, much of this session might be review, but it’s an excellent “deep overview” nonetheless.

Microsoft Power BI: Advanced concepts in the Common Data Model

This session is probably the single best resource on CDM available today. The presenters are the key technical team behind CDM[3], and the session goes into details and concepts that aren’t available in any other presentation I’ve found. I’ve been following CDM pretty closely for the past year or more, and I learned a lot from this session. You probably will too.

Once you’re done watching these sessions, remember that there’s a huge library of technical sessions you can watch on-demand – as well as some less-technical sessions.


[1] I have a list of a dozen or more sessions that I want to watch, and only a few of them are dataflows-centric. If you look through the catalog you’ll likely find some unexpected gems.

[2] If this is all you need to know, why do we have these other two sessions?

[3] Including Jeff Bernhardt, the architect behind CDM. Jeff doesn’t have the rock star reputation he deserves, but he’s been instrumental in the design and implementation of many of the products and services on which I’ve built my career. Any time Jeff is talking, I make a point to listen closely.