Power BI dataflows – New features

The Power BI dataflows team has just posted a blog update on new dataflows capabilities added this month[1]. You should check it out here.

Once you’re done reading the blog post and are asking “where can I learn more about that new compute engine?” you should head over to this post and watch the Microsoft Power BI: Enterprise-grade BI with Power BI dataflows session recording from this month’s Microsoft Business Applications Summit.

Go. Go now!


[1] And thank goodness they did, because I didn’t know what I was going to blog about today!

Quick Tip: Creating “data workspaces” for dataflows and shared datasets

Power BI is constantly evolving – there’s a new version of Power BI Desktop every month, and the Power BI service is updated every week. Many of the new capabilities in Power BI represent gradual refinements, but some are significant enough to make you rethink how your organization uses Power BI.

Power BI dataflows and the new shared and certified datasets[1] fall into the latter category. Both of these capabilities enable sharing data across workspace boundaries. When building a data model in Power BI Desktop you can connect to entities from dataflows in multiple workspaces, and publish the dataset you create into a different workspace altogether. With shared datasets you can create reports and dashboards in one workspace using a dataset in another[2].

The ability to have a single data resource – dataflow or dataset – shared across workspaces is a significant change in how the Power BI service has traditionally worked. Before these new capabilities, each workspace was largely self-contained. Dashboards could only get data from a dataset in the same workspace, and the tables in the dataset each contained the queries that extracted, transformed, and loaded their data. This workspace-centric design encouraged[3] approaches where assets were grouped into workspaces because of the platform, and not because it was the best way to meet the business requirements.

Now that we’re no longer bound by these constraints, it’s time to start thinking about having workspaces in Power BI whose function is to contain data artifacts (dataflows and/or datasets) that are used by visualization artifacts (dashboards and reports) in other workspaces. It’s time to start thinking about approaches that may look something like this:

data centric workspaces

Please keep in mind these two things when looking at the diagram:

  1. This is an arbitrary collection of boxes and arrows that illustrate a concept, and not a reference architecture.
  2. I do not have any formal art training.

Partitioning workspaces in this way encourages reuse and can reduce redundancy. It can also help enable greater separation of duties during development and maintenance of Power BI solutions. If you have one team that is responsible for making data available, and another team that is responsible for visualizing and presenting that data to solve business problems[4], this approach can give each team a natural space for its work. Work space. Workspace. Yeah.

Many of the large enterprise customers I work with are already evaluating or adopting this approach. As with any big change, it’s safer to approach this effort incrementally. The customers I’ve spoken to are planning to apply this pattern to new solutions before they think about retrofitting any existing solutions.

Once you’ve had a chance to see how these new capabilities can change how your teams work with Power BI, I’d love to hear what you think.

Edit 2019-06-26: Adam Saxton from Guy In A Cube has published a video on Shared and Certified datasets. If you want another perspective on how this works, you should watch it.


[1] Currently in preview: blog | docs.

[2] If you’re wondering how these capabilities for data reuse relate to each other, you may want to check out this older post, as the post you’re currently reading won’t go into this topic: Lego Bricks and the Spectrum of Data Enrichment and Reuse.

[3] And in some cases, required.

[4] If you don’t, you probably want to think about it. This isn’t the only pattern for successful adoption of Power BI at scale, but it is a very common and tested pattern.

Quick Tip: Restricting access to linked entities in Power BI dataflows

If you use dataflows with Power BI Premium, you probably use linked and computed entities. There’s an overview post here, and an example of how to use these tools for data profiling here, but in case you don’t want to click through[1], here’s a quick summary:

  • When adding entities to a dataflow, you use another dataflow as a data source
  • This adds linked entities to your new dataflow, which are basically pointers to the entities in the source dataflow
  • You then use these linked entities as building blocks for new entities, using union or merge or similar approaches

This approach is simple and powerful, but[2] it may not always give you exactly what you want. For example, what if you don’t want the users who have access to your new computed entities to also have access to the linked entities your new dataflow references?

Let’s take a look at what this looks like. I’m using the dataflow I built for that older post on data profiling as the starting point[3], so if you’re a regular reader this may look familiar.

01 dataflow before

This is a simple dataflow that contains three linked entities and three computed entities. The computed entities use Table.Profile to generate profiles for the data in the linked entities. When you connect to the dataflow using Power BI Desktop, it looks like this:
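To make the idea of a “profile” more concrete, here is a rough Python analog of the kind of per-column summary a profiling entity produces. This is not Power Query M and not how Table.Profile is implemented; the function name `profile`, the sample rows, and the exact meaning of each output column are assumptions made purely for illustration.

```python
from statistics import mean


def profile(rows):
    """Summarize each column of a list of row dicts (a loose analog of data profiling).

    Assumes every column holds values of a single comparable type, plus None for nulls.
    """
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    result = []
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # Only average columns whose non-null values are all numeric.
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        all_numeric = bool(numeric) and len(numeric) == len(present)
        result.append({
            "Column": name,
            "Min": min(present) if present else None,
            "Max": max(present) if present else None,
            "Average": mean(numeric) if all_numeric else None,
            "Count": len(values),                      # rows, including nulls
            "NullCount": len(values) - len(present),
            "DistinctCount": len(set(present)),        # nulls not counted here
        })
    return result


if __name__ == "__main__":
    # Hypothetical sample data standing in for a linked entity.
    sample = [
        {"city": "Oslo", "sales": 10},
        {"city": "Oslo", "sales": 30},
        {"city": None, "sales": 20},
    ]
    for summary in profile(sample):
        print(summary)
```

The point of the sketch is that a profile is derived data: it describes the source entity without exposing its rows, which is why you might want to share the profiles but not the entities behind them.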

02 - consumption before

As you can see, all six entities are available to load into Power BI Desktop.

What if you only wanted users to be able to read the profiles, without also granting them access to the entities being profiled? Why do dataflows give access to both?

The answer is equally simple, and obvious once you see it:

03 - load is enabled by default

As with other dataflow entities[4], the linked entities are enabled for load by default. Removing these entities from the dataflow is as simple as clearing this setting.

04 - load disabled

Once this option is cleared for the linked entities, the dataflow will look like this, with only the three computed entities being included:

05 - dataflow after

And as desired, only these entities are accessible to users in Power BI Desktop:

06 - Consumption after

Hopefully this quick tip is helpful. If this is something that has been making you wonder, please realize you’re in excellent company – you’re not the only one. And if you have other questions about using dataflows in Power BI, please don’t hesitate to ask!


[1] Don’t feel bad – I didn’t want to click through either, and wrote this summary mainly so I didn’t need to read through those older posts to see what I said last year.

[2] As I’ve recently learned by having multiple people ask me about this behavior.

[3] Because I’m lazy.

[4] And Power Query queries in general.

Power BI dataflows and CDM Sessions from MBAS 2019

Last week Microsoft held its annual Microsoft Business Applications Summit (MBAS) event in Atlanta. This two-day technical conference covers the whole Business Applications platform – including Dynamics, PowerApps, and Flow – and not just Power BI, but there was a ton of great Power BI content to be had. Now that the event is over, the session recordings and resources are available to everyone.

MBAS 2019 Banner

There’s a dedicated page on the Power BI community site with all of the sessions, but I wanted to call out a few sessions on dataflows and the Common Data Model that readers of this blog should probably watch[1].

Power BI dataflows sessions

Microsoft Power BI: Democratizing self-service data prep with dataflows

This session is something of a “deep technical introduction” to dataflows in Power BI. If you’re already familiar with dataflows, a lot of this will be a repeat, but there are some gems as well.

Microsoft Power BI: Enterprise-grade BI with Power BI dataflows

This session is probably my favorite dataflows session from any conference. This is a deep dive into the dataflows architecture, including the brand-new-in-preview compute engine for performance and scale.

Common Data Model sessions

As you know, Power BI dataflows build on CDM and CDM folders. As you probably know, CDM isn’t just about Power BI – it’s a major area of investment across Azure data services as well. The session lineup at MBAS reflected this importance with three dedicated CDM sessions.

Common Data Model: All you need to know

This ironically-named session[2] provides a comprehensive overview of CDM. It’s not really everything you need, but it’s the right place to begin if you’re new to CDM and want the big-picture view.

Microsoft Power BI: Common Data Model and Azure Data Services

This session covers how CDM and CDM folders are used in Power BI and Azure data services. If you’ve been following dataflows and CDM closely over the past six months much of this session might be review, but it’s an excellent “deep overview” nonetheless.

Microsoft Power BI: Advanced concepts in the Common Data Model

This session is probably the single best resource on CDM available today. The presenters are the key technical team behind CDM[3], and they go into details and concepts that aren’t available in any other presentation I’ve found. I’ve been following CDM pretty closely for the past year or more, and I learned a lot from this session. You probably will too.

Once you’re done watching these sessions, remember that there’s a huge library of technical sessions you can watch on-demand. Also some less-technical sessions.


[1] I have a list of a dozen or more sessions that I want to watch, and only a few of them are dataflows-centric. If you look through the catalog you’ll likely find some unexpected gems.

[2] If this is all you need to know, why do we have these other two sessions?

[3] Including Jeff Bernhardt, the architect behind CDM. Jeff doesn’t have the rock star reputation he deserves, but he’s been instrumental in the design and implementation of many of the products and services on which I’ve built my career. Any time Jeff is talking, I make a point to listen closely.

Unlimited dataflow refresh on Power BI Premium

Last month Microsoft announced on the Power BI blog an exciting new capability:

AUTOMATION & LIFE-CYCLE MANAGEMENT

‘Refresh Now’ API provides unlimited data refresh for Power BI Embedded and Power BI Premium

Using the ‘Refresh now’ API, the limitation on the number of refreshes you can schedule per day is removed and instead an unlimited number of refreshes can be triggered for each dataset. Combining the refresh now API with incremental refresh, you can build a near real-time dataset that performs small updates of fresh data very often.

Note: The time of existing refresh is not expected to be shorter, so a new refresh of a dataset cannot start before the previous one finishes. Remember that your resource limitations do not change with the introduction of this API, so use these unlimited refreshes with caution and be careful not to overload your resources with unnecessary refreshes.

Although the blog post only explicitly mentions datasets, the same “as many refreshes as you want” capability applies to Power BI dataflows in workspaces assigned to dedicated (Power BI Embedded or Power BI Premium) capacity.

It’s important to note that this is an API-only feature[1]. If you’re setting up a refresh schedule via the UI, you’ll still see the same daily limits, but using the dataflows API you will now be able to have full control over the refresh schedule for your dataflows.
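As a minimal sketch of what “using the dataflows API” might look like, the Python below builds (but does not send) a refresh request. The endpoint path and the `notifyOption` body reflect my understanding of the Power BI REST API’s dataflow refresh operation and should be checked against the official documentation; the function name, the workspace and dataflow IDs, and the access token are all hypothetical placeholders.

```python
import json
import urllib.request

# Assumed base URL for the Power BI REST API; verify against the official docs.
API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_dataflow_refresh_request(group_id, dataflow_id, access_token,
                                   notify_option="NoNotification"):
    """Build a POST request that asks the service to refresh one dataflow.

    group_id is the workspace ID. The request is only constructed here,
    so nothing is sent until it is passed to urllib.request.urlopen().
    """
    url = f"{API_ROOT}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"
    body = json.dumps({"notifyOption": notify_option}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; a real call needs an Azure AD token with suitable permissions.
    request = build_dataflow_refresh_request("my-workspace-id", "my-dataflow-id", "TOKEN")
    print(request.get_method(), request.full_url)
    # To actually trigger the refresh: urllib.request.urlopen(request)
```

Calling this from your own scheduler is what gives you control over refresh timing. The caveats in the quoted announcement still apply: a new refresh cannot start until the previous one finishes, and every refresh consumes capacity resources.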


[1] This is by design, and is unlikely to change. A high-frequency refresh schedule can place a significant load on the capacity resources, and is a configuration that should only be made after careful consideration of the implications.

Session resources: Power BI dataflows and Azure Data Lake integration

Last week I delivered two online sessions on the topic of integrating Power BI dataflows with an organizational Azure Data Lake Storage Gen2 storage account. I’ve blogged on this topic before (link | link | link | link) but sometimes a presentation and demo is worth a thousand words.

On April 30th I delivered a “Power BI dataflows 201 – beyond the basics” session for the PASS Business Intelligence Virtual Chapter. The session recording is online here, and you can download the slides here.

On May 4th I delivered an “Integrating Power BI and Azure Data Lake with dataflows and CDM Folders” session for the SQL Saturday community event in Stockholm, Sweden. I was originally planning to deliver the Stockholm session in person, but due to circumstances beyond my control[1] I ended up presenting remotely, which meant that I could more easily record the session. The session recording is online here, and you can download the slides here.

Each of these sessions covers much of the same material. The Stockholm presentation got off to a bit of a rocky start[2] but it gets smoother after the first few minutes.

Please feel free to use these slides for your own presentations if they’re useful. And please let me know if you have any questions!


[1] I forgot to book flights. Seriously, I thought I had booked flights in February when I committed to speaking, and by the time I realized that I had not booked them, they were way out of my budget. This was not my finest hour.

[2] The presentation was scheduled to start at 6:00 AM, so I got up at 4:00 and came into the office to review and prepare. Instead I spent the 90 minutes before the session start time fighting with PC issues and got everything working less than a minute before 6:00. I can’t remember ever coming in quite this hot…

General availability of Power BI dataflows

Power BI dataflows have been available in public preview since November 2018. For almost five months, customers around the world have been kicking the tires, testing and providing feedback, and building production capabilities using dataflows.

When Microsoft published the latest Business Applications Release Notes, the “new and planned features” list included dataflows general availability with a target date of April 2019, which could typically mean anything before May 1st.

But… April has just arrived, and so has the dataflows GA!

The full details are on the official Power BI blog, so be sure to check it out. Also keep in mind that although dataflows are now generally available, some specific capabilities are still in preview.