Session resources: Patterns for adopting dataflows in Power BI

This morning I presented a new webinar for the Istanbul Power BI user group, covering one of my favorite subjects: common patterns for successfully using and adopting dataflows in Power BI.

This session represents an intersection of my data culture series in that it presents lessons learned from successful enterprise customers, and my dataflows series in that… in that it’s about dataflows. I probably didn’t need to point out that part.

The session slides can be downloaded here: 2020-09-23 – Power BI Istanbul – Patterns for adopting dataflows in Power BI

The session recording is available for on-demand viewing. The presentation is around 50 minutes, with about 30 minutes of dataflows-centric Q&A at the end. Please check it out, and share it with your friends!

 

Refreshing only upstream dataflows

tl;dr: If you want to refresh a dataflow without refreshing any downstream dataflows that depend on it, just clear the “Enable Load” setting from any linked entities that reference it. This will remove the dependencies that the Power BI service looks for, without breaking any of the downstream dataflow logic. Win!

At the end of a long day I got an email asking for help[1]. The email included a screen shot from the lineage view of a Power BI workspace, some context about working to troubleshoot a problem, and the question “We want to refresh this dataflow, and not have it refresh the downstream dataflows. Is this possible?”

I almost said no, and then I remembered this post and realized the answer was “yes, sort of.”

Composable ETL with linked and computed entities ensures that when an upstream dataflow is refreshed, any downstream dataflows will automatically refresh as well – all without any explicit configuration required. The dataflows engine in the Power BI service looks at the dependencies, and everything just works.[2]

This is great, until you need it to work differently, and the “no explicit configuration” also means “no real way to turn it off.” So long as there are linked entities referencing entities in the current dataflow, refreshing the current dataflow will cause the dataflows containing those linked entities to refresh as well.

Fortunately, that 2018-era blog post illustrates that clearing the “enable load” setting for a linked entity also clears the metadata that the dataflows engine looks at to build the dependency graph used for composable ETL.

So I sent off a quick reply, “Try this – I haven’t tested it end to end, but it should work,” and gave it a quick test because at this point I was curious.

This was my starting point: three dataflows, all related.

When I refresh the first dataflow in the chain, the next two refresh as expected.

To break that refresh chain I can just edit the second dataflow.

The only change I need to make is this one: clearing that “Enable load” check box.

Once this is done, the lineage view looks a little more like this, and I can refresh the upstream dataflow in isolation.

Once the troubleshooting is done, all I need to do to get back to work is to re-enable load on those temporarily disabled linked entities. Boom!
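If you want to trigger that isolated refresh programmatically instead of from the service UI, the Power BI REST API exposes a “Refresh Dataflow” endpoint. Here’s a minimal sketch – the workspace ID, dataflow ID, and bearer token are placeholders, and acquiring the token (via Azure AD) is out of scope:

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def dataflow_refresh_request(group_id: str, dataflow_id: str, token: str) -> urllib.request.Request:
    """Build a POST request for the Power BI 'Refresh Dataflow' REST endpoint.

    All IDs and the token are placeholders for illustration; this sketch does
    not handle authentication or error responses.
    """
    url = f"{API_ROOT}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"
    # NoNotification suppresses the refresh-outcome email
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# To actually trigger the refresh (requires a valid AAD token):
# urllib.request.urlopen(dataflow_refresh_request("<workspace-id>", "<dataflow-id>", "<token>"))
```

With the “Enable load” setting cleared as shown above, this call refreshes only the upstream dataflow, since the service no longer sees any downstream dependencies to cascade into.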

I doubt this technique will get used all that often, but it looks like it worked today. As I was finishing up this post I got an email confirmation that this solved the problem it needed to solve, and the person who had emailed asking for help is now unblocked.

Life is good.

Look, he’s emailing again, probably offering to buy me a coffee or a nice new sword to say thank you. Oh, he wants to show me other problems he’s run into, not offering to buy me anything at all.

Oh.


[1] Typically I respond to emails asking for help with a perfunctory redirection to another location, with a statement about how my schedule and responsibilities don’t allow me to scale by replying to one-off email support requests. The person sending this particular mail got a pass because they have consistently gone out of their way to help the Power BI community, and because they asked an interesting question with all of the information I needed to respond. It also helped that the email came in at the end of the day when my brain was too burned to start the next tasks on my list. I would definitely not recommend emailing me to ask for help. Definitely not.

[2] Because of cloud magic, I think. Maybe metadata too, but isn’t metadata a kind of magic as well?

Data Culture: Getting stakeholder buy-in

Have you ever heard the term “Excel hell”?

Odds are, if you’re reading this blog you’ve heard it, and possibly lived it once or twice. If not, you may want to take a minute to search online and discover what the internet has to say about it.

Managed self-service BI with Power BI is one way to escape or avoid Excel hell, but any self-service data tool brings with it some of the same risks that Excel brings. Power BI introduces guardrails to make it easier to manage the work that self-service authors produce, but wouldn’t it be nice to address the problem at its root, and prevent unwanted content from being shared in the first place?

The video introduces a set of techniques to gain explicit stakeholder buy-in. For the enterprise Power BI customers I work with, these steps are usually prerequisites to getting a Pro license and permission to publish to the Power BI service, but they may be required for other activities as well.

  1. Ask for it – rather than automatically issuing licenses in bulk, issue licenses only to users who explicitly request them
  2. Sign a data covenant – require users to sign “terms of use” for working with data in ways that align with the goals of the organization
  3. Take a test – require users to take and pass a simple[1] test
  4. Complete training – require users to attend Dashboard in a Day or similar introductory training

None of these barriers is designed to keep anyone from getting access to the tools and data they need. They’re designed to make people work for it.

As mentioned in earlier posts and videos, humans are complicated and tricky, but most people value what they earn more than they value what they’re given. And if someone works to earn the ability to publish and share data and reports, they’re more likely to think twice before they publish something and forget it.

This is a small but effective step that can reduce the ongoing effort required to manage and govern the content published to the Power BI service. And if you put the entry point[2] to requesting access in your central portal, you’ll be helping reinforce the behaviors that will make your data culture grow, right from the beginning.


[1] Emphasis here on “simple” – every company I talked to who used this approach designed the test so that anyone could pass it.

[2] Power App, Customer Voice form, ServiceNow ticket, whatever fits your processes and requirements.

Dataflows enhanced compute engine – will it fold?

I’ve been seeing more questions lately about the dataflows enhanced compute engine in Power BI. Although I published a video overview and although there is feature coverage in the Power BI documentation, there are a few questions that I haven’t seen readily answered online.

A lot of these questions can be phrased as “what happens when I turn on the enhanced compute engine for my Power BI Premium capacity?”

Most of my responses start with the phrase “it depends” – so let’s look at some of the factors the answer depends on.

First, let’s look at a set of connected dataflows in a Premium workspace:

This workspace has three dataflows we’re interested in:

  1. A “staging” dataflow that contains un- or minimally-transformed data from the external data source or sources
  2. A “cleansed” dataflow that uses linked and computed entities to apply data cleansing logic to the data in the staging dataflow
  3. A final dataflow that uses linked and computed entities to transform the cleansed data into a star schema for analysis[1]

Next, let’s look at the settings for one of these dataflows:

For a given dataflow, the owner can configure the enhanced compute engine to be Off/Disabled, On/Enabled, or “Optimized,” which means that the Power BI service will turn it on or off for entities in the dataflow depending on how each entity is used by other dataflows.

This dataflow-level setting, combined with how the dataflows are used, determines whether the enhanced compute engine is enabled on entities in the dataflow:

  • Disabled: The enhanced compute engine is not enabled for any entities in the dataflow
  • Optimized: The enhanced compute engine is enabled for any entity in the dataflow that is referenced by a linked entity in another dataflow
  • On: The enhanced compute engine is enabled for all entities in the dataflow

Important point: To connect to a dataflow using DirectQuery you must explicitly set the enhanced compute engine to “On” even if entities in the dataflow are being referenced by linked entities.
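These rules can be summarized as a simple decision function. The sketch below is an illustration of the behavior described above, not an actual Power BI API:

```python
def engine_enabled_for_entity(setting: str, is_referenced_by_linked_entity: bool) -> bool:
    """Model the per-entity enhanced compute engine behavior.

    'setting' is the dataflow-level option: "Disabled", "Optimized", or "On".
    This function just encodes the documented rules for illustration.
    """
    if setting == "On":
        # Engine is enabled for all entities (also required for DirectQuery)
        return True
    if setting == "Optimized":
        # Enabled only for entities referenced by linked entities elsewhere
        return is_referenced_by_linked_entity
    # "Disabled": the engine is never enabled
    return False
```

For example, `engine_enabled_for_entity("Optimized", False)` returns `False` – which is exactly why the final dataflow in the scenario below doesn’t use the engine under the default setting.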

Now let’s put these two factors together, considering that all three dataflows in the first image are using the default setting: Optimized.

  1. The “staging” dataflow uses the enhanced compute engine for any entities that are referenced by linked entities in another dataflow
  2. The “cleansed” dataflow uses the enhanced compute engine for any entities that are referenced by linked entities in another dataflow
  3. The final dataflow does not use the enhanced compute engine because none of its entities is referenced by linked entities in another dataflow[2]

Now we’re ready to revisit the actual question we’re trying to answer[3]: “what happens when I turn on the enhanced compute engine for my Power BI Premium capacity?”

Once the enhanced compute engine is enabled in the Power BI Premium capacity settings, and the dataflow settings and configuration (as illustrated above) dictate that the engine is used for a given dataflow, this is what happens:

  • When the dataflow is refreshed, the Power Query for each entity is executed, and the output of the query is persisted in the dataflow’s CDM folder as CSV data and JSON metadata
  • For any entity for which the enhanced compute engine is enabled, the output of the entity’s Power Query is also loaded into a table in a SQL database instance managed by the Power BI service

This second bullet is where the magic happens. Because the data is now in a storage format that includes a compute engine, supported queries can use the SQL cache instead of the underlying CSV files, and get the increased performance that comes with query folding.

Having the data in SQL also means that the dataflow can serve as a DirectQuery data source – without the enhanced compute engine a dataflow can only be used as an Import mode data source.

The next logical question is “what exactly do you mean by supported queries?”

These queries are supported, meaning that they will use the SQL data if the enhanced compute engine is enabled for a dataflow:

  • Dataflow refresh against a dataflow with the enhanced compute engine enabled – for example, the “Cleansed” dataflow or “Final” dataflow in the image above
  • Authoring in Power BI Desktop, when using DirectQuery mode
  • User activity in the Power BI service, when using DirectQuery mode

These queries are not supported, and will always use the CSV data even if the enhanced compute engine is enabled for a dataflow:

  • Editing a dataflow in Power Query Online
  • Authoring in Power BI Desktop, when using Import mode
  • Dataset refresh in the Power BI service, when using Import mode
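Putting the two lists together, query routing can also be modeled as a small function. The query-kind labels here are hypothetical shorthand I’m inventing for this sketch, not real Power BI identifiers:

```python
# Hypothetical labels for the query kinds listed above
SQL_CAPABLE_QUERIES = {
    "downstream-dataflow-refresh",   # refresh of a dataflow with the engine enabled
    "desktop-directquery-authoring", # Power BI Desktop, DirectQuery mode
    "service-directquery-activity",  # Power BI service, DirectQuery mode
}

def backing_store(query_kind: str, engine_enabled: bool) -> str:
    """Illustrate which storage a query reads: the SQL cache or the CDM folder's CSV files."""
    if engine_enabled and query_kind in SQL_CAPABLE_QUERIES:
        return "sql"
    # Power Query Online editing and all Import mode paths always read CSV
    return "csv"
```

Note that even with the engine fully enabled, Import mode dataset refreshes and Power Query Online editing still read the CSV data.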

The final question[4] is “what Power Query operations will actually fold against the SQL data and take full advantage of the compute capabilities?”

The answer is in this blog post from Cristian Petculescu, the architect of dataflows and much of Power BI. Cristian enumerates well over 100 M constructs and the SQL to which they fold so I’m not going to try to summarize them all here. Go take a look for yourself if you need more information than what’s in this post.

Was this helpful? Did you learn anything new?

If you have any more questions about the enhanced compute engine, please let me know!


[1] This pattern of separating dataflows based on the logical role of the data preparation logic they implement is a general best practice, in addition to aligning well with dataflows best practices.

[2] If we wanted the enhanced compute engine to be used for this final dataflow even though it is not referenced by any linked entities, we would need to change the setting from “Optimized” to “On.”

[3] Did you still remember what the question was? I copied it here because I’d forgotten and was concerned you might have too. It’s September 2020 and the world is on fire, which can make it surprisingly difficult to think…

[4] Which almost no one ever actually asks me, but which fits into the theme of the post, so I’m including it here for completeness.

Data Culture: Now you’re thinking with portal

In an ideal world, everyone knows where to find the resources and tools they need to be successful.

We don’t live in that world.

I’m not even sure we can see that world from here. But if we could see it, we’d be seeing it through a portal[1].

One of the most common themes from my conversations with enterprise Power BI customers is that organizations that are successfully building and growing their data cultures have implemented portals where they share the resources, tools, and information that their users need. These mature companies also treat their portal as a true priority – the portal is a key part of their strategy, not an afterthought.

This is why:

In every organization of non-trivial size there are obstacles that keep people from finding and using the resources, information, and data they need.

Much of the time people don’t know what they need, nor do they know what’s available. They don’t know what questions to ask[2], much less know where to go to get the answers. This isn’t their fault – it’s a natural consequence of working in a complex environment that changes over time on many different dimensions.

As I try to do in these accompanying-the-video blog posts, I will let the video speak for itself, but there are a few key points I want to emphasize here as well.

  1. You need a place where people can go for all of the resources created and curated by your center of excellence
  2. You need to engage with your community of practice to ensure that you’re providing the resources they need, and not just the resources you think they need
  3. You need to keep directing users to the portal, again and again and again, until it becomes habit and they start to refer their peers

The last point is worth emphasizing and explaining. If community members don’t use the portal, it won’t do what you need it to do, and you won’t get the return you need on your investments.

Users will continue to use traditional “known good” channels to get information – such as sending you an email or IM – if you let them. You need to not let them.


[1] See what I did there?

[2] Even though they will often argue vehemently against this fact.

Standardizing on Power BI

Last week’s post on migrating to Power BI was intended to be a stand-alone post, but this excellent comment from Matthew Choo via LinkedIn made me realize I had more to say.

I could not agree more, Matthew![1]

In my conversations with leaders from enterprise Power BI customers, if they mention to me that they’re standardizing on Power BI as their analytics tool and platform, I try to ask two questions:

  1. Why?
  2. Why now?

The answer to the first question is almost always a combination of reasons from the previous post on migration and the fact that the customer sees Power BI as the best choice for the organization’s needs moving forward. There’s very little variation in the answers over dozens of conversations.

The answers to the second question are more diverse, but there are some common themes that I hear again and again.

One theme is that the selection of multiple tools happened organically. Sometimes a new BI tool is adopted through a merger or acquisition. Sometimes a new CIO or other senior leader mandates the adoption of their favorite tool, and as leaders change the old tools are left behind while the new tools are added to the stable. Sometimes a key data source only works well with the data source vendor’s reporting tool because the vendor refuses to open their APIs to 3rd parties.[3] Often the plan has been to eliminate excess tools at some point, but the point hasn’t been now… until now.

Very often the factor that makes standardization a priority is pain, which brings me back to a point I made in the introductory post in the “building a data culture” series:

I strongly believe that pain is a fundamental precursor to significant change. If there is no pain, there is no motivation to change. Only when the pain of not changing exceeds the perceived pain of going through the change will most people and organizations consider giving up the status quo.

The most common general reasons for this “pain balance” shifting[4] involve money. Organizations need to eliminate inefficiencies so they can invest in areas of strategic importance. A leader may proactively seek out inefficiencies to eliminate, but it’s more typical for me to hear about external market pressures[5] necessitating a more reactive consolidation.

The other common theme in responses to the question “why now?” is that the organization has a new chief data officer, that the CDO is focused on building a data culture, and for the reasons listed in last week’s post has made consolidation a priority.

What’s interesting about this theme is that the hiring of a CDO is part of a larger strategic shift in how the organization thinks about data. The C-level executives know that they need to start treating data as a strategic asset, and they realize that it’s too important to be rolled up into the responsibilities of the CIO or another existing leader. Very often, they already have a good idea of the types of changes that need to happen, but want to hire a senior leader who will make the final decisions and own the actions and their outcomes. In other words, the hiring of a CDO is often an early-but-lagging indicator that there’s executive support for a data culture. That’s a good thing.

Before making a big change it’s always important to understand what you hope to achieve, but it’s also important to take a little time to examine how you got to the point where change is necessary, so you can better avoid the mistakes of the past…


[1] Usually when I say this I’m agreeing with myself[2] so it’s nice to be referring to a different Matthew here.

[2] Not that I agree with myself very often, mind you.

[3] Yeah, you know who I’m talking about.

[4] I wasn’t expecting to coin this phrase, but as soon as I typed it, I loved it. I think I may quit my day job and start a new business as a consultant delivering expensive “Shifting the Pain Balance(tm)” workshops for chief data officers and other senior executives.

[5] Since it’s still 2020, I should point out that the COVID-19 pandemic has been at the root of many external market pressures. I’ve heard dozens of companies say that they’re increasing their investment in building a data culture because of pandemic-induced challenges.

Data Culture: Showcasing the Art of the Possible

The last post and video in this series looked at the broad topic of training. This post looks at a specific aspect of this topic: letting people know what is possible, and sparking their imagination to do amazing things.

A lot of content and training materials will focus on capabilities: here is a feature, this is what it does, and this is how you use it. Although this type of content is important, it isn’t enough on its own to accelerate the growth of a data culture.

The most successful organizations I’ve worked with have included in their community of practice content specifically targeting the art of the possible. This might be a monthly presentation by community champions across the business. It might be someone from the center of excellence highlighting new features, or the integration between features and tools. The most important thing is planting the seed of an idea in the minds of people who will say “I had no idea you could do that!”

My colleagues Miguel and Chris are some of my greatest personal sources of inspiration for building reports[1] because each of them does amazing things with Power BI that make it powerful, usable, and beautiful – but they’re just two of the many people out there showing me new techniques and possibilities.

Who will you inspire today?


[1] And by now you probably realize that I need all the inspiration I can get for anything related to data visualization.

Migrating to Power BI

One aspect of building a data culture is selecting the right tools for the job. If you want more people working with more data, giving them the tools they need to do that work is an obvious[1] requirement. But how many tools do you need, and which tools are the right tools?

Migrating to the cloud

It should be equally obvious that the answer is “it depends.” This is the answer to practically every interesting question. The right tools for an organization depend on the data sources it uses, the people who work with that data, the history that has gotten the organization to the current decision point, and the goals the organization needs to achieve or enable with the tools it selects.

With that said, it’s increasingly common[2] to see large organizations actively working to reduce the number of BI tools they support[3]. The reasons for this move to standardization are often the same:

  • Reduce licensing costs
  • Reduce support costs
  • Reduce training costs
  • Reduce friction involved in driving the behaviors needed to build and grow a data culture

Other than reducing the licensing costs[4], most of these motivations revolve around simplification. Having fewer tools means learning and using fewer tools. It means everyone learning and using fewer tools, which often results in less time and money spent to get more value from the use of those tools.

One of the challenges in eliminating a BI tool is ensuring that the purpose that tool fulfilled is now effectively fulfilled by the tool that replaces it. This is where migration comes in.

The Power BI team at Microsoft has published a set of guidance articles focused specifically on migrating from other BI tools to Power BI.

This documentation was written by the inestimable Melissa Coates of Coates Data Strategies, with input and technical review by the Power BI customer advisory team. If you’re preparing to retire another BI tool and move its workload to Power BI – or if you’re wondering where to start – I can’t recommend it highly enough.


[1] If this isn’t obvious to a given organization or individual, I’m reasonably confident that they’re not actively trying to build a data culture, and not reading this blog.

[2] I’m not a market analyst but I do get to talk to BI, data, and analytics leaders at large companies around the world, and I suspect that my sample size is large and diverse enough to be meaningful.

[3] I’m using the word “support” here – and not “use” – deliberately. It’s also quite common to see companies remove internal IT support from deprecated BI tools, but also let individual business units continue to use them – but also to pay for the tools and support out of their own budgets. This is typically a way to allow reluctant “laggard” internal customer groups to align with the strategic direction, but to do it on their own schedules.

[4] I’m pretty consistent in saying I don’t know anything about licensing, but even I understand that paying for two things costs more than paying for one of those things.

Data Culture: Training for the Community of Practice

The last few posts and videos in this series have introduced the importance of a community where your data culture can grow, and ways to help motivate members of the community, so your data culture can thrive.

But what about training? How do we give people the skills, knowledge, and guidance that they need before they are able to work with data and participate in the data culture you need them to help build?

Training is a key aspect of any successful data culture, but it isn’t always recognized as a priority. In fact the opposite is often true.

I’ve worked in tech long enough, and have spent enough of that time close to training to know that training budgets are often among the first things cut during an economic downturn. These short-term savings often produce long-term costs that could be avoided, and more mature organizations are beginning to realize this.

In my conversations with enterprise Power BI customers this year, I’ve noticed a trend emerging. When I ask how the COVID-19 pandemic is affecting how they work with data, I hear “we’re accelerating our efforts around self-service BI and building a data culture because we know this is now more important than ever” a lot more than I hear “we’re cutting back on training to save money.” There’s also a clear correlation between the maturity of the organizations I’m talking with and the response I get. Successful data cultures understand the value of training.

I’ll let the video speak for itself, but I do want to call out a few key points:

  1. Training on tools is necessary, but it isn’t enough. Your users need to know how to use Power BI[1], but they also need to know how to follow organizational processes and work with organizational data sources.
  2. Training material should be located as close as possible to where learners are already working – the people who need it the most will not go out of their way to look for it or to change their daily habits.
  3. There is a wealth of free Power BI training available from Microsoft (link | link | link) as well as a broad ecosystem of free and paid training from partners.

The most successful customers I work with use all of the resources that are available. Typically they will develop internal online training courses that include links to Microsoft-developed training material, Microsoft product documentation, and community-developed content, in a format and structure[2] that they develop and maintain themselves, based on their understanding of the specific needs of their data culture.

Start as small as necessary, listen and grow, and iterate as needed. There’s no time like the present.


[1] Or whatever your self-service BI tool of choice may be – if you’re reading this blog, odds are it’s Power BI.

[2] I’m tempted to use the term “curriculum” here, but this carries extra baggage that I don’t want to include. Your training solution can be simple or complex and still be successful – a lot of this will depend on your company culture, and the needs of the learners you’re targeting.

Webcast: Patterns for adopting dataflows in Power BI

I haven’t been posting a lot about dataflows in recent months, but that doesn’t mean I love them any less. On Wednesday September 23rd, I’ll be sharing some of that love via a free webcast hosted by the Istanbul Power BI user group[1]. You can sign up here.

In this webcast I’ll be presenting practices for successfully incorporating dataflows into Power BI applications, based on my experience working with enterprise Power BI customers. If you’re interested in common patterns for success, common challenges to avoid, and answers to the most frequently asked dataflows questions, please sign up today.

This webcast won’t cover dataflows basics, so if you’re new to dataflows in Power BI or just need a refresher, please watch this tutorial before joining!


[1] In case it needs to be said, yes, the session will be delivered in English.