Refreshing only upstream dataflows

tl;dr: If you want to refresh a dataflow without refreshing any downstream dataflows that depend on it, just clear the “Enable Load” setting from any linked entities that reference it. This will remove the dependencies that the Power BI service looks for, without breaking any of the downstream dataflow logic. Win!

At the end of a long day I got an email asking for help[1]. The email included a screenshot from the lineage view of a Power BI workspace, some context about working to troubleshoot a problem, and the question “We want to refresh this dataflow, and not have it refresh the downstream dataflows. Is this possible?”

I almost said no, and then I remembered this post and realized the answer was “yes, sort of.”

Composable ETL with linked and computed entities ensures that when an upstream dataflow is refreshed, any downstream dataflow that depends on it will automatically refresh as well – all without any explicit configuration required. The dataflows engine in the Power BI service looks at the dependencies between dataflows, and everything just works.[2]

This is great, until you need it to work differently, and the “no explicit configuration” also means “no real way to turn it off.” So long as there are linked entities referencing entities in the current dataflow, refreshing the current dataflow will cause the dataflows containing those linked entities to refresh as well.

Fortunately, that 2018-era blog post illustrates that clearing the “enable load” setting for a linked entity also clears the metadata that the dataflows engine looks at to build the dependency graph used for composable ETL.
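
To make this concrete, here’s roughly what a linked entity’s query looks like under the hood – a minimal M sketch using placeholder workspace, dataflow, and entity names (the Power Query editor generates this navigation path for you, typically using IDs rather than friendly names). Note that the “Enable load” setting doesn’t appear anywhere in the M itself – it lives in the dataflow’s metadata, which is exactly why clearing it changes the dependency graph without changing any query logic.

```
let
    // Connect to the dataflows the current user can access
    Source = PowerBI.Dataflows(null),
    // Navigate to the workspace, dataflow, and entity - all of
    // the names below are placeholders for illustration
    Workspace = Source{[workspaceName = "Demo Workspace"]}[Data],
    Dataflow = Workspace{[dataflowName = "Staging"]}[Data],
    Customers = Dataflow{[entity = "Customers"]}[Data]
in
    Customers
```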

So I sent off a quick reply, “Try this – I haven’t tested it end to end, but it should work,” and then gave it a quick test because at this point I was curious.

This was my starting point: three dataflows, all related.

When I refresh the first dataflow in the chain, the next two refresh as expected.

To break that refresh chain I can just edit the second dataflow.

The only change I need to make is this one: clearing that “Enable load” check box.

Once this is done, the lineage view looks a little more like this, and I can refresh the upstream dataflow in isolation.

Once the troubleshooting is done, all I need to do to get back to work is to re-enable load on those temporarily disabled linked entities. Boom!

I doubt this technique will get used all that often, but it looks like it worked today. As I was finishing up this post I got an email confirmation that this solved the problem it needed to solve, and the person who had emailed asking for help is now unblocked.

Life is good.

Look, he’s emailing again, probably offering to buy me a coffee or a nice new sword to say thank you. Oh, he wants to show me other problems he’s run into, not offering to buy me anything at all.

Oh.


[1] Typically I respond to emails asking for help with a perfunctory redirection to another location, with a statement about how my schedule and responsibilities don’t allow me to scale by replying to one-off email support requests. The person sending this particular mail got a pass because they have consistently gone out of their way to help the Power BI community, and because they asked an interesting question with all of the information I needed to respond. It also helped that the email came in at the end of the day when my brain was too burned out to start the next tasks on my list. I would definitely not recommend emailing me to ask for help. Definitely not.

[2] Because of cloud magic, I think. Maybe metadata too, but isn’t metadata a kind of magic as well?

Dataflows enhanced compute engine – will it fold?

I’ve been seeing more questions lately about the dataflows enhanced compute engine in Power BI. Although I published a video overview and the feature is covered in the Power BI documentation, there are a few questions that I haven’t seen readily answered online.

A lot of these questions can be phrased as “what happens when I turn on the enhanced compute engine for my Power BI Premium capacity?”

Most of my responses start with the phrase “it depends” – so let’s look at some of the factors the answer depends on.

First, let’s look at a set of connected dataflows in a Premium workspace:

This workspace has three dataflows we’re interested in:

  1. A “staging” dataflow that contains un- or minimally-transformed data from the external data source or sources
  2. A “cleansed” dataflow that uses linked and computed entities to apply data cleansing logic to the data in the staging dataflow (see the sketch after this list)
  3. A final dataflow that uses linked and computed entities to transform the cleansed data into a star schema for analysis[1]
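
As promised, here’s a minimal sketch of what the cleansing logic in a computed entity might look like, assuming a linked entity named Customers that points at the staging dataflow – the entity and column names are hypothetical:

```
let
    // "Customers" is a linked entity pointing at the staging
    // dataflow - referencing it makes this query a computed entity
    Source = Customers,
    // Drop rows that are missing the business key
    RemovedNulls = Table.SelectRows(Source, each [CustomerID] <> null),
    // Standardize the casing of the name column
    CleanedNames = Table.TransformColumns(
        RemovedNulls,
        {{"CustomerName", Text.Proper, type text}}
    )
in
    CleanedNames
```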

Next, let’s look at the settings for one of these dataflows:

For a given dataflow, the owner can configure the enhanced compute engine to be Off/Disabled, On/Enabled, or “Optimized,” which means that the Power BI service will turn it on or off for each entity in the dataflow depending on how that entity is used by other dataflows.

This dataflow-level setting, combined with how the dataflows are used, determines whether the enhanced compute engine is enabled on entities in the dataflow:

  • Disabled: The enhanced compute engine is not enabled for any entities in the dataflow
  • Optimized: The enhanced compute engine is enabled for any entity in the dataflow that is referenced by a linked entity in another dataflow
  • On: The enhanced compute engine is enabled for all entities in the dataflow

Important point: To connect to a dataflow using DirectQuery you must explicitly set the enhanced compute engine to “On” even if entities in the dataflow are being referenced by linked entities.

Now let’s put these two factors together, considering that all three dataflows in the first image are using the default setting: Optimized.

  1. The “staging” dataflow uses the enhanced compute engine for any entities that are referenced by linked entities in another dataflow
  2. The “cleansed” dataflow uses the enhanced compute engine for any entities that are referenced by linked entities in another dataflow
  3. The final dataflow does not use the enhanced compute engine because none of its entities is referenced by linked entities in another dataflow[2]

Now we’re ready to revisit the actual question we’re trying to answer[3]: “what happens when I turn on the enhanced compute engine for my Power BI Premium capacity?”

Once the enhanced compute engine is enabled in the Power BI Premium capacity settings, and the dataflow settings and configuration (as illustrated above) dictate that the engine is used for a given dataflow, this is what happens:

  • When the dataflow is refreshed, the Power Query for each entity is executed, and the output of the query is persisted in the dataflow’s CDM folder as CSV data and JSON metadata
  • For any entity for which the enhanced compute engine is enabled, the output of the entity’s Power Query is also loaded into a table in a SQL database instance managed by the Power BI service

This second bullet is where the magic happens. Because the data is now in a storage format that includes a compute engine, supported queries can use the SQL cache instead of the underlying CSV files, and get the increased performance that comes with query folding.
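
To make that folding concrete, here’s a hedged sketch of the kind of downstream query that benefits, using a hypothetical Sales entity whose output is cached in the engine’s SQL store. Instead of streaming every row out of the CSV files and evaluating each step in the mashup engine, steps like these can fold into a single SQL statement that does the filtering and aggregation where the data lives:

```
let
    // "Sales" is a linked entity whose output is cached in the
    // enhanced compute engine's SQL store
    Source = Sales,
    // Both steps below can fold to SQL (a WHERE clause and a
    // GROUP BY) instead of being evaluated over the CSV data
    Filtered = Table.SelectRows(Source, each [OrderDate] >= #date(2020, 1, 1)),
    Grouped = Table.Group(
        Filtered,
        {"CustomerID"},
        {{"TotalSales", each List.Sum([SalesAmount]), type number}}
    )
in
    Grouped
```

Conceptually, the folded query would look something like SELECT CustomerID, SUM(SalesAmount) FROM Sales WHERE OrderDate >= '2020-01-01' GROUP BY CustomerID – one round trip, with the heavy lifting done by the SQL engine.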

Having the data in SQL also means that the dataflow can serve as a DirectQuery data source – without the enhanced compute engine, a dataflow can only be used as an Import mode data source.

The next logical question is “what exactly do you mean by supported queries?”

These queries are supported, meaning that they will use the SQL data if the enhanced compute engine is enabled for a dataflow:

  • Dataflow refresh against a dataflow with the enhanced compute engine enabled – for example, the “Cleansed” dataflow or “Final” dataflow in the image above
  • Authoring in Power BI Desktop, when using DirectQuery mode
  • User activity in the Power BI service, when using DirectQuery mode

These queries are not supported, and will always use the CSV data even if the enhanced compute engine is enabled for a dataflow:

  • Editing a dataflow in Power Query Online
  • Authoring in Power BI Desktop, when using Import mode
  • Dataset refresh in the Power BI service, when using Import mode

The final question[4] is “what Power Query operations will actually fold against the SQL data and take full advantage of the compute capabilities?”

The answer is in this blog post from Cristian Petculescu, the architect of dataflows and much of Power BI. Cristian enumerates well over 100 M constructs and the SQL to which they fold, so I’m not going to try to summarize them all here. Go take a look for yourself if you need more information than what’s in this post.

Was this helpful? Did you learn anything new?

If you have any more questions about the enhanced compute engine, please let me know!


[1] This pattern of separating dataflows based on the logical role of the data preparation logic they implement is a general best practice, in addition to aligning well with dataflows best practices.

[2] If we wanted the enhanced compute engine to be used for this final dataflow even though it is not referenced by any linked entities, we would need to change the setting from “Optimized” to “On.”

[3] Did you still remember what the question was? I copied it here because I’d forgotten and was concerned you might have too. It’s September 2020 and the world is on fire, which can make it surprisingly difficult to think…

[4] Which almost no one ever actually asks me, but which fits into the theme of the post, so I’m including it here for completeness.

Data Culture: Now you’re thinking with portal

In an ideal world, everyone knows where to find the resources and tools they need to be successful.

We don’t live in that world.

I’m not even sure we can see that world from here. But if we could see it, we’d be seeing it through a portal[1].

One of the most common themes from my conversations with enterprise Power BI customers is that organizations that are successfully building and growing their data cultures have implemented portals where they share the resources, tools, and information that their users need. These mature companies also treat their portal as a true priority – the portal is a key part of their strategy, not an afterthought.

This is why:

In every organization of non-trivial size there are obstacles that keep people from finding and using the resources, information, and data they need.

Much of the time people don’t know what they need, nor do they know what’s available. They don’t know what questions to ask[2], much less know where to go to get the answers. This isn’t their fault – it’s a natural consequence of working in a complex environment that changes over time on many different dimensions.

As I try to do in these accompanying-the-video blog posts, I’ll let the video speak for itself, but there are a few key points I want to emphasize here as well.

  1. You need a place where people can go for all of the resources created and curated by your center of excellence
  2. You need to engage with your community of practice to ensure that you’re providing the resources they need, and not just the resources you think they need
  3. You need to keep directing users to the portal, again and again and again, until it becomes a habit and they start to refer their peers to it

The last point is worth emphasizing and explaining. If community members don’t use the portal, it won’t do what you need it to do, and you won’t get the return you need on your investments.

Users will continue to use traditional “known good” channels to get information – such as sending you an email or IM – if you let them. You need to not let them.


[1] See what I did there?

[2] Even though they will often argue vehemently against this fact.

Standardizing on Power BI

Last week’s post on migrating to Power BI was intended to be a stand-alone post, but this excellent comment from Matthew Choo via LinkedIn made me realize I had more to say.

I could not agree more, Matthew![1]

In my conversations with leaders from enterprise Power BI customers, if they mention to me that they’re standardizing on Power BI as their analytics tool and platform, I try to ask two questions:

  1. Why?
  2. Why now?

The answer to the first question is almost always a combination of reasons from the previous post on migration and the fact that the customer sees Power BI as the best choice for the organization’s needs moving forward. There’s very little variation in the answers over dozens of conversations.

The answers to the second question are more diverse, but there are some common themes that I hear again and again.

One theme is that the selection of multiple tools happened organically. Sometimes a new BI tool is adopted through a merger or acquisition. Sometimes a new CIO or other senior leader mandates the adoption of their favorite tool, and as leaders change, the old tools are left behind while the new tools are added to the stable. Sometimes a key data source only works well with the data source vendor’s reporting tool because the vendor refuses to open their APIs to third parties.[3] Often the plan has been to eliminate excess tools at some point, but the point hasn’t been now… until now.

Very often the factor that makes standardization a priority is pain, which brings me back to a point I made in the introductory post in the “building a data culture” series:

I strongly believe that pain is a fundamental precursor to significant change. If there is no pain, there is no motivation to change. Only when the pain of not changing exceeds the perceived pain of going through the change will most people and organizations consider giving up the status quo.

The most common general reasons for this “pain balance” shifting[4] involve money. Organizations need to eliminate inefficiencies so they can invest in areas of strategic importance. A leader may proactively seek out inefficiencies to eliminate, but it’s more typical for me to hear about external market pressures[5] necessitating a more reactive consolidation.

The other common theme in responses to the question “why now?” is that the organization has a new chief data officer, that the CDO is focused on building a data culture, and that for the reasons listed in last week’s post the CDO has made consolidation a priority.

What’s interesting about this theme is that the hiring of a CDO is part of a larger strategic shift in how the organization thinks about data. The C-level executives know that they need to start treating data as a strategic asset, and they realize that it’s too important to be rolled up into the responsibilities of the CIO or another existing leader. Very often, they already have a good idea of the types of changes that need to happen, but want to hire a senior leader who will make the final decisions and own the actions and their outcomes. In other words, the hiring of a CDO is often an early-but-lagging indicator that there’s executive support for a data culture. That’s a good thing.

Before making a big change it’s always important to understand what you hope to achieve, but it’s also important to take a little time to examine how you got to the point where change is necessary, so you can better avoid the mistakes of the past…


[1] Usually when I say this I’m agreeing with myself[2] so it’s nice to be referring to a different Matthew here.

[2] Not that I agree with myself very often, mind you.

[3] Yeah, you know who I’m talking about.

[4] I wasn’t expecting to coin this phrase, but as soon as I typed it, I loved it. I think I may quit my day job and start a new business as a consultant delivering expensive “Shifting the Pain Balance(tm)” workshops for chief data officers and other senior executives.

[5] Since it’s still 2020, I should point out that the COVID-19 pandemic has been at the root of many external market pressures. I’ve heard dozens of companies say that they’re increasing their investment in building a data culture because of pandemic-induced challenges.

Data Culture: Showcasing the Art of the Possible

The last post and video in this series looked at the broad topic of training. This post looks at a specific aspect of this topic: letting people know what is possible, and sparking their imagination to do amazing things.

A lot of content and training materials will focus on capabilities: here is a feature, this is what it does, and this is how you use it. Although this type of content is important, it isn’t enough on its own to accelerate the growth of a data culture.

The most successful organizations I’ve worked with have included content in their community of practice that specifically targets the art of the possible. This might be a monthly presentation by community champions from across the business. It might be someone from the center of excellence highlighting new features, or the integration between features and tools. The most important thing is planting the seed of an idea in the minds of people who will say “I had no idea you could do that!”

My colleagues Miguel and Chris are some of my greatest personal sources of inspiration for building reports[1] because each of them does amazing things with Power BI that make it powerful, usable, and beautiful – but they’re just two of the many people out there showing me new techniques and possibilities.

Who will you inspire today?


[1] And by now you probably realize that I need all the inspiration I can get for anything related to data visualization.

Migrating to Power BI

One aspect of building a data culture is selecting the right tools for the job. If you want more people working with more data, giving them the tools they need to do that work is an obvious[1] requirement. But how many tools do you need, and which tools are the right tools?

Migrating to the cloud

It should be equally obvious that the answer is “it depends.” This is the answer to practically every interesting question. The right tools for an organization depend on the data sources it uses, the people who work with that data, the history that has gotten the organization to the current decision point, and the goals the organization needs to achieve or enable with the tools it selects.

With that said, it’s increasingly common[2] to see large organizations actively working to reduce the number of BI tools they support[3]. The reasons for this move to standardization are often the same:

  • Reduce licensing costs
  • Reduce support costs
  • Reduce training costs
  • Reduce friction involved in driving the behaviors needed to build and grow a data culture

Other than reducing the licensing costs[4], most of these motivations revolve around simplification. Having fewer tools means learning and using fewer tools. It means everyone learning and using fewer tools, which often results in less time and money spent to get more value from the use of those tools.

One of the challenges in eliminating a BI tool is ensuring that the purpose that tool fulfilled is now effectively fulfilled by the tool that replaces it. This is where migration comes in.

The Power BI team at Microsoft has published a set of guidance articles focused specifically on migrating from other BI tools to Power BI.

This documentation was written by the inestimable Melissa Coates of Coates Data Strategies, with input and technical review by the Power BI customer advisory team. If you’re preparing to retire another BI tool and move its workload to Power BI – or if you’re wondering where to start – I can’t recommend it highly enough.


[1] If this isn’t obvious to a given organization or individual, I’m reasonably confident that they’re not actively trying to build a data culture, and not reading this blog.

[2] I’m not a market analyst but I do get to talk to BI, data, and analytics leaders at large companies around the world, and I suspect that my sample size is large and diverse enough to be meaningful.

[3] I’m using the word “support” here – and not “use” – deliberately. It’s also quite common to see companies remove internal IT support from deprecated BI tools while letting individual business units continue to use them – paying for the tools and support out of their own budgets. This is typically a way to allow reluctant “laggard” internal customer groups to align with the strategic direction, but to do it on their own schedules.

[4] I’m pretty consistent in saying I don’t know anything about licensing, but even I understand that paying for two things costs more than paying for one of those things.

Data Culture: Training for the Community of Practice

The last few posts and videos in this series have introduced the importance of a community where your data culture can grow, and ways to help motivate members of the community, so your data culture can thrive.

But what about training? How do we give people the skills, knowledge, and guidance that they need before they are able to work with data and participate in the data culture you need them to help build?

Training is a key aspect of any successful data culture, but it isn’t always recognized as a priority. In fact, the opposite is often true.

I’ve worked in tech long enough, and have spent enough of that time close to training to know that training budgets are often among the first things cut during an economic downturn. These short-term savings often produce long-term costs that could be avoided, and more mature organizations are beginning to realize this.

In my conversations with enterprise Power BI customers this year, I’ve noticed a trend emerging. When I ask how the COVID-19 pandemic is affecting how they work with data, I hear “we’re accelerating our efforts around self-service BI and building a data culture because we know this is now more important than ever” a lot more than I hear “we’re cutting back on training to save money.” There’s also a clear correlation between the maturity of the organizations I’m talking with and the response I get. Successful data cultures understand the value of training.

I’ll let the video speak for itself, but I do want to call out a few key points:

  1. Training on tools is necessary, but it isn’t enough. Your users need to know how to use Power BI[1], but they also need to know how to follow organizational processes and work with organizational data sources.
  2. Training material should be located as close as possible to where learners are already working – the people who need it the most will not go out of their way to look for it or to change their daily habits.
  3. There is a wealth of free Power BI training available from Microsoft (link | link | link) as well as a broad ecosystem of free and paid training from partners.

The most successful customers I work with use all of the resources that are available. Typically they will develop internal online training courses that include links to Microsoft-developed training material, Microsoft product documentation, and community-developed content, in a format and structure[2] that they develop and maintain themselves, based on their understanding of the specific needs of their data culture.

Start as small as you need to, listen and grow, and iterate as necessary. There’s no time like the present.


[1] Or whatever your self-service BI tool of choice may be – if you’re reading this blog, odds are it’s Power BI.

[2] I’m tempted to use the term “curriculum” here, but this carries extra baggage that I don’t want to include. Your training solution can be simple or complex and still be successful – a lot of this will depend on your company culture, and the needs of the learners you’re targeting.

Webcast: Patterns for adopting dataflows in Power BI

I haven’t been posting a lot about dataflows in recent months, but that doesn’t mean I love them any less. On Wednesday September 23rd, I’ll be sharing some of that love via a free webcast hosted by the Istanbul Power BI user group[1]. You can sign up here.

In this webcast I’ll be presenting practices for successfully incorporating dataflows into Power BI applications, based on my experience working with enterprise Power BI customers. If you’re interested in common patterns for success, common challenges to avoid, and answers to the most frequently asked dataflows questions, please sign up today.

This webcast won’t cover dataflows basics, so if you’re new to dataflows in Power BI or just need a refresher, please watch this tutorial before joining!


[1] In case it needs to be said, yes, the session will be delivered in English.

Data Culture: Motivation and Encouragement

The last post in our ongoing series on building a data culture focused on the importance of community, and on ways organizations can create and promote successful communities around data. But while a community is where the data culture can grow, how can you motivate people to participate, to contribute, and to be part of that growth?

Business intelligence is more about business than it is about technology, and business is really about people. Despite this, many BI professionals focus their learning largely on the technology – not the people.

Do you remember the first time you were involved in a performance tuning and optimization effort? The learning process involved looking at the different parts of the tech stack, and in understanding what each part does, how it does it, and how it relates to all of the other parts. Only when you understood these “internals” could you start looking at applying your knowledge to optimizing a specific query or workload.

You need to know how a system works before you can make it work for you. This is true of human systems too.

This video[1] looks at motivation in the workplace, and how you can motivate the citizen analysts in your data culture to help it – and them – grow and thrive. If you think about these techniques as “performance tuning and optimization” for the human components in a data culture, you’re on the right track.

This image makes a lot more sense after you’ve watched the video – I promise

People are motivated by extrinsic motivators (doing something to get rewards) and intrinsic motivators (doing something because doing it makes them happy)[2], and while it’s important to understand both types of motivators, it’s the intrinsic motivators that are more likely to be interesting – and that’s where we spend the most time in the video.

When you’re done with the video, you probably want to take a moment to read this Psychology Today article, and maybe not stop there. People are complicated, and if you’re working to build a data culture, you need to understand how you can make people more likely to want to participate. Even with an engaged executive sponsor, it can be difficult to drive personal change.

In my personal experience, task identity and task significance are the biggest success factors when motivating people to contribute in a data culture. If someone knows that their work is a vital part of an important strategic effort, and if they know that their work makes other people’s lives better, they’re more likely to go the extra mile, and to willingly change their daily habits. That’s a big deal.


[1] If you’re not old enough to recognize the opening line in the video, please take a moment to appreciate how awesome commercials were in the 1980s.

[2] Yes, I’m oversimplifying.

Data Culture: The Importance of Community

The last two videos in our series on building a data culture covered different aspects of how business and IT stakeholders can partner and collaborate to achieve the goals of the data culture. One video focused on the roles and responsibilities of each group, and one focused on the fact that you can’t treat all data as equal. Each of these videos builds on the series introduction, where we presented core concepts about cultures in general, and data culture in particular.

Today’s video takes a closer look at where much of that business/IT collaboration takes place – in a community.

Having a common community space – virtual, physical, or both – where your data culture can thrive is an important factor in determining success. In my work with global enterprise Power BI customers, when I hear about increasing usage and business value, I invariably hear about a vibrant, active community. When I hear about a central BI team or a business group that is struggling, and I ask about a community, I usually hear that this is something they want to do, but never seem to get around to prioritizing.

Community is important.[1]


A successful data culture lets IT do what IT does well, and enables business to focus on solving their problems themselves… but sometimes folks on both sides of this partnership need help. Where do they find it, and who provides that help?

This is where the community comes in. A successful community brings together people with questions and people with the answers to those questions. A successful community recognizes and motivates people who share their knowledge, and encourages people to increase their own knowledge and to share it as well.

Unfortunately, many organizations overlook this vital aspect of the data culture. It’s not really something IT traditionally owns, and it’s not really something business can run on their own, and sometimes it falls through the cracks[2] because it’s not part of how organizations think about solving problems.

If you’re part of your organization’s journey to build and grow a data culture and you’re not making the progress you want, look more closely at how you’re running your community. If you look online you’ll find lots of resources that can give you inspiration and ideas, anything from community-building ideas for educators[3] to tips for creating a corporate community of practice.


[1] Really important. Really really.

[2] This is a pattern you will likely notice in other complex problem spaces as well: the most interesting challenges come not within a problem domain, but at the overlap or intersection of related problem domains. If you haven’t noticed it already, I suspect you’ll start to notice it now. That’s the value (or curse) of reading the footnotes.

[3] You may be surprised at how many of these tips are applicable to the workplace as well. Or you may not be surprised, since some workplaces feel a lot like middle school sometimes…