Dataflows enhanced compute engine – will it fold?

Important: This post was written and published in 2020, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be more of an historical record and less of a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

I’ve been seeing more questions lately about the dataflows enhanced compute engine in Power BI. Although I published a video overview and although there is feature coverage in the Power BI documentation, there are a few questions that I haven’t seen readily answered online.

A lot of these questions can be phrased as “what happens when I turn on the enhanced compute engine for my Power BI Premium capacity?”

Most of my responses start with the phrase “it depends” – so let’s look at some of the factors the answer depends on.

First, let’s look at a set of connected dataflows in a Premium workspace:

This workspace has three dataflows we’re interested in:

  1. A “staging” dataflow that contains un- or minimally-transformed data from the external data source or sources
  2. A “cleansed” dataflow that uses linked and computed entities to apply data cleansing logic to the data in the staging dataflow
  3. A final dataflow that uses linked and computed entities to transform the cleansed data into a star schema for analysis

Next, let’s look at the settings for one of these dataflows:

For a given dataflow, the owner can configure the enhanced compute engine to be Off/Disabled, On/Enabled, or “Optimized” which means that the Power BI service will turn it on or off for entities in the dataflow depending on how each entity is used by other dataflows.

This dataflow-level setting, combined with how the dataflows are used, determines whether the enhanced compute engine is enabled on entities in the dataflow:

  • Disabled: The enhanced compute engine is not enabled for any entities in the dataflow
  • Optimized: The enhanced compute engine is enabled for any entity in the dataflow that is referenced by a linked entity in another dataflow
  • On: The enhanced compute engine is enabled for all entities in the dataflow

Important point: To connect to a dataflow using DirectQuery you must explicitly set the enhanced compute engine to “On” even if entities in the dataflow are being referenced by linked entities.

Now let’s put these two factors together, considering that all three dataflows in the first image are using the default setting: Optimized.

  1. The “staging” dataflow uses the enhanced compute engine for any entities that are referenced by linked entities in another dataflow
  2. The “cleansed” dataflow uses the enhanced compute engine for any entities that are referenced by linked entities in another dataflow
  3. The final dataflow does not use the enhanced compute engine because none of its entities is referenced by linked entities in another dataflow[2]

Now we’re ready to revisit the actual question we’re trying to answer[3]: “what happens when I turn on the enhanced compute engine for my Power BI Premium capacity?”

Once the enhanced compute engine is enabled in the Power BI Premium capacity settings, and the dataflow settings and configuration (as illustrated above) dictate that the engine is used for a given dataflow, this is what happens:

  • When the dataflow is refreshed, the Power Query for each entity is executed, and the output of the query is persisted in the dataflow’s CDM folder as CSV data and JSON metadata
  • For any entity for which the enhanced compute engine is enabled, the output of the entity’s Power Query is also loaded into a table in a SQL database instance managed by the Power BI service

This second bullet is where the magic happens. Because the data is now in a storage format that includes a compute engine, supported queries can use the SQL cache instead of the underlying CSV files, and get the increased performance that comes with query folding.

Having the data in SQL also means that the dataflow can server as a DirectQuery data source – without the enhanced compute engine a dataflow can only be used as an Import mode data source.

The next logical question is “what exactly do you mean by supported queries?”

These queries are supported, meaning that they will use the SQL data if the enhanced compute engine is enabled for a dataflow:

  • Dataflow refresh against a dataflow with the enhanced compute engine enabled – for example, the “Cleansed” dataflow or “Final” dataflow in the image above
  • Authoring in Power BI Desktop, when using DirectQuery mode
  • User activity in the BI service, when using DirectQuery mode

These queries are not supported, and will always use the CSV data even if the enhanced compute engine is enabled for a dataflow:

  • Editing a dataflow in Power Query Online
  • Authoring in Power BI Desktop, when using Import mode
  • Dataset refresh in the Power BI service, when using Import mode

The final question[4] is “what Power Query operations will actually fold against the SQL data and take full advantage of the compute capabilities?”

The answer is in this blog post from Cristian Petculescu, the architect of dataflows and much of Power BI. Cristian enumerates well over 100 M constructs and the SQL to which they fold so I’m not going to try to summarize them all here. Go take a look for yourself if you need more information than what’s in this post.

Was this helpful? Did you learn anything new?

If you have any more questions about the enhanced compute engine, please let me know!


[1] This pattern of separating dataflows based on the logical role of the data preparation logic they implement is a general best practice, in addition to aligning well with dataflows best practices.

[2] If we wanted the enhanced compute engine to be used for this final dataflow even though it is not referenced by any linked entities, we would need to change the setting from “Optimized” to “On.”

[3] Did you still remember what the question was? I copied it here because I’d forgotten and was concerned you might have too. It’s September 2020 and the world is on fire, which can make it surprisingly difficult to think…

[4] Which almost no one ever actually asks me, but which fits into the theme of the post, so I’m including it here for completeness.

15 thoughts on “Dataflows enhanced compute engine – will it fold?

  1. Pingback: Query Folding and the Power BI Dataflows Enhanced Compute Engine – Curated SQL

  2. Hananto

    Hi Matthew,

    thank you for this insightful information. Last time I read your article about DataFlows vs. Shared Dataset: https://ssbipolar.com/2019/10/14/power-bite-dataflows-vs-datasets/.

    Since my data source (AWS Redshift) has million of records and sadly does not support incremental refresh on the Dataset directly, I used an incremental refresh on the DataFlow. It worked fine as the query folding took place. Then I tried to set up an incremental refresh on the Dataset based on the DataFlow and the query is not folded because the data from the DataFlow is stored as flat file.

    I thought with DataFlow via DirectQuery (with enhanced compute engine) might solve this issue. However – as you mentioned in the article – “Dataset refresh in the Power BI service, when using Import mode” can’t utilize Query Folding.

    Is it somehow still possible to find a solution / workaround for this issue? or should I consider moving the data to a different data source which supports query folding ( + incremental refresh ) natively like SQL Server / SQL Synapse?

    Thanks a lot in advance

    Best regards,

    Hananto

    Like

    1. Hi Hananto – As of today there is no timeline I can share for when dataset refresh might be able to take advantage of the dataflows compute engine, you may want to look at a solution like Azure Synapse Analytics.

      Like

  3. Pingback: Power BI Dataflows FAQ – BI Polar

  4. Pingback: DirectQuery and dataflows in Power BI – BI Polar

  5. derek

    Hi,

    are you sure that when Enhanced Compute Engine configuration: Optimized, that ONLY those entities that are linked in downstream dataflows are added to the ECE memory? or ALL entities in that dataflow? (which sounds like the situation when the dataflow configuration for ECE is turned to: On?

    Like

  6. derek

    is the enhanced compute engine also used WITHIN dataflows? or only BETWEEN.

    within dataflows, i have some layering (linked entity 1, computed entity 2, and a computed entity 3 that is on top of entity 2.).

    if ECE or some other optimized compute is not used within one dataflow, then it would be better to add another dataflow for entitity 3.

    Like

  7. Nicolas Straw

    Hi Matthew,

    do you know a way to check if Query Folding is really happening ? From a dataflow to another dataflow with ECE On ? There is a “View Data Source Query” item when right-click on a step but it never worked for me… I do some Filter, Group By, Merge operations and I would like to be sure that they are folded to the SQL cache of the dataflow.

    Thanks again for your help and the great insight you provide to the dataflow community 🙂

    Nicolas

    Liked by 1 person

  8. Pingback: Chris Webb's BI Blog: Testing The Performance Of Importing Data From ADLSgen2 Common Data Model Folders In Power BI Chris Webb's BI Blog

  9. Wouter

    Hi Matthew,

    Do you know if ECE and query folding is available with a premium per user license?

    I have 2 dataflows. Dataflow 1 is getting data from the source and putting it on Data Lake. ECE is configured as “ON”. Dataflow 2 got linked entities from DF1 which I use for building a new computed entitie with transformations. I seems like query folding is not working when I look to the new folding indicators.

    Wouter

    Like

    1. Apologies for the delayed response – I’m just starting to surface following an extended heads-down period.

      Yes, the enhanced compute engine is supported for Premium Per User. I would expect this to work.

      Like

  10. Pingback: Chris Webb's BI Blog: Power BI Dataflow Performance, Premium Per User And The Enhanced Compute Engine Chris Webb's BI Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s