Important: This post was written and published in 2019, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.
I received today what is becoming a common question:
I have a customer who is using dataflows to populate a 255M row fact table, refreshing the last 2 months of data (which is working) but experiencing issues when trying to connect to that Dataflow using Power BI Desktop. Suggestions?
Power BI dataflows are an import-only data source when being used from Power BI Desktop. This means that if you’re building a model in Power BI Desktop and are using dataflows as a source, all of the data from the entities you select will be downloaded from Azure and loaded into your data model.
All of it.

To exacerbate the challenge, dataflows – and the CDM folders on which they’re built – do not support query folding, so even if you filter the records in your query, that filter will be applied in Power BI Desktop after the data has been downloaded, not at the source.
If one of these entities includes very wide records and/or a very large number of records, this can result in a slow and frustrating experience in Power BI Desktop, and in some circumstances can cause the data load to fail. So what can you do?
My preferred approach is to use a pattern I’ve long used when working with other data sources[1] that don’t support DirectQuery or query folding: work with a subset of the data in Power BI Desktop, and then work with the full data set after you publish to the Power BI service.
For dataflows, it looks like this:
- Develop and test the dataflows in the Power BI service
- Add filters to the too-large dataflow entities’ queries[2] to reduce the number of records in the entities (see the sketch after this list)
- Refresh the dataflow
- Develop the model and reports in Power BI Desktop
- Publish the model and reports to the Power BI service
- Remove the filters from the filtered dataflow entities
- Refresh the dataflow
- Refresh the dataset
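The filter mentioned in the second step is just an ordinary Power Query step added to the too-large entity’s query in the dataflow. Here’s a minimal sketch – the SQL source, the FactSales entity, the OrderDate column, and the cutoff date are placeholders for illustration, not details from the question above:

let
    Source = Sql.Database("your-server", "your-database"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // development-only filter – remove this step when you’re ready for the full data load
    #"Filtered rows" = Table.SelectRows(FactSales, each [OrderDate] >= #date(2019, 5, 1))
in
    #"Filtered rows"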
The dataflows team announced at MBAS in June that the enhanced compute engine currently in preview will at some point enable DirectQuery access and query folding when using dataflows from Power BI Desktop – but that point is not today. Until then, I hope this quick pattern will help…
[1] Like large folders full of large files.
[2] This is typically just the one or two largest entities, not all of them.
Thanks Matthew. This won’t work if the dataflow in question is already being used by other datasets that are now “Production”. Any ideas for that scenario?
That’s definitely more complicated. For that you would probably need to build a dataflow with linked/computed entities that include the filters, develop in Desktop using those filtered entities as the source, and then manually edit the M script for the queries in Desktop before publishing to the service.
The general approach is similar, but it’s definitely more step intensive…
This seems like a great place to employ parameters. The M script to reference a dataflow is mostly GUIDs, but the last step is something like this:
FactTable = #"{GUID}"{[entity="FactTable"]}[Data]
If you replace “FactTable” with a parameter you can switch between two different entities within the same Dataflow. You could probably even switch between workspaces and dataflows by parameterizing the GUIDs, but I’m not an expert in that.
FactTable = #"{GUID}"{[entity=Parameter]}[Data]
So what you’d do is have your Fact Table Dataflow contain both a production Fact Table and a filtered Dev Fact Table, with a parameter that holds the name of whichever one you want to use. This makes switching the .pbix back and forth between the dev and the full fact table even easier.
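Here’s a minimal sketch of what that could look like in Desktop’s M – the parameter name FactEntityName and the entity names FactTable_Dev and FactTable are hypothetical, and the GUID-based navigation steps stay exactly as the connector generates them:

// hypothetical text parameter defined in Power BI Desktop
FactEntityName = "FactTable_Dev" meta [IsParameterQuery = true, Type = "Text", IsParameterQueryRequired = true]

// last step of the dataflow query, using the parameter instead of a literal entity name
FactTable = #"{GUID}"{[entity = FactEntityName]}[Data]
// change FactEntityName to "FactTable" to point back at the full entity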
I feel like you should even be able to configure this parameter AFTER publication in the Power BI service so you never have to do a full local refresh. However, when I tested this, the parameter was greyed out to prevent editing in the service. Apparently you can only “edit some parameter settings in the Power BI service — not parameters that restrict the available data, but parameters that define and describe acceptable values.” https://docs.microsoft.com/en-us/power-bi/service-parameters#review-and-edit-parameters-in-power-bi-service
Thank you for the tip, Matthew! I was wondering when ‘import’ mode starts to become an issue and how to work around it.
This is still an edge case, but as more and more projects use dataflows for larger and larger data volumes, it will become more common…
I’m just getting started with Dataflows in Power BI Premium. In addition to my “SourceData_Ingestion” Dataflows, I create a non-linked Dataflow with simply:
let
    Source = #"SourceData_Ingestion",
    #"Kept top rows" = Table.FirstN(Source, 10000)
in
    #"Kept top rows"
And save as “SourceData_Ingestion_Sample” to enable easier modeling in Power BI Desktop.
Users use this Dataflow when exploring or modeling, and then simply remove “_Sample” from the Power Query M when they’re ready to leverage the whole set.
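In Power BI Desktop the swap is then a small edit to the generated dataflow query – a rough sketch, assuming the “_Sample” suffix ends up on the entity name you select (GUID-based navigation steps omitted):

// while exploring and modeling
Source = #"{GUID}"{[entity = "SourceData_Ingestion_Sample"]}[Data]
// when ready for the whole set, drop the suffix
Source = #"{GUID}"{[entity = "SourceData_Ingestion"]}[Data]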
This is a great refinement of that general pattern – thank you for sharing!
Matthew,
If you have the output of dataflows in a dedicated area in ADLSgen2, a model.json file with metadata is created to link to the latest snapshot of your data.
Is it possible to access this latest data via the model.json metadata file with Azure Databricks or Azure Data Factory, to do some ML or data science with it?
René Imthorn
Thanks for the question – yes you can.
Because dataflows are writing data to ADLSg2 in CDM folder format, Azure Machine Learning and Azure Databricks can both read the data using the metadata in the model.json file.
Check out this page for a tutorial and sample code: https://azure.microsoft.com/en-us/resources/samples/cdm-azure-data-services-integration/
Matthew, this is a great approach. But sometimes when you edit a dataflow to remove filters, an error message appears saying that the data refresh took more than 2 hours, and the refresh fails. Is there any workaround for that too?
Thank you in advance!
There are lots of factors that could contribute to an error like this – the best approach is to open a support ticket.
I didn’t catch when listening to the MBAS video that PBI Desktop was also to be supported, but this was good news. It then looks like PBI dataflows are to be backed by Spark clusters the same way as Azure Data Factory – is there any confirmation available on this?
I can’t speak to any possible future functionality unless it is mentioned on the Power BI release plan, or is otherwise mentioned publicly by the folks who own the Power BI roadmap…
I like the subtext of the image 😉