Important: This post was written and published in 2019, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.
This is still one of the most common dataflows questions: what’s the difference between Power BI dataflows and Power BI datasets?
For the last year I have resisted tackling this question head-on. This isn’t because it’s a bad or “dumb” question. Just the opposite – this is a very simple question, and the simpler a question is, the more complex and nuanced the answer is likely to be.
If you’re a regular reader of this blog, you probably already know the answer, because I’ve answered it already. Sort of. The existing answer is distributed across dozens of posts, and if you’ve read all of them and picked up the answer along the way. But I keep hearing this question, and I keep thinking that there must be a more direct answer I could share.
Here it is, in a single, simple table.
|Power BI dataflows||Power BI datasets|
|Implementation||CDM folder||Analysis Services tabular model|
|Metadata||Common Data Model – model.json||BISM|
|Development||Power Query Online||Power Query in Power BI Desktop|
|Primary purpose||Data reuse||Data analysis|
|Reuse||Acts as data source in multiple datasets||Shared datasets across workspaces|
|Scope of reuse||Entity level reuse||Dataset level reuse|
|Mashup with other data sources||Yes||No|
|Used for reporting||Not directly||Yes|
|Reuse outside Power BI||Yes, through ADLSg2||Yes, through XMLA|
|Data access methods||Import||Import, DirectQuery|
|Connection methods||Import||Live Connection|
|Certification and promotion||Not yet||Yes|
|What else am I missing?||Please let me know!||Seriously, you should let me know.|
Update: I’ve added a few rows to the table after the post was originally published, to incorporate feedback from readers on differences I had missed. Thank you!
Each of the rows in this table could easily be an in-depth topic in and of itself, so if you’re looking at any of them and thinking “that’s not quite right” I might very well agree with you. There’s a lot of context and a lot of nuance here, and we’re trying to sum things up in a word or two… which is kind of the whole point.
Oh yeah, there’s a video too.
I can’t wait to hear what you think!
 A simple table with ten footnotes.
 The storage aspect of dataflows and datasets is one of the most significant differences between the two. Datasets use the Vertipaq column store to load data into an optimized and highly compressed in-memory representation that is optimized for analysis. Dataflows use text files in folders, which are optimized for interoperability.
 The Analysis Services Tabular engine uses the BI Semantic Model (BISM) to represent its metadata. This is a metadata model originally included in SQL Server 2012 Analysis Services, and used by the Tabular engine ever since.
 Saying “this is the primary purpose” of any complex tool is fraught with risk, because no matter what you say, there are other valid things that remain unsaid. With this said… the big gap that dataflows close is that of self-service data preparation for the purpose of data sharing and reuse. Power BI has always had self-service data preparation through Power Query, but before dataflows the data that was prepared was “locked” in a dataset, for analysis, and not for sharing or reuse.
 Once you have loaded data into dataflows, authorized users can reuse entities from multiple dataflows, and use them as the building blocks for new dataflows or new datasets. Once you have loaded data into a dataset (and published it to the Power BI service) you can enable users to connect to it.
 With dataflows, users can pick and choose the entities they want, but a dataset can only be reused as-is.
 Dataflow entities can be used as data sources in the same Power BI Desktop file as other data sources, and can serve as part of a mashup or composite model, but a dataset can only be reused as-is.
 Although you can obviously use dataflows for reporting, you do so by first importing the data from the dataflow into a dataset.
 It’s interesting to point out that using your own organizational ADLSg2 account does not require Power BI Premium, but using the XMLA endpoint to connect to Power BI datasets from non-Power BI clients does.
 You can only import data into your dataflow entities, but tables in your dataset can import data or use DirectQuery, and a dataset can use a combination of the two.
 You can only import data from a dataflow into a dataset. When connecting to a shared dataset you can only use Live Connections.
 I’ve been thinking of making videos to supplement this blog for almost as long as I’ve been hearing the question that inspired this post. Please take a moment to share your thoughts on the video. This is something of a “soft launch” and although I have plans for a few dozen more videos already, your feedback will be a main factor in how the video series evolves.
21 thoughts on “Power BIte: Dataflows vs. datasets”
Great article on a very trendy topic – thanks !
I have been thinking about Dataflows vs Datasets for a while now. We might even do a dedicated meetup on this topic at our French Power BI User Group. I would add the following items to the comparison:
– Row-Level-Security (dataflows no / datasets yes)
– Ability to add a measure on top of (dataflows yes / datasets yes)
– Ability to add columns or another source on top of (dataflows yes, datasets no [… until we have THE feature we’re all waiting for : composite models with Analysis Services as a data source or, in other words, DirectQuery for AS] )
– Ability to connect directly from Excel (dataflows no [unfortunately…when can we have this??] , datasets yes)
– Promotion & Certification (dataflows no, datasets yes)
Great article on a very trendy topic – thank you very much !
I have been thinking about Dataflows vs Datasets for a while now. We might even do a dedicated meetup on this topic at our French Power BI User Group very soon.
I would add the following items to the comparison:
– Row-Level Security: Dataflows no // Datasets yes
– Ability to add a measure (from Power BI Desktop) on top of: Dataflows yes // Datasets yes
– Ability to add a column or another data source (from Power BI Desktop) on top of: Dataflows yes // Datasets no [You cover more or less this point on 7. and 11.)
– Certification & Promotion: Dataflows no // Datasets yes
– Ability to connect directly from Excel: Dataflows no (unless you use ADLS Gen2?) // Datasets yes
I’d argue that development of data sets is not only in Power Query, but also in Power BI Desktop itself when you expand the model with measures, calculated columns etc.
Really great video, you have a knack for explaining things clearly. I’ve gone from ‘quite confused’ to ‘much more informed’ about dataflows now, thank you!
Keep the videos coming.
It’s my intent to publish a new video each week. Hopefully this will make an effective supplement to the blog.
Pingback: Power BIte: Dataflows vs. datasets — BI Polar | MS Excel | Power Pivot | DAX | SSIS |SQL
Pingback: Power BIte: Sharing and reuse with dataflows and datasets – BI Polar
As I understand if we share a dataflow – only loaded and transformed data is shared, but if we share a dataset then measures are shared as well. So if you share a dataset then all you did in DAX is shared as well.
I’m not sure if there was a question in there, but… yes?
Dataflows and datasets are fundamentally different constructs, so this type of mismatch should be expected. Datasets have concepts like measures for which there is no corresponding concept in dataflows.
It was a feedback from reader 🙂 We can add it as a difference if needed…
Keep up the great work!
Could you share a video / blog on end-to-end best practice for dataflow / dataset / report access across workspaces?
That sounds like a book, not a blog post. Can you please be more specific in what you’re looking for?
I’m thinking a distinction between:
-Shared dataset, used as a source for a report in another workspace needs to have permission granted to the dataset in order for an end user to consume it
-Shared dataflow, used as a datasource in a dataset doesn’t need any additional access for end users.
I think it’s worth discussing, even if the reason is intuitive – Imported data in a dataset has entirely new access rules, so we apply appropriate access and are not concerned with propagating the dataflow access.
What are you looking for that isn’t already covered in an existing post and video? In my mind these topics are already well covered:
What am I missing?
In your table, there could be a distinction between “Inherited Data Access”
-Dataflows don’t need to have access granted to the consumers, as the access is controlled on the dataset.
-Datasets need read permission granted for consumers of Apps that include the dataset across workspaces – App permission alone creates an unusable experience.
Again, this may be obvious and perhaps it is implied somewhere, but I think the explicit comparison may add value to the table. Or maybe it wouldn’t. 🙂
I’ll gladly update that table – thank you!
Great site and videos Matthew. I’m trying to use Power BI to query Dynamics 365 entities. I have an fetchXML file that joins entities which I’d like to use in a dataflow and refresh it daily. I see the option for xml but don’t see example on how to use it. Do I need to first ask the Power BI administrator to create an on-premise gateway? i also tried to get this to work using PBI desktop and get error messages. thanks
Your best best is to take your scenario to https://community.powerbi.com/ and ask there. I do not have a Dynamics environment with which to test.
New Data access method : Dataflows now support Direct Query as well.
Direct access requires the Enhanced Compute Engine to be enabled, so this isn’t a general feature available to all dataflows, but this is good update.