This is still one of the most common dataflows questions: what’s the difference between Power BI dataflows and Power BI datasets?
For the last year I have resisted tackling this question head-on. This isn’t because it’s a bad or “dumb” question. Just the opposite – this is a very simple question, and the simpler a question is, the more complex and nuanced the answer is likely to be.
If you’re a regular reader of this blog, you probably already know the answer, because I’ve answered it already. Sort of. The existing answer is distributed across dozens of posts, and if you’ve read all of them you’ve probably picked up the answer along the way. But I keep hearing this question, and I keep thinking that there must be a more direct answer I could share.
Here it is, in a single, simple table.
| | Power BI dataflows | Power BI datasets |
|---|---|---|
| Implementation | CDM folder | Analysis Services tabular model |
| Metadata | Common Data Model – model.json | BISM |
| Development | Power Query Online | Power Query in Power BI Desktop |
| Primary purpose | Data reuse | Data analysis |
| Reuse | Acts as data source in multiple datasets | Shared datasets across workspaces |
| Scope of reuse | Entity level reuse | Dataset level reuse |
| Mashup with other data sources | Yes | No |
| Used for reporting | Not directly | Yes |
| Reuse outside Power BI | Yes, through ADLSg2 | Yes, through XMLA |
| Data access methods | Import | Import, DirectQuery |
| Connection methods | Import | Live Connection |
| Certification and promotion | Not yet | Yes |
| What else am I missing? | Please let me know! | Seriously, you should let me know. |
Update: I’ve added a few rows to the table after the post was originally published, to incorporate feedback from readers on differences I had missed. Thank you!
Each of the rows in this table could easily be an in-depth topic in and of itself, so if you’re looking at any of them and thinking “that’s not quite right” I might very well agree with you. There’s a lot of context and a lot of nuance here, and we’re trying to sum things up in a word or two… which is kind of the whole point.
Oh yeah, there’s a video too.
I can’t wait to hear what you think!
 A simple table with ten footnotes.
Implementation: The storage aspect of dataflows and datasets is one of the most significant differences between the two. Datasets use the VertiPaq column store to load data into a highly compressed in-memory representation that is optimized for analysis. Dataflows use text (CSV) files in folders, which are optimized for interoperability.
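To make the interoperability point concrete, here is a minimal Python sketch of how a non-Power BI tool might read one dataflow entity from a CDM folder using nothing but the standard library. It assumes a simplified `model.json` (only the `entities`, `attributes`, and `partitions` keys are used) and assumes the partition CSV has no header row, so column names come from the entity's attributes. The dataflow name, entity, columns, and CSV contents are made up, and in a real CDM folder the partition `location` would point at a file in ADLSg2 rather than an in-memory string.

```python
import csv
import io
import json

# Hypothetical, simplified model.json for a dataflow with one entity.
MODEL_JSON = """
{
  "name": "SalesDataflow",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customer",
      "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "CustomerName", "dataType": "string"}
      ],
      "partitions": [
        {"name": "Customer-part1", "location": "Customer/Customer-part1.csv"}
      ]
    }
  ]
}
"""

# Hypothetical partition contents. Assumption: plain CSV with no header row.
PARTITION_CSV = "1,Contoso\n2,Fabrikam\n"


def read_entity(model: dict, entity_name: str, partition_text: str) -> list[dict]:
    """Return the rows of one entity as dicts keyed by attribute name."""
    entity = next(e for e in model["entities"] if e["name"] == entity_name)
    # The column names live in the metadata file, not in the CSV file itself.
    columns = [a["name"] for a in entity["attributes"]]
    reader = csv.reader(io.StringIO(partition_text))
    return [dict(zip(columns, row)) for row in reader]


model = json.loads(MODEL_JSON)
rows = read_entity(model, "Customer", PARTITION_CSV)
print(rows)
```

Nothing here needs the Analysis Services engine, which is the contrast with a dataset's in-memory VertiPaq storage. All values come back as strings; converting them using each attribute's `dataType` is left out for brevity.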
Metadata: The Analysis Services Tabular engine uses the BI Semantic Model (BISM) to represent its metadata. This metadata model was introduced with SQL Server 2012 Analysis Services and has been used by the Tabular engine ever since.
Primary purpose: Saying “this is the primary purpose” of any complex tool is fraught with risk, because no matter what you say, there are other valid things that remain unsaid. With that said… the big gap that dataflows close is self-service data preparation for the purpose of data sharing and reuse. Power BI has always had self-service data preparation through Power Query, but before dataflows, the prepared data was “locked” in a dataset for analysis, and not available for sharing or reuse.
Reuse: Once you have loaded data into dataflows, authorized users can reuse entities from multiple dataflows and use them as the building blocks for new dataflows or new datasets. Once you have loaded data into a dataset (and published it to the Power BI service), you can enable users to connect to it.
Scope of reuse: With dataflows, users can pick and choose the entities they want, but a dataset can only be reused as-is.
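One way to picture the difference in granularity: a consumer of a dataflow selects individual entities by name, while a shared dataset is consumed as a whole. The Python sketch below models this with hypothetical names (`Dataflow`, `Dataset`, `pick_entities`, and the sample entities); it illustrates the concept only and is not a Power BI API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataflow:
    """Hypothetical stand-in for a dataflow: a named collection of entities."""
    name: str
    entities: dict[str, list[dict]] = field(default_factory=dict)

    def pick_entities(self, *names: str) -> dict[str, list[dict]]:
        # Entity-level reuse: take only the entities you need.
        return {n: self.entities[n] for n in names}


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a shared dataset: reused as a single unit."""
    name: str
    tables: tuple[str, ...]


sales = Dataflow(
    "Sales",
    {
        "Customer": [{"CustomerId": 1}],
        "Product": [{"ProductId": 10}],
        "Orders": [{"OrderId": 100, "CustomerId": 1, "ProductId": 10}],
    },
)

# A new dataset (or dataflow) built from just two of the three entities.
subset = sales.pick_entities("Customer", "Orders")
print(sorted(subset))  # ['Customer', 'Orders']

# A report connecting to a shared dataset gets all of its tables, as-is.
shared = Dataset("Sales Analysis", ("Customer", "Product", "Orders"))
print(shared.tables)
```

The frozen `Dataset` has no selection method on purpose: the only unit you can hand to a consumer is the whole thing.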
Mashup with other data sources: Dataflow entities can be used as data sources in the same Power BI Desktop file as other data sources, and can serve as part of a mashup or composite model, but a dataset can only be reused as-is.
Used for reporting: Although you can obviously use dataflows for reporting, you do so by first importing the data from the dataflow into a dataset.
Reuse outside Power BI: It’s interesting to point out that using your own organizational Azure Data Lake Storage Gen2 (ADLSg2) account does not require Power BI Premium, but using the XMLA endpoint to connect to Power BI datasets from non-Power BI clients does.
Data access methods: You can only import data into your dataflow entities, but tables in your dataset can import data or use DirectQuery, and a dataset can use a combination of the two.
Connection methods: You can only import data from a dataflow into a dataset. When connecting to a shared dataset, you can only use Live Connections.
Video: I’ve been thinking of making videos to supplement this blog for almost as long as I’ve been hearing the question that inspired this post. Please take a moment to share your thoughts on the video. This is something of a “soft launch”, and although I have plans for a few dozen more videos already, your feedback will be a major factor in how the video series evolves.