Power BIte: Dataflows vs. datasets

This is still one of the most common dataflows questions: what’s the difference between Power BI dataflows and Power BI datasets?

For the last year I have resisted tackling this question head-on. This isn’t because it’s a bad or “dumb” question. Just the opposite – this is a very simple question, and the simpler a question is, the more complex and nuanced the answer is likely to be.

See how complex this is?
A graphical representation of the answer’s likely complexity.

If you’re a regular reader of this blog, you probably already know the answer, because I’ve answered it already. Sort of. The existing answer is distributed across dozens of posts, and if you’ve read all of them you’ve probably picked up the answer along the way. But I keep hearing this question, and I keep thinking that there must be a more direct answer I could share.

Here it is, in a single, simple table[1].

| Feature | Power BI dataflows | Power BI datasets |
|---|---|---|
| Implementation | CDM folder | Analysis Services tabular model |
| Storage[2] | CSV files | Vertipaq |
| Metadata[3] | Common Data Model – model.json | BISM |
| Development | Power Query Online | Power Query in Power BI Desktop |
| Primary purpose[4] | Data reuse | Data analysis |
| Reuse[5] | Acts as data source in multiple datasets | Shared datasets across workspaces |
| Scope of reuse[6] | Entity level reuse | Dataset level reuse |
| Mashup with other data sources[7] | Yes | No |
| Used for reporting[8] | Not directly | Yes |
| Reuse outside Power BI[9] | Yes, through ADLSg2 | Yes, through XMLA |
| Data access methods[10] | Import | Import, DirectQuery |
| Connection methods[11] | Import | Live Connection |
| Row-level security | No | Yes |
| Certification and promotion | Not yet | Yes |
What else am I missing? Please let me know! Seriously, you should let me know.

Update: I’ve added a few rows to the table after the post was originally published, to incorporate feedback from readers on differences I had missed. Thank you!

Each of the rows in this table could easily be an in-depth topic in and of itself, so if you’re looking at any of them and thinking “that’s not quite right” I might very well agree with you. There’s a lot of context and a lot of nuance here, and we’re trying to sum things up in a word or two… which is kind of the whole point.

Oh yeah, there’s a video too.[12]

I can’t wait to hear what you think!


[1] A simple table with ten footnotes.

[2] The storage aspect of dataflows and datasets is one of the most significant differences between the two. Datasets use the Vertipaq column store to load data into a highly compressed in-memory representation that is optimized for analysis. Dataflows use text files in folders, which are optimized for interoperability.
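To make the column-store idea a little more concrete, here is a toy Python sketch of two general compression techniques, dictionary encoding and run-length encoding. Vertipaq is commonly described as using techniques along these lines, but this is only an illustration of the general idea under simplifying assumptions, not how the engine actually works. The "Country" column and all function names are made up for the example.

```python
from itertools import groupby


def dictionary_encode(column):
    """Replace each value with a small integer ID; return (dictionary, ids)."""
    dictionary = []  # each distinct value, stored once
    index = {}       # value -> position in the dictionary
    ids = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeated IDs into (id, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(ids)]


def decode(dictionary, runs):
    """Rebuild the original column to show the encoding is lossless."""
    return [dictionary[key] for key, length in runs for _ in range(length)]


# Hypothetical "Country" column, stored the way a CSV file would: one string per row.
country = ["France", "France", "France", "Canada", "Canada", "France", "France"]

dictionary, ids = dictionary_encode(country)
runs = run_length_encode(ids)

print(dictionary)  # ['France', 'Canada']
print(ids)         # [0, 0, 0, 1, 1, 0, 0]
print(runs)        # [(0, 3), (1, 2), (0, 2)]
assert decode(dictionary, runs) == country
```

A CSV file stores the string "France" once per row; the encoded column stores it once in the dictionary, plus a handful of small integers. That kind of trade-off is part of why a column store can be much smaller in memory than the text files a dataflow writes, while the text files are far easier for other tools to read.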

[3] The Analysis Services Tabular engine uses the BI Semantic Model (BISM) to represent its metadata. This is a metadata model originally included in SQL Server 2012 Analysis Services, and used by the Tabular engine ever since.

[4] Saying “this is the primary purpose” of any complex tool is fraught with risk, because no matter what you say, there are other valid things that remain unsaid. With this said… the big gap that dataflows close is that of self-service data preparation for the purpose of data sharing and reuse. Power BI has always had self-service data preparation through Power Query, but before dataflows the data that was prepared was “locked” in a dataset, for analysis, and not for sharing or reuse.

[5] Once you have loaded data into dataflows, authorized users can reuse entities from multiple dataflows, and use them as the building blocks for new dataflows or new datasets. Once you have loaded data into a dataset (and published it to the Power BI service), you can enable users to connect to it.

[6] With dataflows, users can pick and choose the entities they want, but a dataset can only be reused as-is.

[7] Dataflow entities can be used as data sources in the same Power BI Desktop file as other data sources, and can serve as part of a mashup or composite model, but a dataset can only be reused as-is.

[8] Although you can obviously use dataflows for reporting, you do so by first importing the data from the dataflow into a dataset.

[9] It’s interesting to point out that using your own organizational ADLSg2 account does not require Power BI Premium, but using the XMLA endpoint to connect to Power BI datasets from non-Power BI clients does.
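Because a dataflow’s CDM folder is just a model.json file plus CSV files, any tool that can read JSON and CSV from ADLSg2 can consume it. The sketch below is a minimal Python illustration under stated assumptions: the model.json is heavily simplified and hypothetical (the entity, attribute, and partition names and the storage URL are made up), the CSV partitions are assumed to have no header row with columns in the same order as the entity’s attributes, and the actual download from ADLSg2 is skipped in favor of inline strings.

```python
import csv
import io
import json

# Hypothetical, heavily simplified model.json; real files carry more properties.
MODEL_JSON = """
{
  "name": "SalesDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customer",
      "attributes": [
        {"name": "CustomerID", "dataType": "int64"},
        {"name": "Country", "dataType": "string"}
      ],
      "partitions": [
        {"name": "Part001",
         "location": "https://contoso.dfs.core.windows.net/powerbi/Sales/Customer/Part001.csv"}
      ]
    }
  ]
}
"""

# Hypothetical contents of Part001.csv (assumed headerless, columns in attribute order).
PARTITION_CSV = "1,France\n2,Canada\n"

# Assumed mapping for only the two data types used in this sample.
CONVERTERS = {"int64": int, "string": str}


def find_entity(model, entity_name):
    """Return the entity definition with the given name from a parsed model.json."""
    for entity in model["entities"]:
        if entity["name"] == entity_name:
            return entity
    raise KeyError(entity_name)


def read_partition(entity, csv_text):
    """Turn headerless CSV rows into dicts, using the entity's attribute list as the schema."""
    attributes = entity["attributes"]
    reader = csv.reader(io.StringIO(csv_text))
    return [
        {attr["name"]: CONVERTERS[attr["dataType"]](value) for attr, value in zip(attributes, row)}
        for row in reader
    ]


model = json.loads(MODEL_JSON)
customer = find_entity(model, "Customer")
print([p["location"] for p in customer["partitions"]])
rows = read_partition(customer, PARTITION_CSV)
print(rows)  # [{'CustomerID': 1, 'Country': 'France'}, {'CustomerID': 2, 'Country': 'Canada'}]
```

The point is that the schema lives in model.json and the data lives in plain CSV files, so no Power BI component is involved in reading it. With a real CDM folder you would want to check the partition’s format settings in model.json (where present) rather than assume headerless CSV, and handle the full set of data types.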

[10] You can only import data into your dataflow entities, but tables in your dataset can import data or use DirectQuery, and a dataset can use a combination of the two.

[11] You can only import data from a dataflow into a dataset. When connecting to a shared dataset, you can only use Live Connections.

[12] I’ve been thinking of making videos to supplement this blog for almost as long as I’ve been hearing the question that inspired this post. Please take a moment to share your thoughts on the video. This is something of a “soft launch” and although I have plans for a few dozen more videos already, your feedback will be a main factor in how the video series evolves.

6 thoughts on “Power BIte: Dataflows vs. datasets”

  1. Great article on a very trendy topic – thank you very much!
    I have been thinking about Dataflows vs Datasets for a while now. We might even do a dedicated meetup on this topic at our French Power BI User Group very soon.
    I would add the following items to the comparison:
    – Row-Level Security: Dataflows no // Datasets yes
    – Ability to add a measure (from Power BI Desktop) on top of: Dataflows yes // Datasets yes
    – Ability to add a column or another data source (from Power BI Desktop) on top of: Dataflows yes // Datasets no (You more or less cover this point in footnotes 7 and 11.)
    – Certification & Promotion: Dataflows no // Datasets yes
    – Ability to connect directly from Excel: Dataflows no (unless you use ADLS Gen2?) // Datasets yes

  2. aoifoc

    Really great video, you have a knack for explaining things clearly. I’ve gone from ‘quite confused’ to ‘much more informed’ about dataflows now, thank you!

    Keep the videos coming.
