Power BIte: Dataflows vs. datasets

Important: This post was written and published in 2019, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

This is still one of the most common dataflows questions: what’s the difference between Power BI dataflows and Power BI datasets?

For the last year I have resisted tackling this question head-on. This isn’t because it’s a bad or “dumb” question. Just the opposite – this is a very simple question, and the simpler a question is, the more complex and nuanced the answer is likely to be.

See how complex this is?
A graphical representation of the answer’s likely complexity.

If you’re a regular reader of this blog, you probably already know the answer, because I’ve answered it already. Sort of. The existing answer is distributed across dozens of posts, and if you’ve read all of them and picked up the answer along the way. But I keep hearing this question, and I keep thinking that there must be a more direct answer I could share.

Here it is, in a single, simple table[1].

Power BI dataflows Power BI datasets
Implementation CDM folder Analysis Services tabular model
Storage[2] CSV files Vertipaq
Metadata[3] Common Data Model – model.json BISM
Development Power Query Online Power Query in Power BI Desktop
Primary purpose[4] Data reuse Data analysis
Reuse[5] Acts as data source in multiple datasets Shared datasets across workspaces
Scope of reuse[6] Entity level reuse Dataset level reuse
Mashup with other data sources[7] Yes No
Used for reporting[8] Not directly Yes
Reuse outside Power BI[9] Yes, through ADLSg2 Yes, through XMLA
Data access methods[10] Import Import, DirectQuery
Connection methods[11] Import Live Connection
Row-level security No Yes
Certification and promotion Not yet Yes
What else am I missing? Please let me know! Seriously, you should let me know.

Update: I’ve added a few rows to the table after the post was originally published, to incorporate feedback from readers on differences I had missed. Thank you!

Each of the rows in this table could easily be an in-depth topic in and of itself, so if you’re looking at any of them and thinking “that’s not quite right” I might very well agree with you. There’s a lot of context and a lot of nuance here, and we’re trying to sum things up in a word or two… which is kind of the whole point.

Oh yeah, there’s a video too.[12]

I can’t wait to hear what you think!


[1] A simple table with ten footnotes.

[2] The storage aspect of dataflows and datasets is one of the most significant differences between the two. Datasets use the Vertipaq column store to load data into an optimized and highly compressed in-memory representation that is optimized for analysis. Dataflows use text files in folders, which are optimized for interoperability.

[3] The Analysis Services Tabular engine uses the BI Semantic Model (BISM) to represent its metadata. This is a metadata model originally included in SQL Server 2012 Analysis Services, and used by the Tabular engine ever since.

[4] Saying “this is the primary purpose” of any complex tool is fraught with risk, because no matter what you say, there are other valid things that remain unsaid. With this said… the big gap that dataflows close is that of self-service data preparation for the purpose of data sharing and reuse. Power BI has always had self-service data preparation through Power Query, but before dataflows the data that was prepared was “locked” in a dataset, for analysis, and not for sharing or reuse.

[5] Once you have loaded data into dataflows, authorized users can reuse entities from multiple dataflows, and use them as the building blocks for new dataflows or new datasets. Once you have loaded data into a dataset (and published it to the Power BI service) you can enable users to connect to it.

[6] With dataflows, users can pick and choose the entities they want, but a dataset can only be reused as-is.

[7] Dataflow entities can be used as data sources in the same Power BI Desktop file as other data sources, and can serve as part of a mashup or composite model, but a dataset can only be reused as-is.

[8] Although you can obviously use dataflows for reporting, you do so by first importing the data from the dataflow into a dataset.

[9] It’s interesting to point out that using your own organizational ADLSg2 account does not require Power BI Premium, but using the XMLA endpoint to connect to Power BI datasets from non-Power BI clients does.

[10] You can only import data into your dataflow entities, but tables in your dataset can import data or use DirectQuery, and a dataset can use a combination of the two.

[11] You can only import data from a dataflow into a dataset. When connecting to a shared dataset you can only use Live Connections.

[12] I’ve been thinking of making videos to supplement this blog for almost as long as I’ve been hearing the question that inspired this post. Please take a moment to share your thoughts on the video. This is something of a “soft launch” and although I have plans for a few dozen more videos already, your feedback will be a main factor in how the video series evolves.

21 thoughts on “Power BIte: Dataflows vs. datasets

  1. Great article on a very trendy topic – thanks !
    I have been thinking about Dataflows vs Datasets for a while now. We might even do a dedicated meetup on this topic at our French Power BI User Group. I would add the following items to the comparison:
    – Row-Level-Security (dataflows no / datasets yes)
    – Ability to add a measure on top of (dataflows yes / datasets yes)
    – Ability to add columns or another source on top of (dataflows yes, datasets no [… until we have THE feature we’re all waiting for : composite models with Analysis Services as a data source or, in other words, DirectQuery for AS] )
    – Ability to connect directly from Excel (dataflows no [unfortunately…when can we have this??] , datasets yes)
    – Promotion & Certification (dataflows no, datasets yes)

    Like

  2. Great article on a very trendy topic – thank you very much !
    I have been thinking about Dataflows vs Datasets for a while now. We might even do a dedicated meetup on this topic at our French Power BI User Group very soon.
    I would add the following items to the comparison:
    – Row-Level Security: Dataflows no // Datasets yes
    – Ability to add a measure (from Power BI Desktop) on top of: Dataflows yes // Datasets yes
    – Ability to add a column or another data source (from Power BI Desktop) on top of: Dataflows yes // Datasets no [You cover more or less this point on 7. and 11.)
    – Certification & Promotion: Dataflows no // Datasets yes
    – Ability to connect directly from Excel: Dataflows no (unless you use ADLS Gen2?) // Datasets yes

    Like

  3. aoifoc

    Really great video, you have a knack for explaining things clearly. I’ve gone from ‘quite confused’ to ‘much more informed’ about dataflows now, thank you!

    Keep the videos coming.

    Like

  4. Pingback: Power BIte: Dataflows vs. datasets — BI Polar | MS Excel | Power Pivot | DAX | SSIS |SQL

  5. Pingback: Power BIte: Sharing and reuse with dataflows and datasets – BI Polar

  6. Rick

    As I understand if we share a dataflow – only loaded and transformed data is shared, but if we share a dataset then measures are shared as well. So if you share a dataset then all you did in DAX is shared as well.

    Like

    1. I’m not sure if there was a question in there, but… yes?

      Dataflows and datasets are fundamentally different constructs, so this type of mismatch should be expected. Datasets have concepts like measures for which there is no corresponding concept in dataflows.

      Like

  7. Nathan

    Hi Matthew,
    Keep up the great work!
    Could you share a video / blog on end-to-end best practice for dataflow / dataset / report access across workspaces?
    Thanks!
    Nathan

    Like

      1. Nathan

        I’m thinking a distinction between:

        -Shared dataset, used as a source for a report in another workspace needs to have permission granted to the dataset in order for an end user to consume it

        vs

        -Shared dataflow, used as a datasource in a dataset doesn’t need any additional access for end users.

        I think it’s worth discussing, even if the reason is intuitive – Imported data in a dataset has entirely new access rules, so we apply appropriate access and are not concerned with propagating the dataflow access.

        Like

      2. Nathan

        In your table, there could be a distinction between “Inherited Data Access”

        -Dataflows don’t need to have access granted to the consumers, as the access is controlled on the dataset.
        -Datasets need read permission granted for consumers of Apps that include the dataset across workspaces – App permission alone creates an unusable experience.

        Again, this may be obvious and perhaps it is implied somewhere, but I think the explicit comparison may add value to the table. Or maybe it wouldn’t. 🙂

        Like

  8. LD

    Great site and videos Matthew. I’m trying to use Power BI to query Dynamics 365 entities. I have an fetchXML file that joins entities which I’d like to use in a dataflow and refresh it daily. I see the option for xml but don’t see example on how to use it. Do I need to first ask the Power BI administrator to create an on-premise gateway? i also tried to get this to work using PBI desktop and get error messages. thanks

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s