How do Power BI datasets and dataflows relate to each other? Do you need one if you have the other?
I received this question as a comment on another post, and I think it warrants a full post as a reply:
Hi Matthew, my organization is currently evaluating where to put BI data models for upcoming PBI projects. Central in the debates is the decision of whether to use PBI Datasets, SSAS or DataFlows. I know a lot of factors need considering. I’m interested in hearing your thoughts.
Rather than answering the question directly, I’m gong to rephrase and re-frame it in a slightly different context.
I’m currently evaluating how to best chop and prepare a butternut squash. Central in the debates is the decision of whether to use a 6″ chef’s knife, a 10″ chef’s knife, or a cutting board.
(I’ll pause for a moment to let that sink in.)
It doesn’t really make sense to compare two knives and a cutting board in this way, does it? You can probably get the job done with either knife, and the cutting board will certainly make the job easier… but it’s not like you’d need to choose one of the three, right? Right?
Your choice of knife will depend on multiple factors including the size of the squash, the size of your hand, and whether or not you already have one or the other or both.
Your choice of using a cutting board will come down to your workflow and priorities. Do you already have a cutting board? Is it more important to you to have a safe place to chop the squash and not damage the edge of your knife, or is it more important to not have one more thing to clean?
Both of these are valid decisions that need to be made – but they’re not dependent on each other.
Let’s get back to the original question by setting some context for dataflows and datasets in Power BI.
This image is from one of the standard slides in my dataflows presentation deck, and I stole it from the dataflows team. It shows where datasets and dataflows fit in Power BI from a high-level conceptual perspective.
Here’s what seems most important in the context of the original question:
- Power BI visualizations are built using datasets as their sources
- Power BI includes datasets, which are tabular BI models hosted in the Power BI service
- Dataflows are a data preparation capability in Power BI for loading data into Azure Data Lake Storage gen2
- Dataflows can be used as a data source when building datasets in Power BI, but cannot currently be used as a data source for models outside of Power BI, including SSAS and AAS
- Dataflows and datasets solve different problems and serve different purposes, and cannot be directly compared to each other as the original question tries to do – that’s like comparing chef’s knives and cutting boards
What’s not shown in this diagram is SQL Server Analysis Services (SSAS) or Azure Analysis Services (AAS) because the diagram is limited in scope to capabilities that are natively part of Power BI. SSAS and AAS are both analytics services that can host tabular BI models that are very similar to Power BI datasets, and which can be used as a data source for Power BI datasets. Each option – SSAS, AAS, or Power BI datasets – is implemented using the same underlying technology, but each has different characteristics that make it more or less desirable for specific scenarios.
This list isn’t exhaustive, and I make no claims to being an expert on this topic, but these are the factors that seem most significant when choosing between SSAS, AAS, or Power BI datasets as your analytics engine of choice:
- Cost and pricing model – if you choose SSAS you’ll need to own and manage your own physical or virtual server. If you choose AAS or Power BI you’ll pay to use the managed cloud service. Dedicated Power BI Premium capacity and shared Power BI Pro capacity have different licensing models and costs tp target different usage patterns.
- Model size – you can scale SSAS to pretty much any workload if you throw big enough hardware at it. AAS can scale to models that are hundreds of gigabytes in size. Power BI Premium can support PBIX files up to 10GB, and Power BI Pro supports PBIX files up to 1GB.
- Deployment and control scenarios – with SSAS and AAS, you have a huge range of application lifecycle management (ALM) and deployment capabilities that are enabled by the services’ XMLA endpoint and a robust tool ecosystem. Power BI Premium will support this before too long as well.
I’m sure I’m missing many things, but this is what feels most important to me. Like I said, I’m far from being an expert on this aspect of Power BI and the Microsoft BI stack.
So let’s close by circling back to the original question, and that delicious analogy. You need a knife, but the knife you choose will depend on your requirements. Having a cutting board will probably also help, but it’s not truly required.
Now I’m hungry.
 If you want to watch a conference presentation or two that includes this slide, head on over to the Dataflows in Power BI: Resources post.
 This feels like an oversimplification, but it’s technically correct at the level of abstraction at which I’m writing it. If anyone is interested in arguing this point, please reply with a comment that links to your article or blog post where the salient differences are listed.
 Remember I’m not an expert on this, so feel free to correct me by pointing me to documentation. Thanks!
 This is not a direct size-to-size comparison. The services measure things differently.
 As announced at Microsoft Ignite a few months back, no firm dates shared yet.