Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.
It’s been two weeks since Power BI dataflows became publicly available in preview, and there’s been a flood of interest and excitement as people have started exploring and using this new capability for self-service data preparation.
There has also been a growing trickle of confusion. To illustrate this confusion, I will share a few examples.
Here’s the first example: A comment thread from a few weeks ago on this post, initiated by Neville:
The potential confusion here may come from the fact that Power BI dataflow entities are similar in purpose to tables in a data warehouse or data mart. But due to many factors (including the underlying storage being in files in a data lake) it is not generally appropriate to think of dataflows as a replacement for a data warehouse. While there may be exceptions to this rule, any time you hear someone saying you don’t need a data warehouse because you have Power BI dataflows, you should be skeptical and cautious.
Which leads me to another source of potential confusion. I had a meeting yesterday with two senior technical stakeholders from a big consulting firm. We had a delightful conversation[1] that centered primarily on dataflows. During the conversation they mentioned that they had heard some people from the Microsoft sales team explicitly telling their customers that they didn’t need a data warehouse now that Power BI had dataflows.
I’m going to tell you the same thing I told them: If you’re ever in this situation, please ask those sales people to contact me.
This is directly related to the second example: a wonderful blog post by Teo Lachev of Prologika. From this post, in the context of “the bad” about dataflows:
Let’s start with positioning. I’ve heard Microsoft position dataflows for any data integration task, from data staging to loading data warehouses and even replacing data warehouses and ETL (heard that vibe before?) I’ve seen business users doing impressive things with Power BI. But I’ve seen them also attempting to implement organizational solutions that collapse from their own weight. Dataflows are not an exception and I don’t think it’s a business user’s job to tackle ambitious data integration tasks.
I already responded to Teo via Twitter, so I’ll reuse that response here: we need to get better at that positioning. Dataflows don’t replace data warehouses and ETL any more than Power BI Desktop replaces a full set of BI pro tools. Dataflows complement these pro tools, and enable users who are not BI pros to fill in more gaps in a BI solution, similar to other Power BI capabilities. Although dataflows mean that self-service users can do more without help, the same patterns still apply.
And this leads us to the final example: this tweet from Olivier:
I also replied to this on Twitter, and I’ll use that reply as my starting point here[2].
The main reason to select Power BI dataflows over a professional ETL tool like Azure Data Factory is the user persona. You could not ask a business user or analyst to build data prep processes using ADF and have a realistic expectation of success. Power BI dataflows and ADF Data Flows (not to mention SSIS Dataflows and DTS DataPumps) provide similar solutions to a common set of problems.
The synergy between Power BI dataflows and ADF (as well as other Azure data services) is enabled through CDM Folders in Azure Data Lake Storage Gen2. This is a deeper and more strategic approach than simply trying to apply the same technology to two different services. Since the full capabilities of Power BI dataflows integrating with ADLSg2 are not yet available in the dataflows public preview, it’s difficult to give concrete examples, so instead I’ll refer you to the dataflows whitepaper and other available dataflows resources.
In general, I expect this type of confusion to continue for a while. The Power BI team – as well as the ADF team and other data platform teams – needs to be aggressive and consistent in messaging for customers, partners, and our sales and support folks. There are enough nuances involved that it’s not particularly easy to navigate all of the different options in complex real-world scenarios, and at this point in the preview there are additional challenges. Not only is the feature still being developed[3], there’s also a lack of official guidance. That guidance will exist, but in the interim during the preview we’ll probably need to rely on posts like this one…
…so please feel encouraged to share this post, and to share any questions related to positioning Power BI dataflows that aren’t answered here.
[1] They were both Canadian, and apologized for scheduling the meeting on the day before US Thanksgiving, when most people were planning to take the day off. I let them know that it was just fine, and that if I’d realized this myself when receiving the invitation I would have declined – but since I wasn’t paying attention, this one was on me.
[2] Although here I’ll avoid expressing any opinions about the naming overlap. I wasn’t involved in that decision, so I’ll continue to observe, judging, from a distance.
[3] Until everyone can use the deeper Azure Data Lake Storage Gen2 integration that (as of this writing) isn’t yet available in public preview, there’s a big part of the dataflows story that isn’t yet “real” to most people.
Great article!
I also see dataflows as a complementary tool that could build the bridge between self-service and enterprise BI. It can be the bridge between the data warehouse and analysts. With dataflows you can prep and shape the dimensions and fact tables for a specific business unit (or the whole company, for that matter) and present the data warehouse as entities in the Power BI service. Combined with entities for other data sources, you will have one single source for data models.
The data warehouse may not be reachable outside the company network, and many users may not have the access or skills to extract and shape the data. Dataflows can make data extraction less complex and increase flexibility in the work environment in a secure way.
Very interesting – do dataflows in Power BI bring live query capability to all data sources? How has, if it has, the API been developed to continue the SQL story?
Hi Keith – I don’t understand your question. Can you please rephrase it?
Thank you for posting this article, as it helps me a lot in briefing the BI team about dataflows and, importantly, where we can and shouldn’t use them.
Will the files, created by dataflows, be accessible by other analytics platforms?