Q: What are Power BI dataflows?
A: Dataflows are a capability in Power BI for self-service ETL and data preparation that enable analysts and business users to define and share reusable data entities. Each dataflow is created in a Power BI workspace and can contain one or more entities. Each entity is defined by a Power Query “M” query. When the dataflow is refreshed, the queries are executed, and the entities are populated with data.
Q: Where is the data stored?
A: Data is stored in Azure Data Lake Storage Gen2 (ADLSg2) in the Common Data Model (CDM) folder format. Each dataflow is saved in a folder in the data lake. The folder contains one or more files per entity. If an entity does not use incremental refresh, there will be one file for the entity’s data. For entities that do use incremental refresh, there will be multiple files based on the refresh settings. The folder also contains a model.json file that holds all of the metadata for the dataflow and its entities.
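To make that concrete, here is a rough sketch of what a model.json might contain for a dataflow with a single entity. This is illustrative only – the dataflow, entity, column, and partition names (and the storage URL) are all made up, and real files include additional metadata:

```json
{
  "name": "SalesDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customers",
      "attributes": [
        { "name": "CustomerId",   "dataType": "int64"  },
        { "name": "CustomerName", "dataType": "string" }
      ],
      "partitions": [
        {
          "name": "Customers-Part001",
          "location": "https://account.dfs.core.windows.net/powerbi/SalesDataflow/Customers/Part001.csv"
        }
      ]
    }
  ]
}
```

Each partition entry points at one of the CSV files mentioned above, which is how an entity with incremental refresh ends up with multiple partitions.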
Q: Do I need to pay for Power BI dataflows?
A: Yes, but you don’t need to pay extra for them. Dataflows are available to Power BI Pro and Premium users.
Q: Do I need Power BI Premium to use dataflows?
A: No. Although some specific features (incremental refresh of dataflow entities and linked/computed entities) do require Premium, dataflows are not a Premium-only capability.
Q: How much data storage do I get?
A: Storage for dataflow entities counts against the existing limits for Power BI. Each user with a Power BI Pro license has a limit of 10GB, and each Premium capacity node has a limit of 100TB.
Q: Do dataflows support incremental refresh?
A: Yes. Incremental refresh can be configured on a per-entity basis. Incremental refresh is supported only in Power BI Premium.
Q: Can I use on-premises data sources with dataflows?
A: Yes. Dataflows use the same gateways used by Power BI datasets to access on-premises data sources.
Q: How do I do X with dataflows? I can do it in a query in Power BI Desktop, but I don’t see it in the dataflows query editor UI!
A: Most Power Query functionality is available in dataflows, even if it isn’t exposed through the query editor in the browser. If you have a query that works in Power BI Desktop, copy the “M” script into a “blank” query to create a new dataflow entity. In most cases it will work.
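As a hedged example of the kind of “M” script you might paste in – the URL, column name, and transformations here are placeholders, and your own script from Power BI Desktop’s Advanced Editor would go in their place:

```
// Pasted into a "blank query" in the dataflows editor to define an entity.
let
    // Hypothetical CSV source - replace with your own source.
    Source = Csv.Document(
        Web.Contents("https://example.com/sales.csv"),
        [Delimiter = ",", Encoding = 65001]
    ),
    // Use the first row as column headers.
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Set column types explicitly, as you would in Desktop.
    Typed = Table.TransformColumnTypes(Promoted, {{"Amount", type number}})
in
    Typed
```

If a function isn’t supported in dataflows, the editor will tell you when you try to save the query.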
Q: Do I still need a data warehouse if I use dataflows?
A: If you needed a data warehouse before Power BI dataflows, you probably still need a data warehouse. Although dataflows serve a similar logical function as a data warehouse or data mart, modern data warehouse platforms provide capabilities that dataflows do not.
Q: Do I need dataflows if I already have a data warehouse?
A: Dataflows fill a gap in data warehousing and BI tools by allowing business users and analysts to prepare and share data without needing help from IT. With dataflows, users can build a “self service data mart” in Power BI that can be used in their solutions. Because each dataflow entity is defined by a Power Query “M” query, handing off the definitions to an IT team for operationalization/industrialization is more straightforward.
Q: Do dataflows replace Azure Data Factory?
A: No. Azure Data Factory (ADF) is a hybrid data integration platform designed to support enterprise-scale ETL and data integration needs. ADF is designed for use by professional data engineers. Power BI dataflows are designed for use by analysts and business users – people familiar with the Power Query experience from Power BI Desktop and Excel – to load data into ADLSg2.
Q: With Wrangling Data Flows in Azure Data Factory, do we still need dataflows in Power BI?
A: Probably. Power BI dataflows are a self-service data preparation tool that enables analysts and other business users who may not be comfortable using SSIS or ADF to solve data prep problems without IT involvement. This remains true now that ADF includes Power Query via Wrangling Data Flows.
Q: Can I use dataflows for realtime / streaming data?
A: No. Dataflows are for batch data, not streaming data.
Q: Do dataflows replace Power BI datasets?
A: No. Power BI datasets are tabular analytic models that contain data from various sources. Power BI dataflows can be some or all of the sources used by a dataset. You cannot build a Power BI report directly against a dataflow – you need to build reports against datasets.
Q: How can I use the data in a dataflow?
A: No. Oh, wait, that doesn’t make sense – this wasn’t even a yes or no question, but I was on a roll… Anyway, you use the data in a dataflow by connecting with the Power BI dataflows connector in Power Query. This will give you a list of all workspaces, dataflows, and entities that you have permission to access, and you can use them like any other data source.
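A sketch of what the connector’s generated query looks like – the workspace, dataflow, and entity names here are made up, and the exact navigation fields may differ from what the connector generates for you:

```
let
    // List all workspaces containing dataflows you can access.
    Source = PowerBI.Dataflows(null),
    // Navigate to a (hypothetical) workspace, dataflow, and entity.
    Workspace = Source{[workspaceName = "Sales Analytics"]}[Data],
    Dataflow = Workspace{[dataflowName = "Customer Prep"]}[Data],
    Entity = Dataflow{[entity = "Customers"]}[Data]
in
    Entity
```

In practice you don’t write this by hand – you pick the workspace, dataflow, and entity in the navigator and Power Query generates the navigation steps for you.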
Q: Can I connect to dataflows via Direct Query?
A: No. Dataflows are an import-only data source.
Q: Can I use the data in dataflows in one workspace from other workspaces?
A: Yes! You can import entities from any combination of workspaces and dataflows into your PBIX file and publish it to any workspace where you have the necessary permissions.
Q: Can I use the data in a dataflow from tools other than Power BI?
A: Yes. You can configure your Power BI workspace to store dataflow data in an Azure Data Lake Storage gen2 resource that is part of your Azure subscription. Once this is done, refreshing the dataflow will create files in the CDM Folder in the location you specify. The files can then be consumed by other Azure services and applications.
Q: Why do dataflows use CSV files? Why not a cooler file format like Parquet or Avro?
A: The dataflows whitepaper answers this one, but it’s still a frequently asked question. From the whitepaper: “CSV format is the most ubiquitously supported format in Azure Data Lake and data lake tools in general, and CSV is generally the fastest and simplest to write for data producers.” You should probably read the whole thing, and not just this excerpt, because later on it says that Avro and Parquet will also be supported.
Q: Is this an official blog or official FAQ?
A: No, no. Absolutely not. Oh my goodness no. This is my personal blog, and I always suspect the dataflows team cringes when they read it.