Part 1 of this series introduced Power BI dataflows from a largely conceptual perspective. This post is going to be much more hands-on. Let’s look at what’s involved with creating a dataflow in Power BI.
To get started, you’ll need an app workspace. Dataflows are created in Power BI workspaces just like datasets, reports, and dashboards. But because dataflows are designed for sharing and reuse, you can’t create them in your personal “My Workspace” which is designed for… not-sharing-and-reuse.
I’m going to start with this workspace:
You’ll probably notice that this is a new “v2” workspace. You’ll probably also notice that this workspace is assigned to Power BI Premium capacity. Neither of these factors is required to get started with Power BI dataflows, but for some specific features they will be important.
In my workspace I will select the “create” button, and will select “dataflow” from the menu.
I swear I didn’t plan this, but let’s use it as a learning experience.
The use of dataflows needs to be enabled by a Power BI capacity administrator for a given Premium capacity node. Dataflows use compute and storage resources. Requiring a capacity administrator to explicitly enable dataflows gives them control over how the capacity they manage is used. This is great for the capacity administrator… but annoying for the blogger.
Anyway, as I was saying, I’m using this new workspace, which is on Power BI shared capacity, since I’m not doing anything in this post that relies on Premium. Yeah.
So… In my new workspace I will select the “create” button, and will select “dataflow” from the menu. For real this time.
I’m now prompted to add new entities, which is good, because this is exactly what I want to do. If you think about a dataflow as being a self-service data warehouse or data lake, an entity is like a table within it.
Creating a new entity opens Power Query, where I can select a data source and provide my connection details. The details are a little different from Power Query in Power BI Desktop or Excel, but the experience is the same. I have an interactive preview of the data, I can transform at the column or table level, I can view and edit the transformation steps I’ve applied, and I can view the underlying “M” Power Query script in the advanced editor. User who are familiar with Power Query in Power BI Desktop should find this to be a very familiar experience.
When I’m done – when the query has been given a meaningful name and all transformations have been applied, I can create more entities or I can save the dataflow, giving it a name and an optional description.
When I save the dataflow, I am given the option to refresh now, or to set up a refresh schedule. I can do both of these things later from the dataflows list in the workspace page, but it’s convenient to be prompted when saving the dataflow. Refreshing the dataflow will execute the query for each entity in the dataflow, and will populate the underlying Azure Data Lake Storage with the results of the queries.
Once I’ve created and refreshed the dataflow, it’s time to use it. Dataflows are designed to be used as data sources for analysis so it makes sense to connect to the dataflow from Power BI Desktop, and to use it to start building my data model.
Power BI Desktop now includes a “Power BI dataflows (Beta)” option in the Get Data menu. When I select it, I can see all of the workspaces in my Power BI tenant that I have permission to access, and which contain dataflows. Inside the workspace I can see each of the dataflows it contains, and inside the dataflow I can see each entity. And just like I can with other data sources, I can select, preview, and load into my data model – or edit in Power BI desktop to further refine and prepare the data. Dataflows are built and managed in Power BI, but they act like any other data source that an analyst or business user might use.
When I’m done building my PBIX file which can contain data from multiple dataflows in multiple workspaces, I can publish it to the Power BI service into any workspace where I have the necessary permissions. A dataflow in one workspace can serve as a data source for datasets and reports in any number of workspaces in the tenant.
That’s the basic end-to-end flow for creating and using a dataflow in Power BI. As you can imagine, there’s more to it, but that will wait for another post…
 None of which will be used or mentioned in this post.
 If you’re wondering how this Premium workspace contains dataflows but I can’t create dataflows in this Premium capacity, ask me in person some time. It’s a long story.
 If a dataflow is like a database, and an entity is like a table, who wants a database with just one table in it?
 Remember the last post?