I haven’t been posting a lot about dataflows in recent months, but that doesn’t mean I love them any less. On Wednesday, September 23rd, I’ll be sharing some of that love via a free webcast hosted by the Istanbul Power BI user group. You can sign up here.
In this webcast I’ll be presenting practices for successfully incorporating dataflows into Power BI applications, based on my experience working with enterprise Power BI customers. If you’re interested in common patterns for success, common challenges to avoid, and answers to the most frequently asked dataflows questions, please sign up today.
This webcast won’t cover dataflows basics, so if you’re new to dataflows in Power BI or just need a refresher, please watch this tutorial before joining!
 In case it needs to be said, yes, the session will be delivered in English.
In the post I mentioned that I was simplifying things a bunch to only focus on the core pattern. One of the things I didn’t mention is that the diagrams I shared were just one piece of the puzzle. Another part was the need to define dataflows in one workspace, and then use those as a template for creating other dataflows in bulk.
The rest of this post is not from me – it’s from the dataflows engineering team. It describes a set of PowerShell scripts they’ve published on GitHub that address this specific problem. The content is pretty self-explanatory, so I’ll just mention that these are unsupported scripts presented as-is, and let it speak for itself.
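Before handing things over, here’s a rough idea of the pattern the scripts address. This is a minimal sketch of mine, not the published scripts – it uses the MicrosoftPowerBIMgmt module and the dataflows REST API to export each dataflow in a “template” workspace as a model.json definition, and the workspace ID is a placeholder:

```powershell
# Minimal sketch - not the published scripts. Requires the MicrosoftPowerBIMgmt
# module; the workspace ID below is a placeholder.
Connect-PowerBIServiceAccount

$templateWorkspaceId = "00000000-0000-0000-0000-000000000001"

# List the dataflows defined in the template workspace
$dataflows = (Invoke-PowerBIRestMethod -Method Get `
    -Url "groups/$templateWorkspaceId/dataflows" | ConvertFrom-Json).value

foreach ($dataflow in $dataflows) {
    # Each dataflow's definition is returned as a model.json document
    $definition = Invoke-PowerBIRestMethod -Method Get `
        -Url "groups/$templateWorkspaceId/dataflows/$($dataflow.objectId)"
    $definition | Out-File -FilePath ".\$($dataflow.name).json" -Encoding utf8
}

# Re-creating the dataflows in other workspaces goes through the Power BI
# imports API; that's the part the published scripts handle and this sketch
# glosses over.
```

The published scripts are the real starting point here – they cover the import side and the error handling that this sketch leaves out.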
Microsoft Power BI dataflows samples
The document below describes the various PowerShell scripts available for Power BI dataflows. These rely on the Power BI public REST APIs and the Power BI PowerShell modules.
Power BI Dataflow PowerShell scripts
Below is a table of the various Power BI PowerShell modules found in this repository.
These days more and more data lives behind an API, rather than in more traditional data sources like databases or files. These APIs are often designed and optimized for small, chatty interactions that don’t lend themselves well to use as a source for business intelligence.
These APIs are often slower than a database, which can increase load/refresh times. Sometimes the load time is so great that a refresh may not fit within the window that an application’s functional requirements allow.
These APIs may also be throttled. SaaS application vendors often have a billing model that doesn’t directly support frequent bulk operations, so to avoid customer behaviors that affect their COGS and their bottom line, their APIs may be limited to a certain number of calls for a given period.
The bottom line is that when you’re using APIs as a data source in Power BI, you need to take the APIs’ limitations into consideration, and often dataflows can help deliver a solution that accommodates those limitations while delivering the functionality your application needs.
In late July I met with a customer to discuss their Power BI dataflows architecture. They showed me a “before” picture that looked something like this:
One of their core data sources was the API for a commercial software application. Almost every Power BI application used some data from this API, because the application supports almost every part of their business. This introduced a bunch of familiar challenges:
Training requirements and a steeper-than-necessary learning curve for SSBI authors due to the complex and unintuitive API design
Long refresh times for large data extracts
Complex, redundant queries in many applications
Technical debt and maintenance due to the duplicate logic
Then they showed me an “after” picture that looked something like this:
They had implemented a set of dataflows in a dedicated workspace. These dataflows have entities that pull data from the source APIs, and make them available for consumption by IT and business Power BI authors. All data transformation logic is implemented exactly once, and each author can easily connect to trusted tabular data without needing to worry about technical details like connection strings, API parameters, authentication, or paging. The dataflows are organized by the functional areas represented in the data, mapping roughly to the APIs in the source system as viewed through the lens of their audience.
The diagram above simplifies things a bit. For their actual implementation they used linked and computed entities to stage, prepare, and present the dataflows, but for the general pattern the diagram above is probably the right level of abstraction. Each IT-developed or business-developed application uses the subset of the dataflows that it needs, and the owners of the dataflows keep them current.
Life is good.
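As a side note, once you have a dedicated workspace full of shared dataflows like this, it can be handy to see how they chain together. Here’s a small hedged sketch (the workspace ID is a placeholder) that uses the MicrosoftPowerBIMgmt module and the REST API’s upstream-dataflows endpoint to list each dataflow’s linked-entity dependencies:

```powershell
# Hedged sketch: inventory the dataflows in the shared workspace and the
# upstream dataflows each one links to. Requires the MicrosoftPowerBIMgmt
# module; the workspace ID is a placeholder.
Connect-PowerBIServiceAccount

$workspaceId = "00000000-0000-0000-0000-000000000001"

$dataflows = (Invoke-PowerBIRestMethod -Method Get `
    -Url "groups/$workspaceId/dataflows" | ConvertFrom-Json).value

foreach ($dataflow in $dataflows) {
    # Linked entities show up as upstream dataflow dependencies
    $upstream = (Invoke-PowerBIRestMethod -Method Get `
        -Url "groups/$workspaceId/dataflows/$($dataflow.objectId)/upstreamDataflows" |
        ConvertFrom-Json).value

    Write-Output "$($dataflow.name): depends on $(@($upstream).Count) upstream dataflow(s)"
}
```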
There’s a lot of information available online covering chatty and chunky patterns in API design, so if you’re not sure what I’m talking about here, you might want to poke around and take a peek at what’s out there.
 Please let me know if you found the joke in the footnote.
 Possibly by multiple orders of magnitude.
 Or a given data volume, etc. There are probably as many variations in licensing as there are SaaS vendors with APIs.
 If you’re wondering about the beehive image and the obtuse joke it represents, the genus for honey bees is “apis.” I get all of my stock photography from the wonderful web site Pixabay. I will typically search on a random term related to my blog post to find a thumbnail image, and when the context is completely different like it was today I will pick one that I like. For this post I searched on “API” and got a page full of bees, which sounds like something Bob Ross would do…
There are over 20 articles in this new resource, and the content they contain answers many of the questions I hear most frequently. For some reason WordPress isn’t letting me upload the screen shot I took of the content outline, so instead here is a picture of me when I first learned about this new content:
Please check out this new information, and please submit feedback to the documentation team using the feedback form on each page. Let them know what you like, what you don’t like, and what is missing – they read every comment!
INTERIOR: pan over cubicles of happy, productive office workers
CLOSE-UP: office worker at desk
NARRATOR: Is that Susie I see, giving Power Platform dataflows a try?
SUSIE: That’s right! With dataflows I can have all of the data I need, right where I need it!!
NARRATOR: Dataflows. They’re not just for Power BI anymore.
OK, you may not remember that orange juice ad campaign from the late 1970s and early 80s, but I’ve had it stuck in my head since I started working on this post and video. I couldn’t figure out how to work it into the video itself, so here it is in written form.
Anyway, with that awkward moment behind us, you probably want to watch the video. Here it is:
Power Platform dataflows use Power Query Online – and the same set of connectors, gateways, and transformation capabilities as Power BI dataflows. But there are a few key differences that are worth emphasizing.
Power Platform dataflows can load data into the Common Data Service, either into the standard Common Data Model entities used by Dynamics 365 apps, or into custom entities used by custom Power Apps. This is important – it makes dataflows behave more like a traditional ETL tool such as SSIS data flows, in that at the end of the dataflow creation process you can map the columns in your queries to the columns in these existing tables.
Power Platform dataflows can load data into ADLSg2 for analytical scenarios, but Power Apps doesn’t have the same concept of “built-in storage” that Power BI does. That means if you want to use Power Platform dataflows to create CDM folders, you must configure your Power Apps environment to use an ADLSg2 resource in your Azure subscription.
The “link to data lake” feature in Power Apps feels to me like a better integration experience than what’s currently available in Power BI. In Power Apps you define the link at the environment level, not the tenant level – this provides more flexibility, and enables non-tenant admins to configure and use the data lake integration.
The first time you create a Power Platform dataflow and select the “analytical entities” option, you’ll be prompted – and required – to link the Power Apps environment to an Azure Data Lake Storage resource. You’ll need an Azure subscription to use for this, but the process itself is pretty straightforward.
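If you don’t already have a data lake to link to, creating one is a one-time step in your Azure subscription. Here’s a minimal sketch using the Az PowerShell module – the resource group, account name, and region are placeholders, and the key detail is the hierarchical namespace setting, which is what makes a StorageV2 account an ADLS Gen2 account:

```powershell
# Hedged sketch: create an ADLS Gen2 storage account to link to. Requires the
# Az PowerShell module; names and region below are placeholders.
Connect-AzAccount

New-AzResourceGroup -Name "rg-dataflows-demo" -Location "westus2"

New-AzStorageAccount `
    -ResourceGroupName "rg-dataflows-demo" `
    -Name "dataflowsdemolake" `
    -Location "westus2" `
    -SkuName "Standard_LRS" `
    -Kind "StorageV2" `
    -EnableHierarchicalNamespace $true

# The linking itself happens in the Power Apps experience when you create
# your first analytical dataflow, as described above.
```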
I can’t wait to hear what you think of this new capability. Please let me know in the comments or via Twitter.
See you in the new year!
 I just realized that this was 40 years ago. Were you even born yet?
 CDS entities aren’t tables by the strictest definition, but it’s close enough for our purposes today.
 I honestly don’t know enough about Power Apps security to go into too much depth on this point, but I am not a Power Apps admin and I was able to create a trial environment and link it to my own ADLSg2 resource in my own Azure subscription without affecting other users.
Most of my blog posts that discuss the integration of Azure data services and Power BI dataflows via Common Data Model folders include links to a tutorial and sample originally published in late 2018 by the Azure team. This has long been the best resource to explain in depth how CDM folders fit in with the bigger picture of Azure data.
Microsoft Solutions Architect Ted Malone has used the Azure sample as a starting point for a GitHub project of his own, and has extended this sample project to start making it suitable for more scenarios.
The thing that has me the most excited (beyond having Ted contributing to a GitHub repo, and having code that works with large datasets) is the plan to integrate with Apache Atlas for lineage and metadata. That’s the good stuff right there.
If you’re following my blog for more than just Power BI and recipes, this is a resource you need in your toolkit. Check it out, and be sure to let Ted know if it solves your problems.
The enhanced compute engine in Power BI dataflows has been in preview since June. It’s not really new, and I’ve posted about it before. But I still keep hearing questions about it, so I thought it might make sense to record a video.
I won’t go into too much more depth here – just watch the video, and if you want more details check out one of these existing posts:
 Also, I’m behind on my video schedule – this was a motivating factor as well. November was an unexpectedly busy month, and between work, life, and not really having the video editing skills I need to keep to a schedule… Yeah.
At this point I’ve said “Power BI dataflows enable reuse” enough times that I feel like a broken record. What does this phrase actually mean, and how can you take advantage of dataflows to enable reuse in your Power BI applications?
This Power BIte video is a bit longer than its predecessors, and part of this is because it covers both the problem and the solution.
The problem is that self-service BI applications often start out as one-off efforts, but don’t stay that way. At least in theory, if the problem solved by the application was widespread and well understood, there would be an existing solution already developed and maintained by IT, and business users wouldn’t need to develop their own solutions.
Successful applications have a tendency to grow. For self-service BI, this could mean that more and more functionality gets added to the application, or it could mean that someone copies the relevant portions of the application and uses them as the starting point for a new, different-but-related, application.
Once this happens, there is a natural and gradual process of drift that occurs, as each branch of the application tree grows in its own direction. A predictable consequence of this drift in Power BI applications is that query definitions that start off as common will gradually become out of sync, meaning that “the same data” in two datasets will actually contain different values.
Moving queries that need to be shared across multiple applications from multiple datasets into a single dataflow is a simple and effective solution to this problem. There’s no dedicated tooling for this solution in Power BI today, but the steps are still simple and straightforward.
P.S. This is the first Power BIte video recorded in my home office. After struggling unsuccessfully to get decent audio quality in my office at work, I’m trying out a new environment and some new tools. I know there’s still work to be done, but hopefully this is a step in the right direction. As always, I’d love to know what you think…
 For my younger readers, this phrase is a reference to when Spotify used to be called “records” and the most common service outage symptom was a repeat of the audio buffer until the user performed a hard reset of the client application. True story.
 Is there a better term for this? I feel like there should be an existing body of knowledge that I could reference, but my searching did not yield any promising results. The fact that “Logical Drift” is the name of a band probably isn’t helping.
This week’s Power BIte is the fourth and final entry in a series of videos that present different ways to create new Power BI dataflows, and the results of each approach.
When creating a dataflow by attaching an external CDM folder, the dataflow will have the following characteristics:
Data ingress path: ingress via Azure Data Factory, Databricks, or whatever Azure service or app has created the CDM folder.
Data stored in ADLSg2 in the CDM folder created by the data ingress process.
The data is refreshed based on the execution schedule and properties of the data ingress process, not by any setting in Power BI.
The key to this scenario is the CDM folder storage format. CDM folders provide a simple and open way to persist data in a data lake. Because CDM folders are implemented using CSV data files and JSON metadata, any application can read from and write to CDM folders. This includes multiple Azure services that have libraries for reading and writing CDM folders, as well as third-party data tools like Informatica that have implemented their own CDM folder connectors.
CDM folders enable scenarios like this one, which is implemented in a sample and tutorial published on GitHub by the Azure data team:
Create a Power BI dataflow by ingesting order data from the Wide World Importers sample database and save it as a CDM folder
Use an Azure Databricks notebook that prepares and cleanses the data in the CDM folder, and then writes the updated data to a new CDM folder in ADLS Gen2
Attach the CDM folder created by Databricks as an external dataflow in Power BI
Use Azure Machine Learning to train and publish a model using data from the CDM folder
Use an Azure Data Factory pipeline to load data from the CDM folder into staging tables in Azure SQL Data Warehouse and then invoke stored procedures that transform the data into a dimensional model
Use Azure Data Factory to orchestrate the overall process and monitor execution
That’s it for this mini-series!
If all this information doesn’t make sense yet, now is the time to ask questions.
 New videos every Monday morning!
 I added this bullet to the list because it fits in with the rest of the post – the other bullets are copied from the sample description.
This capability has been in preview since early this year, so it’s not really new, but there are enough pieces involved that it may not be obvious how to begin – and I continue to see enough questions about this topic that another blog post seemed warranted.
The key point is that because dataflows are writing data to ADLSg2 in CDM folder format, Azure Machine Learning and Azure Databricks can both read the data using the metadata in the model.json file.
This JSON file serves as the “endpoint” for the data in the CDM folder; it’s a single resource that you can connect to, without having to worry about the complexities of the various subfolders and files that the CDM folder contains.
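To make that concrete, here’s a hedged PowerShell sketch that walks a local copy of a CDM folder’s model.json and lists each entity, its columns, and the data files behind it – the same metadata that Azure Databricks and Azure Machine Learning use to interpret the CSV files. The file path is a placeholder, and it assumes the standard model.json layout of entities, attributes, and partitions:

```powershell
# Hedged sketch: list the entities, columns, and data file partitions described
# by a CDM folder's model.json. The path is a placeholder.
$model = Get-Content -Path ".\model.json" -Raw | ConvertFrom-Json

foreach ($entity in $model.entities) {
    Write-Output "Entity: $($entity.name)"

    # Attributes describe the column names and data types
    foreach ($attribute in $entity.attributes) {
        Write-Output "  Column: $($attribute.name) [$($attribute.dataType)]"
    }

    # Partitions point at the CSV files that hold the entity's data
    foreach ($partition in $entity.partitions) {
        Write-Output "  Partition: $($partition.location)"
    }
}
```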
This tutorial is probably the best place to start if you want to know more. It includes directions and sample code for creating and consuming CDM folders from a variety of different Azure services – and Power BI dataflows. If you’re one of the people who has recently asked about this, please go through this tutorial as your next step!
 It’s the best resource I’m aware of – if you find a better one, please let me know!