Dataflows and API data sources

These days, more and more data lives behind an API rather than in a more traditional data source like a database or a file. These APIs are often designed and optimized for small, chatty[1] interactions that don’t lend themselves well to use as a source for business intelligence.

[Image: beehive]
I’d explain the choice of image for APIs, but it doesn’t take a genus to figure it out

These APIs are often slower[3] than a database, which can increase load/refresh times. Sometimes the load time is so great that a refresh can’t complete within the window that an application’s functional requirements demand.

These APIs may also be throttled. SaaS application vendors often have a billing model that doesn’t directly support frequent bulk operations, so to avoid customer behaviors that affect their COGS and their bottom line, their APIs may be limited to a certain number of calls[4] for a given period.
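Throttling like this is commonly surfaced as an HTTP 429 response, often with a Retry-After hint. As a rough sketch of how a refresh process might accommodate it (in Python rather than Power Query, with `RateLimitError` and `fetch` as hypothetical stand-ins for whatever a real SaaS client exposes):

```python
import time

class RateLimitError(Exception):
    """Hypothetical signal for an HTTP 429 (throttled) response."""
    def __init__(self, retry_after=None):
        self.retry_after = retry_after

def call_with_backoff(fetch, max_retries=5, base_delay=0.5):
    """Call `fetch`, retrying with backoff while the API throttles us."""
    delay = base_delay
    for _ in range(max_retries):
        try:
            return fetch()
        except RateLimitError as e:
            # Prefer the server's Retry-After hint; otherwise back off
            # exponentially so we don't hammer a throttled endpoint.
            time.sleep(e.retry_after if e.retry_after is not None else delay)
            delay *= 2
    raise RuntimeError(f"still throttled after {max_retries} retries")

# Simulated endpoint: throttles the first two calls, then succeeds.
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=0.01)
    return {"rows": [1, 2, 3]}

print(call_with_backoff(fake_fetch))  # → {'rows': [1, 2, 3]}
```

Centralizing this logic in one place (as a dataflow does) means each individual report author never has to write it.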

The bottom line is that when you’re using APIs as a data source in Power BI, you need to take the APIs’ limitations into consideration, and often dataflows can help deliver a solution that accommodates those limitations while delivering the functionality your application needs.

I’ve covered some of the fundamental concepts of this approach in past blog posts, specifically this one from 2018 on Power BI Dataflows and Slow Data Sources, and this one from 2019 on Creating “data workspaces” for dataflows and shared datasets. I hadn’t planned on having a dedicated new blog post, but after having this pattern come up in 4 or 5 different contexts in the past week or so, I thought maybe a post was warranted.

In late July I met with a customer to discuss their Power BI dataflows architecture. They showed me a “before” picture that looked something like this:

[Diagram: “before” architecture]

One of their core data sources was the API for a commercial software application. Almost every Power BI application used some data from this API, because the application supports almost every part of their business. This introduced a bunch of familiar challenges:

  • Training requirements and a steeper-than-necessary learning curve for SSBI authors due to the complex and unintuitive API design
  • Long refresh times for large data extracts
  • Complex, redundant queries in many applications
  • Technical debt and maintenance due to the duplicate logic

Then they showed me an “after” picture that looked something like this:

[Diagram: “after” architecture]

They had implemented a set of dataflows in a dedicated workspace. These dataflows have entities that pull data from the source APIs and make that data available for consumption by IT and business Power BI authors. All data transformation logic is implemented exactly once, and each author can easily connect to trusted tabular data without needing to worry about technical details like connection strings, API parameters, authentication, or paging. The dataflows are organized by the functional areas represented in the data, mapping roughly to the APIs in the source system as viewed through the lens of their audience.
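Paging is a good example of the kind of detail this pattern hides. A minimal sketch of the idea, assuming a hypothetical cursor-based API where `get_page(cursor)` returns a batch of records plus the cursor for the next page:

```python
def fetch_all(get_page):
    """Walk a paged API and return every record as one flat list.

    `get_page(cursor)` is a hypothetical page fetcher returning
    (records, next_cursor); next_cursor is None on the last page.
    """
    records, cursor = [], None
    while True:
        page, cursor = get_page(cursor)
        records.extend(page)
        if cursor is None:
            return records

# Simulated three-page API.
PAGES = {None: ([1, 2], "p2"), "p2": ([3, 4], "p3"), "p3": ([5], None)}
def fake_page(cursor):
    return PAGES[cursor]

print(fetch_all(fake_page))  # → [1, 2, 3, 4, 5]
```

A dataflow entity plays the same role: the paging loop is written once by the dataflow owner, and every downstream author simply sees a complete table.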

The diagram above simplifies things a bit. For their actual implementation they used linked and computed entities to stage, prepare, and present the dataflows, but for the general pattern the diagram above is probably the right level of abstraction. Each IT-developed or business-developed application uses the subset of the dataflows that it needs, and the owners of the dataflows keep them current.

Life is good[5].


[1] There’s a lot of information available online covering chatty and chunky patterns in API design, so if you’re not sure what I’m talking about here, you might want to poke around and take a peek[2] at what’s out there.

[2] Please let me know if you found the joke in footnote [1].

[3] Possibly by multiple orders of magnitude.

[4] Or a given data volume, etc. There are probably as many variations in licensing as there are SaaS vendors with APIs.

[5] If you’re wondering about the beehive image and the obtuse joke it represents, the genus for honey bees is “apis.” I get all of my stock photography from the wonderful web site Pixabay. I will typically search on a random term related to my blog post to find a thumbnail image, and when the context is completely different like it was today I will pick one that I like. For this post I searched on “API” and got a page full of bees, which sounds like something Bob Ross would do…

Data Culture: Roles and Responsibilities

In last week’s post and video we looked at how business intelligence tools have evolved, with each evolution solving one set of problems and introducing another set. The video described how self-service BI tools have enabled business users to work with data in ways that used to require specialized technical skills, freeing up IT to focus on other tasks – but at the same time introducing challenges around oversight, consistency, and control.

And that brings us to today’s video.

The video includes a few different graphics, but for this post I want to highlight this one, which I’ve taken from the Power BI Adoption Framework[1].

[Diagram from the Power BI Adoption Framework]

Successful cultures balance freedom and restriction in ways that benefit the culture as a whole. It’s a compromise – no one gets everything they want, but everyone gets the things they need the most.

For a data culture, this typically involves letting IT do the things they do best, and letting business get down to business.

When an organization adopts a managed self-service model for business intelligence[2], the central BI team in IT[3] does the heavy lifting. This means they prepare, cleanse, warehouse, and deliver the biggest and most important[4] data. They deliver the data that would be hard for a business user to create using a self-service tool, and which will be used by a large audience. They do the things that have broad reach, strategic impact, and strategic importance. They create things that need to be correct, consistent, and supported.

And business users do the rest. This is a broad bucket, but it’s a reasonable generalization as well. Business users create their own reports using the IT-managed data sources and data models. And they prepare and use their own data where there is no IT-managed data for a given purpose.

Over time a given application or a given data source may become more or less managed, as the culture adopts, adapts, and improves.

Managed self-service BI isn’t the only way to be successful, but in my experience working with many large Power BI customers it’s the best and most predictable way. By having clearly defined roles and responsibilities – who does what – an organization moving from either extreme can overcome the problems that extreme creates, without going too far in the other direction.

Does your organization take this approach?

If it doesn’t today, what will it take to make this change a reality?


[1] Which I have shamelessly stolen from the Power BI Adoption Framework, because it is an awesome graphic and because I love to stand on the shoulders of giants.

[2] Which is the approach most likely to drive short- and long-term efficiencies and successes.

[3] Please understand that this is a gross simplification. This “central BI team in IT” may be a single central team for the whole company, it may be a single central team for a specific business unit, or it may be one of dozens of BI teams that are designed to support a distributed global enterprise. This is less about the technology and the culture than it is about organizational structure, so I don’t expect to ever try to tackle this diversity of approaches in this series.

[4] Next week’s video will talk a little more about what data is likely to qualify for this distinction. And since the video is done already, I’m pretty confident to say “next week” this week.