Session resources: Patterns for adopting dataflows in Power BI

This morning I presented a new webinar for the Istanbul Power BI user group, covering one of my favorite subjects: common patterns for successfully using and adopting dataflows in Power BI.

This session represents an intersection of my data culture series in that it presents lessons learned from successful enterprise customers, and my dataflows series in that… in that it’s about dataflows. I probably didn’t need to point out that part.

The session slides can be downloaded here: 2020-09-23 – Power BI Istanbul – Patterns for adopting dataflows in Power BI

The session recording is available for on-demand viewing. The presentation is around 50 minutes, with about 30 minutes of dataflows-centric Q&A at the end. Please check it out, and share it with your friends!

 

Data Culture: The Importance of Community

The last two videos in our series on building a data culture covered different aspects of how business and IT stakeholders can partner and collaborate to achieve the goals of the data culture. One video focused on the roles and responsibilities of each group, and one focused on the fact that you can’t treat all data as equal. Each of these videos builds on the series introduction, where we presented core concepts about cultures in general, and data culture in particular.

Today’s video takes a closer look at where much of that business/IT collaboration takes place – in a community.

Having a common community space – virtual, physical, or both – where your data culture can thrive is an important factor in determining success. In my work with global enterprise Power BI customers, when I hear about increasing usage and business value, I invariably hear about a vibrant, active community. When I hear about a central BI team or a business group that is struggling, and I ask about a community, I usually hear that this is something they want to do, but never seem to get around to prioritizing.

Community is important.[1]


A successful data culture lets IT do what IT does well, and enables business to focus on solving their problems themselves… but sometimes folks on both sides of this partnership need help. Where do they find it, and who provides that help?

This is where the community comes in. A successful community brings together people with questions and people with answers to those questions. A successful community recognizes and motivates people who share their knowledge, and encourages others to increase their own knowledge and to share it as well.

Unfortunately, many organizations overlook this vital aspect of the data culture. It’s not really something IT traditionally owns, and it’s not really something business can run on its own, so sometimes it falls through the cracks[2] because it’s not part of how organizations think about solving problems.

If you’re part of your organization’s journey to build and grow a data culture and you’re not making the progress you want, look more closely at how you’re running your community. If you look online you’ll find lots of resources that can give you inspiration and ideas, anything from community-building ideas for educators[3] to tips for creating a corporate community of practice.


[1] Really important. Really really.

[2] This is a pattern you will likely notice in other complex problem spaces as well: the most interesting challenges come not within a problem domain, but at the overlap or intersection of related problem domains. If you haven’t noticed it already, I suspect you’ll start to notice it now. That’s the value (or curse) of reading the footnotes.

[3] You may be surprised at how many of these tips are applicable to the workplace as well. Or you may not be surprised, since some workplaces feel a lot like middle school sometimes…

Power BI dataflows PowerShell scripts on GitHub

Last week I shared a post highlighting a common pattern for making API data available through dataflows in Power BI, and included a few diagrams to show how a customer was implementing this pattern.

In the post I mentioned that I was simplifying things a bunch to only focus on the core pattern. One of the things I didn’t mention is that the diagrams I shared were just one piece of the puzzle. Another part was the need to define dataflows in one workspace, and then use those as a template for creating other dataflows in bulk.

This is simple enough to do via the Power BI portal for an individual dataflow, but if you need to do it for every dataflow in a workspace, you might need a little more power – PowerShell, to be specific.

The rest of this post is not from me – it’s from the dataflows engineering team. It describes a set of PowerShell scripts they’ve published on GitHub to address this specific problem. It’s pretty self-explanatory, so I’ll just mention that these are unsupported scripts presented as-is, and let the content speak for itself.


Microsoft Power BI dataflows samples

The document below describes the various PowerShell scripts available for Power BI dataflows. These rely on the Power BI public REST APIs and the Power BI PowerShell modules.

Power BI Dataflow PowerShell scripts

Below is a table of the various Power BI PowerShell scripts found in this repository.

Description | Script Name | Download
Exports all dataflows from a workspace | ExportWorkspace.ps1 | GitHub Location
Imports all dataflows into a workspace | ImportWorkspace.ps1 | GitHub Location
Imports a single dataflow | ImportModel.ps1 | GitHub Location

For more information on PowerShell support for Power BI, please visit powerbi-powershell on GitHub.

Supported environments and PowerShell versions

  • Windows PowerShell v3.0 and up with .NET 4.7.1 or above.
  • PowerShell Core (v6) and up on any OS platform supported by PowerShell Core.

Installation

  1. The scripts depend on the MicrosoftPowerBIMgmt module, which can be installed as follows:

Install-Module -Name MicrosoftPowerBIMgmt

If you have an earlier version, you can update to the latest version by running:

Update-Module -Name MicrosoftPowerBIMgmt

(If you’re not sure what’s already installed, see the quick check after these steps.)

  2. Download all the scripts from the GitHub Location into a local folder.
  3. Unblock the scripts after you download them by right-clicking each file and selecting “Unblock”. Otherwise you might get a warning when you run them.
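A quick way to check whether the module is already installed, and which version you have (this uses the same cmdlet pattern as the uninstall command below):

Get-Module MicrosoftPowerBIMgmt* -ListAvailable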

Uninstall

If you want to uninstall all the Power BI PowerShell cmdlets, run the following in an elevated PowerShell session:

Get-Module MicrosoftPowerBIMgmt* -ListAvailable | Uninstall-Module -Force

Usage

The scripts below support two optional parameters:

  • -Environment: Indicates which Power BI environment to log in to (Public, Germany, USGov, China, USGovHigh, USGovMil). Default is Public.
  • -V: Indicates whether to produce verbose output. Default is false.
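For example, to run an export against the US Government cloud instead of the public cloud (a hypothetical invocation – the workspace name and folder are placeholders):

.\ExportWorkspace.ps1 -Workspace "Workspace1" -Location C:\dataflows -Environment USGov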

Export workspace

Exports all the dataflow model.json files from a Power BI workspace into a folder:

.\ExportWorkspace.ps1 -Workspace "Workspace1" -Location C:\dataflows

Import workspace

Imports all the dataflow model.json files from a folder into a Power BI workspace. This script also fixes dataflow references so they point to the right dataflows in the current workspace:

.\ImportWorkspace.ps1 -Workspace "Workspace1" -Location C:\dataflows -Overwrite

Import dataflow

Imports a dataflow model.json into a Power BI workspace:

.\ImportModel.ps1 -Workspace "Workspace1" -File C:\MyModel.json -Overwrite
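Putting these together brings us back to the “template” scenario mentioned at the top of this post: defining dataflows once in one workspace, then deploying them to many workspaces in bulk. Here’s a hedged sketch of what that might look like – it’s not part of the published samples, the workspace names are placeholders, and it assumes the scripts are in the current folder:

# Illustrative sketch: export one workspace's dataflows and deploy them
# as templates into multiple target workspaces.
Connect-PowerBIServiceAccount   # log in to the Power BI service
.\ExportWorkspace.ps1 -Workspace "Dataflow Templates" -Location C:\dataflows
foreach ($target in "Sales dataflows", "Marketing dataflows") {
    .\ImportWorkspace.ps1 -Workspace $target -Location C:\dataflows -Overwrite
}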

Data Culture: Picking Your Battles

Not all data is created equal.

One size does not fit all.

In addition to collaboration and partnership between business and IT, successful data cultures have something else in common: they recognize the need for both discipline and flexibility, and have clear, consistent criteria and responsibilities that let all stakeholders know what controls apply to what data and applications.


Today’s video looks at this key fact, and emphasizes this important point: you need to pick your battles[1].

If you try to lock everything down and manage all data and applications rigorously, business users who need more agility will not be able to do their jobs – or more likely they will simply work around your controls. This approach puts you back into the bad old days before there were robust and flexible self-service BI tools – you don’t want this.

If you try to let every user do whatever they want with any data, you’ll quickly find yourself in the “wild west” days – you don’t want that either.

Instead, work with your executive sponsor and key stakeholders from business and IT to understand what requires discipline and control, and what supports flexibility and agility.

One approach will never work for all data – don’t try to make it fit.


[1] The original title of this post and video was “discipline and flexibility” but when the phrase “pick your battles” came out unscripted[2] as I was recording the video, I realized that no other title would be so on-brand for me. And here we are.

[2] In case you were wondering, it’s all unscripted. Every time I edit and watch a recording, I’m surprised. True story.

Dataflows and API data sources

These days more and more data lives behind an API, rather than in more traditional data sources like databases or files. These APIs are often designed and optimized for small, chatty[1] interactions that don’t lend themselves well to use as a source for business intelligence.

[Image: a beehive]
I’d explain the choice of image for APIs, but it doesn’t take a genus to figure it out

These APIs are often slower[3] than a database, which can increase load/refresh times. Sometimes the load time is so great that a refresh can’t complete within the window required by an application’s functional requirements.

These APIs may also be throttled. SaaS application vendors often have a billing model that doesn’t directly support frequent bulk operations, so to avoid customer behaviors that affect their COGS and their bottom line, their APIs may be limited to a certain number of calls[4] for a given period.
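To make this concrete, here’s a hedged sketch in PowerShell of what consuming a paged, throttled API can look like. The endpoint, the value and nextLink properties, and the fixed 30-second backoff are all illustrative assumptions, not any specific vendor’s API:

# Illustrative only: pull all pages from a hypothetical paged API,
# backing off when the service throttles us with HTTP 429.
$uri = "https://api.example.com/v1/orders"
$rows = @()
while ($uri) {
    try {
        $response = Invoke-RestMethod -Uri $uri -Method Get
        $rows += $response.value    # hypothetical payload property
        $uri = $response.nextLink   # hypothetical continuation link; $null on the last page
    }
    catch {
        if ($_.Exception.Response.StatusCode.value__ -eq 429) {
            Start-Sleep -Seconds 30 # assumed backoff; real APIs often send a Retry-After header
        }
        else {
            throw
        }
    }
}

Multiply this kind of loop by every dataset that needs the same data, and the appeal of doing the work once becomes clear.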

The bottom line is that when you’re using APIs as a data source in Power BI, you need to take the APIs’ limitations into consideration, and often dataflows can help deliver a solution that accommodates those limitations while delivering the functionality your application needs.

I’ve covered some of the fundamental concepts of this approach in past blog posts, specifically this one from 2018 on Power BI Dataflows and Slow Data Sources, and this one from 2019 on Creating “data workspaces” for dataflows and shared datasets. I hadn’t planned on writing a dedicated new blog post, but after having this pattern come up in 4 or 5 different contexts in the past week or so, I thought maybe a post was warranted.

In late July I met with a customer to discuss their Power BI dataflows architecture. They showed me a “before” picture that looked something like this:

[Diagram: the “before” architecture]

One of their core data sources was the API for a commercial software application. Almost every Power BI application used some data from this API, because the application supports almost every part of their business. This introduced a bunch of familiar challenges:

  • Training requirements and a steeper-than-necessary learning curve for SSBI authors due to the complex and unintuitive API design
  • Long refresh times for large data extracts
  • Complex, redundant queries in many applications
  • Technical debt and maintenance due to the duplicate logic

Then they showed me an “after” picture that looked something like this:

[Diagram: the “after” architecture]

They had implemented a set of dataflows in a dedicated workspace. These dataflows have entities that pull data from the source APIs and make it available for consumption by IT and business Power BI authors. All data transformation logic is implemented exactly once, and each author can easily connect to trusted tabular data without needing to worry about technical details like connection strings, API parameters, authentication, or paging. The dataflows are organized by the functional areas represented in the data, mapping roughly to the APIs in the source system as viewed through the lens of their audience.

The diagram above simplifies things a bit. For their actual implementation they used linked and computed entities to stage, prepare, and present the dataflows, but for the general pattern the diagram above is probably the right level of abstraction. Each IT-developed or business-developed application uses the subset of the dataflows that it needs, and the owners of the dataflows keep them current.

Life is good[5].


[1] There’s a lot of information available online covering chatty and chunky patterns in API design, so if you’re not sure what I’m talking about here, you might want to poke around and take a peek[2] at what’s out there.

[2] Please let me know if you found the joke in footnote[1].

[3] Possibly by multiple orders of magnitude.

[4] Or a given data volume, etc. There are probably as many variations in licensing as there are SaaS vendors with APIs.

[5] If you’re wondering about the beehive image and the obtuse joke it represents, the genus for honey bees is “apis.” I get all of my stock photography from the wonderful web site Pixabay. I will typically search on a random term related to my blog post to find a thumbnail image, and when the context is completely different like it was today I will pick one that I like. For this post I searched on “API” and got a page full of bees, which sounds like something Bob Ross would do…

Viral adoption: Self-service BI and COVID-19

I live 2.6 miles (4.2 km) from the epicenter of the coronavirus outbreak in Washington state. You know, the nursing home that’s been in the news, where over 10 people have died, and dozens more are infected.[1]

As you can imagine, this has started me thinking about self-service BI.

[Screenshot]
Where can I find information I can trust?[2]
When the news started to come out covering the US outbreak, there was something I immediately noticed: authoritative information was very difficult to find. Here’s a quote from that last link.

This escalation “raises our level of concern about the immediate threat of COVID-19 for certain communities,” Dr. Nancy Messonnier, director of the CDC’s National Center for Immunization and Respiratory Diseases, said in the briefing. Still, the risk to the general public not in these areas is considered to be low, she said.

That’s great, but what about the general public in these areas?

What about me and my family?

When most of what I saw on Twitter was people making jokes about Jira tickets[3], I was trying to figure out what was going on, and what I needed to do. What actions should I take to stay safe? What actions were unnecessary or unhelpful?

Before I could answer these questions, I needed to find sources of information. This was surprisingly difficult.

Specifically, I needed to find sources of information that I could trust. There was already a surge in misinformation, some of it presumably well-intentioned, and some from deliberately malicious actors. I needed to explore, validate, confirm, cross-check, act, and repeat. And I was doing this while everyone around me seemed to be treating the emerging pandemic as a joke or a curiosity.

I did this work and made my decisions because I was a highly-motivated stakeholder, while others in otherwise similar positions were farther away from the problem, and were naturally less motivated at the time.[4]

And this is what got me thinking about self-service BI.

In many organizations, self-service BI tools like Power BI will spread virally. A highly-motivated business user will find a tool, find some data, explore, iterate, refine, and repeat. They will work with untrusted – and sometimes untrustworthy – data sources to find the information they need to use, and to make the decisions they need to make. And they do it before people in similar positions are motivated enough to act.

But before long, scraping together whatever data is available isn’t enough anymore. As the number of users relying on the insights being produced increases – even if the insights are being produced by a self-service BI solution – the need for trusted data increases as well.

Where an individual might successfully use disparate unmanaged sources, a population needs a trusted source of truth.

At some point a central authority needs to step up, to make available the data that can serve as that single source of truth. This is easier said than done[5], but it must be done. And this isn’t even the hard part.

The hard part is getting everyone to stop using the unofficial and untrusted sources that they’ve been using to make decisions, and to use the trusted source instead. This is difficult because these users are invested in their current sources, and believe that they are good enough. They may not be ideal, but they work, right? They got me this far, so why should I have to stop using them just because someone says so?

This brings me back to those malicious actors mentioned earlier. Why would someone deliberately share false information about public health issues when lies could potentially cost people their lives? They would do it when the lies would help forward an agenda they value more than they value other people’s lives.

In most business situations, lives aren’t at stake, but people still have their own agendas. I’ve often seen situations where the lack of a single source of truth allows stakeholders to present their own numbers, skewed to make their efforts look more successful than they actually are. Some people don’t want to have to rebuild their reports – but some people want to use falsified numbers so they can get a promotion, or a bonus, or a raise.

Regardless of the reason for using untrusted sources, their use is damaging and should be reduced and eliminated. This is true of business data and analytics, and it is true of the current global health crisis. In both arenas, let’s all be part of the solution, not part of the problem.

Let us be a part of the cure, never part of the plague – we’ll only be remembered for what we create.[6]


[1] Before you ask, yes, my family and I are healthy and well. I’ve been working from home for over a week now, which is a nice silver lining; I have a small but comfortable home office, and can avoid the obnoxious Seattle-area commute.

[2] This article is the best single source I know of. It’s not an authoritative source for the subject, but it is aggregating and citing authoritative sources and presenting their information in a form closer to the solution domain than to the problem domain.

[3] This is why I’ve been practicing social media distancing.

[4] This is where the “personal pandemic parable” part of the blog post ends. From here on it’s all about SSBI. If you’re actually curious, I erred on the side of caution and started working from home and avoiding crowds before it was recommended or mandated. I still don’t know if all of the actions I’ve taken were necessary, but I’m glad I took them and I hope you all stay safe as well.

[5] As anyone who has ever implemented a single source of truth for any non-trivial data domain can attest.

[6] You can enjoy the lyrics even if Kreator’s awesome music isn’t to your taste.

Building a data culture

tl;dr – to kick off 2020 we’re starting a new BI Polar video series focusing on building a data culture, and the first video introduces the series. You should watch it and share it.

Succeeding with a tool like Power BI is easy – self-service BI tools let more users do more things with data more easily, and can help reduce the reporting burden on IT teams.

Succeeding at scale with a tool like Power BI is not easy. It’s very difficult, not because of the technology, but because of the context in which the technology is used. Organizations adopt self-service BI tools because their existing approaches to working with data are no longer successful – and because the cost and pain[1] of change is now outweighed by the cost and pain of maintaining course.

Tool adoption may be top-down, encouraged or mandated by senior management as a broad organization-wide effort. Adoption may be bottom-up, growing organically and virally in the teams and departments least well served by the existing tools and processes in place.

Both of these approaches[2] can be successful, and both of these approaches can fail. The most important success factor is a data culture in which the proper use of self-service BI tools can deliver the greatest value for the organization.

The most important success factor is a data culture

[Image]
There must be a data culture on the other side of this door.

Without an organizational culture that values, encourages, recognizes, and rewards users and teams for their use of data, no tool and no amount of effort and skill is enough to achieve the full potential of the tools – or of the data.

In this new video series we’ll be covering practices that will help build a data culture. More specifically, we’ll introduce common practices that are exhibited by large organizations that have mature and successful data cultures. Each culture is unique, but there are enough commonalities to identify patterns and anti-patterns.

The content in this series will be informed by my work with enterprise Power BI customers as part of my day job[3], and will complement nicely[4] the content and guidance in the Power BI Adoption Framework.

Back in November when the 100th BI Polar blog post was published, I asked what everyone wanted to read about in the next 100 posts. There were lots of different ideas and suggestions, but the most common theme was around guidance like this. Hopefully you’ll enjoy the result – and hopefully you’ll let me know either way.


[1] I strongly believe that pain is a fundamental precursor to significant change. If there is no pain, there is no motivation to change. Only when the pain of not changing exceeds the perceived pain of going through the change will most people and organizations consider giving up the status quo. There are occasional exceptions, but in my experience these are very rare.

[2] Including any number of variations – these approaches are common points on a wide spectrum, but should not be interpreted as the only ways to adopt Power BI or other self-service BI tools.

[3] By day I’m a masked crime-fighter. Or a member of the Power BI customer advisory team. Or both. It varies from day to day.

[4] Hopefully this will be true. I’m at least as interested in seeing where this ends up as you are.

Video: A most delicious analogy

Every time I cook or bake something, I think about how the tasks and patterns involved in making food have strong and significant parallels with building BI[1] solutions. At some point in the future I’m likely to write a “data mise en place” blog post, but for today I decided to take a more visual approach, starting with one of my favorite holiday recipes[2].

Check it out:

(Please forgive my clickbaity title and thumbnail image. I was struggling to think of a meaningful title and image, and decided to have a little fun with this one.)

I won’t repeat all of the information from the video here, but I will share a view of what’s involved in making this self-service BI treat.

[Diagram: what’s involved in making this self-service BI treat]

When it’s visualized like this, the parallels with data development and reuse are probably a bit more obvious. Please take a look at the video, and see what other parallels jump out at you.

And please let me know what you think. Seriously.


[1] And other types of software, but mainly BI these days.

[2] I published this recipe almost exactly a year ago. The timing isn’t intentional, but it’s interesting to me to see this pattern emerging as well…

The Power BI Adoption Framework – it’s Power BI AF

You may have seen things that make you say “that’s Power BI AF” but none of them have come close to this. It’s literally the Power BI AF[1].

That’s right – this week Microsoft published the Power BI Adoption Framework on GitHub and YouTube. If you’re impatient, here’s the first video – you can jump right in. It serves as an introduction to the framework, its content, and its goals.

Without attempting to summarize the entire framework, this content provides a set of guidance, practices, and resources to help organizations build a data culture, establish a Power BI center of excellence, and manage Power BI at any scale.

Even though I blog a lot about Power BI dataflows, most of my job involves working with enterprise Power BI customers – global organizations with thousands of users across the business who are building, deploying, and consuming BI solutions built using Power BI.

Each of these large customers takes their own approach to adopting Power BI, at least when it comes to the details. But with very few exceptions[2], each successful customer will align with the patterns and practices presented in the Power BI Adoption Framework – and when I work with a customer that is struggling with their global Power BI rollout, their challenges are often rooted in a failure to adopt these practices.

There’s no single “right way” to be successful with Power BI, so don’t expect a silver bullet. Instead, the Power BI Adoption Framework presents a set of roles, responsibilities, and behaviors that have been developed after working with customers in real-world Power BI deployments.

If you look on GitHub today, you’ll find a set of PowerPoint decks broken down into five topics, plus a few templates.

[Screenshot: the Power BI Adoption Framework content on GitHub]

These slide decks are still a little rough. They were originally built for use by partners who could customize and deliver them as training content for their customers[3], rather than for direct use by the general public, and as of today they’re still a work in progress. But if you can get past the rough edges, there’s definitely gold to be found. This is the same content I used when I put together my “Is self-service business intelligence a two-edged sword?” presentation earlier this year, and for the most part I just tweaked the slide template and added a bunch of sword pictures.

And if the slides aren’t quite ready for you today, you can head over to the official Power BI YouTube channel where this growing playlist contains bite-size training content to supplement the slides. As of today there are two videos published – expect much more to come in the days and weeks ahead.

The real heroes of this story[4] are Manu Kanwarpal and Paul Henwood. They’re both cloud solution architects working for Microsoft in the UK. They’ve put the Power BI AF together, delivered its content to partners around the world, and are now working to make it available to everyone.

What do you think?

To me, this is one of the biggest announcements of the year, but I really want to hear from you after you’ve checked out the Power BI AF. What questions are still unanswered? What does the AF not do today that you want or need it to do tomorrow?

Please let me know in the comments below – this is just a starting point, and there’s a lot that we can do with it from here…


[1] If you had any idea how long I’ve been waiting to make this joke…

[2] I can’t think of a single exception at the moment, but I’m sure there must be one or two. Maybe.

[3] Partners can still do this, of course.

[4] Other than you, of course. You’re always a hero too – never stop doing what you do.

Using and reusing Power BI dataflows

I use this diagram a lot[1]:

[Diagram: dataflows in an end-to-end Power BI application]

This diagram neatly summarizes a canonical use case for Power BI dataflows, with source data being ingested and processed as part of an end-to-end BI application. It showcases the Lego-like composition that’s possible with dataflows. But it also has drawbacks – its simplicity omits common scenarios for using and reusing dataflows.

So, let’s look at what’s shown – and at what’s not shown – in my favorite diagram. Let’s look at some of the ways these dataflows and their entities can be used.

  1. Use the final entities as-is: This is the scenario implied by the diagram. The entities in the “Final Business View” dataflow represent a star schema, and are loaded as-is into a dataset.
  2. Use the final entities with modification: The entities in the “Final Business View” dataflow are loaded into a dataset, but with additional transformation or filtering applied in the dataset’s queries.
  3. Use the final entities with mashup: The entities in the “Final Business View” dataflow are loaded into a dataset, but with additional data from other sources added via the dataset’s queries.
  4. Use upstream entities: The entities in other dataflows are loaded into a dataset, likely with transformations and filtering applied, and with data from other sources added via the dataset’s queries.

Please understand that this list is not exhaustive. There are likely dozens of variations on these themes that I have not called out explicitly. Use this list as a starting point and see where dataflows will take you. I’ll keep the diagram simple, but you can build solutions as complex as you need them to be.


[1] This is my diagram. There are many like it, but this one is mine.