Viral adoption: Self-service BI and COVID-19

I live 2.6 miles (4.2 km) from the epicenter of the coronavirus outbreak in Washington state. You know, the nursing home that’s been in the news, where over 10 people have died, and dozens more are infected.[1]

As you can imagine, this has started me thinking about self-service BI.

2020-03-10-17-44-57-439--msedge
Where can I find information I can trust?[2]
When the news started to come out covering the US outbreak, there was something I immediately noticed: authoritative information was very difficult to find. Here’s a quote from that last link.

This escalation “raises our level of concern about the immediate threat of COVID-19 for certain communities,” Dr. Nancy Messonnier, director of the CDC’s National Center for Immunization and Respiratory Diseases, said in the briefing. Still, the risk to the general public not in these areas is considered to be low, she said.

That’s great, but what about the general public in these areas?

What about me and my family?

When most of what I saw on Twitter was people making jokes about Jira tickets[3], I was trying to figure out what was going on, and what I needed to do. What actions should I take to stay safe? What actions were unnecessary or unhelpful?

Before I could answer these questions, I needed to find sources of information. This was surprisingly difficult.

Specifically, I needed to find sources of information that I could trust. There was already a surge in misinformation, some of it presumably well-intentioned, and some from deliberately malicious actors. I needed to explore, validate, confirm, cross-check, act, and repeat. And I was doing this while everyone around me seemed to be treating the emerging pandemic as a joke or a curiosity.

I did this work and made my decisions because I was a highly-motivated stakeholder, while others in otherwise similar positions were farther away from the problem, and were naturally less motivated at the time.[4]

And this is what got me thinking about self-service BI.

In many organizations, self-service BI tools like Power BI will spread virally. A highly-motivated business user will find a tool, find some data, explore, iterate, refine, and repeat. They will work with untrusted – and sometimes untrustworthy – data sources to find the information they need to use, and to make the decisions they need to make. And they do it before people in similar positions are motivated enough to act.

But before long, scraping together whatever data is available isn’t enough anymore. As the number of users relying on the insights being produced increases – even if the insights are being produced by a self-service BI solution – the need for trusted data increases as well.

Where an individual might successfully use disparate unmanaged sources successfully, a population needs a trusted source of truth.

At some point a central authority needs to step up, to make available the data that can serve as that single source of truth. This is easier said than done[5], but it must be done. And this isn’t even the hard part.

The hard part is getting everyone to stop using the unofficial and untrusted sources that they’ve been using to make decisions, and to use the trusted source instead. This is difficult because these users are invested in their current sources, and believe that they are good enough. They may not be ideal, but they work, right? They got me this far, so why should I have to stop using them just because someone says so?

This brings me back to those malicious actors mentioned earlier. Why would someone deliberately share false information about public health issues when lies could potentially cost people their lives? They would do it when the lies would help forward an agenda they value more than they value other people’s lives.

In most business situations, lives aren’t at stake, but people still have their own agendas. I’ve often seen situations where the lack of a single source of truth allows stakeholders to present their own numbers, skewed to make their efforts look more successful than they actually are. Some people don’t want to have to rebuild their reports – but some people want to use falsified numbers so they can get a promotion, or a bonus, or a raise.

Regardless of the reason for using untrusted sources, their use is damaging and should be reduced and eliminated. This is true of business data and analytics, and it is true of the current global health crisis. In both arenas, let’s all be part of the solution, not part of the problem.

Let us be a part of the cure, never part of the plague – we’ll only be remembered for what we create.[6]


[1] Before you ask, yes, my family and I are healthy and well. I’ve been working from home for over a week now, which is a nice silver lining; I have a small but comfortable home office, and can avoid the obnoxious Seattle-area commute.

[2] This article is the best single source I know of. It’s not authoritative source for the subject, but it is aggregating and citing authoritative sources and presenting their information in a form closer to the solution domain than to the problem domain.

[3] This is why I’ve been practicing social media distancing.

[4] This is the where the “personal pandemic parable” part of the blog post ends. From here on it’s all about SSBI. If you’re actually curious, I erred on the side of caution and started working from home and avoiding crowds before it was recommended or mandated. I still don’t know if all of the actions I’ve taken were necessary, but I’m glad I took them and I hope you all stay safe as well.

[5] As anyone who has ever implemented a single source of truth for any non-trivial data domain can attest.

[6] You can enjoy the lyrics even if Kreator’s awesome music isn’t to your taste.

Building a data culture

tl;dr – to kick off 2020 we’re starting a new BI Polar video series focusing on building a data culture, and the first video introduces the series. You should watch it and share it.

Succeeding with a tool like Power BI is easy – self-service BI tools let more users do more things with data more easily, and can help reduce the reporting burden on IT teams.

Succeeding at scale with a tool like Power BI is not easy. It’s very difficult, not because of the technology, but because of the context in which the technology is used. Organizations adopt self-service BI tools because their existing approaches to working with data are no longer successful – and because the cost and pain[1] of change has become outweighed by the cost and pain of maintaining course.

Tool adoption may be top-down, encouraged or mandated by senior management as a broad organization-wide effort. Adoption may be bottom-up, growing organically and virally in the teams and departments least well served by the existing tools and processes in place.

Both of these approaches[2] can be successful, and both of these approaches can fail. The most important success factor is a data culture in which the proper use of self-service BI tools can deliver the greatest value for the organization.

The most important success factor is a data culture

books-1655783_640
There must be a data culture on the other side of this door.

Without an organizational culture that values, encourages, recognizes, and rewards users and teams for their use of data, no tool and no amount of effort and skill is enough to achieve the full potential of the tools – or of the data.

In this new video series we’ll be covering practices that will help build a data culture. More specifically, we’ll introduce common practices that are exhibited by large organizations that have mature and successful data cultures. Each culture is unique, but there are enough commonalities to identify patterns and anti-patterns.

The content in this series will be informed by my work with enterprise Power BI customers as part of my day job[3], and will complement nicely[4] the content and guidance in the Power BI Adoption Framework.

Back in November when the 100th BI Polar blog post was published, I asked what everyone wanted to read about in the next 100 posts. There were lots of different ideas and suggestions, but the most common theme was around guidance like this. Hopefully you’ll enjoy the result – and hopefully you’ll let me know either way.


[1] I strongly believe that pain is a fundamental precursor to significant change. If there is no pain, there is no motivation to change. Only when the pain of not changing exceeds the perceived pain of going through the change will most people and organizations consider giving up the status quo. There are occasional exceptions, but in my experience these are very rare.

[2] Including any number of variations – these approaches are common points on a wide spectrum, but should not be interpreted as the only ways to adopt Power BI or other self-service BI tools.

[3] By day I’m a masked crime-fighter. Or a member of the Power BI customer advisory team. Or both. It varies from day to day.

[4] Hopefully this will be true. I’m at least as interested in seeing where this ends up as you are.

Video: A most delicious analogy

Every time I cook or bake something, I think about how the tasks and patterns present in making food have strong and significant parallels with building BI[1] solutions. At some point in the future I’m likely to write a “data mis en place” blog post, but for today I decided to take a more visual approach, starting with one of my favorite holiday recipes[2].

Check it out:

(Please forgive my clickbaitey title and thumbnail image. I was struggling to think of a meaningful title and image, and decided to have a little fun with this one.)

I won’t repeat all of the information from the video here, but I will share a view of what’s involved in making this self-service BI treat.

2019-12-17-12-52-36-894--VISIO

When visualized like this, the parallels between data development and reuse are probably a bit more obvious. Please take a look at the video, and see what others jump out at you.

And please let me know what you think. Seriously.


[1] And other types of software, but mainly BI these days.

[2] I published this recipe almost exactly a year ago. The timing isn’t intentional, but it’s interesting to me to see this pattern emerging as well…

The Power BI Adoption Framework – it’s Power BI AF

You may have seen things that make you say “that’s Power BI AF” but none of them have come close to this. It’s literally the Power BI AF[1].

That’s right – this week Microsoft published the Power BI Adoption Framework on GitHub and YouTube. If you’re impatient, here’s the first video – you can jump right in. It serves as an introduction to the framework, its content, and its goals.

Without attempting to summarize the entire framework, this content provides a set of guidance, practices, and resources to help organizations build a data culture, establish a Power BI center of excellence, and manage Power BI at any scale.

Even though I blog a lot about Power BI dataflows, most of my job involves working with enterprise Power BI customers – global organizations with thousands of users across the business who are building, deploying, and consuming BI solutions built using Power BI.

Each of these large customers takes their own approach to adopting Power BI, at least when it comes to the details. But with very few exceptions[2], each successful customer will align with the patterns and practices presented in the Power BI Adoption Framework – and when I work with a customer that is struggling with their global Power BI rollout, their challenges are often rooted in a failure to adopt these practices.

There’s no single “right way” to be successful with Power BI, so don’t expect a silver bullet. Instead, the Power BI Adoption Framework presents a set of roles, responsibilities, and behaviors that have been developed after working with customers in real-world Power BI deployments.

If you look on GitHub today, you’ll find a set of PowerPoint decks broken down into five topics, plus a few templates.

2019-12-12-12-09-19-166--msedge

These slide decks are still a little rough. They were originally built for use by partners who could customize and deliver them as training content for their customers[3], rather than for direct use by the general public, and as of today they’re still a work in progress. But if you can get past the rough edges, there’s definitely gold to be found. This is the same content I used when I put together my “Is self-service business intelligence a two-edged sword?” presentation earlier this year, and for the most part I just tweaked the slide template and added a bunch of sword pictures.

And if the slides aren’t quite ready for you today, you can head over to the official Power BI YouTube channel where this growing playlist contains bite-size training content to supplement the slides. As of today there are two videos published – expect much more to come in the days and weeks ahead.

The real heroes of this story[4] are Manu Kanwarpal and Paul Henwood.  They’re both cloud solution architects working for Microsoft in the UK. They’ve put the Power BI AF together, delivered its content to partners around the world, and are now working to make it available to everyone.

What do you think?

To me, this is one of the biggest announcements of the year, but I really want to hear from you after you’ve checked out the Power BI AF. What questions are still unanswered? What does the AF not do today that you want or need it to do tomorrow?

Please let me know in the comments below – this is just a starting point, and there’s a lot that we can do with it from here…


[1] If you had any idea how long I’ve been waiting to make this joke…

[2] I can’t think of a single exception at the moment, but I’m sure there must be one or two. Maybe.

[3] Partners can still do this, of course.

[4] Other than you, of course. You’re always a hero too – never stop doing what you do.

Using and reusing Power BI dataflows

I use this diagram a lot[1]:

excel white

This diagram neatly summarizes a canonical use case for Power BI dataflows, with source data being ingested and processed as part of an end-to-end BI application. It showcases the Lego-like composition that’s possible with dataflows. But it also has drawbacks – its simplicity omits common scenarios for using and reusing dataflows.

So, let’s look at what’s shown – and at what’s not shown – in my favorite diagram. Let’s look at some of the ways these dataflows and their entities can be used.

  1. Use the final entities as-is: This is the scenario implied by the diagram. The entities in the “Final Business View” dataflow represent a star schema, and are loaded as-is into a dataset.
  2. Use the final entities with modification: The entities in the “Final Business View” dataflow are loaded into a dataset, but with additional transformation or filtering applied in the dataset’s queries.
  3. Use the final entities with mashup: The entities in the “Final Business View” dataflow are loaded into a dataset, but with additional data from other sources added via the dataset’s queries.
  4. Use upstream entities: The entities in other dataflows are loaded into a dataset, likely with transformations and filtering applied, and with data from other sources added via the dataset’s queries.

Please understand that this list is not exhaustive. There are likely dozens of variations on these themes that I have not called out explicitly. Use this list as a starting point and see where dataflows will take you. I’ll keep the diagram simple, but you can build solutions as complex as you need them to be.


[1] This is my diagram. There are many like it, but this one is mine.

 

Quick Tip: Factoring your dataflow entities

This post started as a response to this question from Mark, who was commenting on last week’s data lineage post:

How would you decide how big or how small to make each artifact in the lineage, in terms of the amount of transformations taking place inside the artifact? In my case they would only be shared with 2-3 other users.

For instance I could go all out and have every step that would previously take place in a query editor result in a new link in the data lineage chain, but that would probably be overkill.

I agree that “one step per dataflow” would be overkill, but beyond that the answer is largely “it depends.”

Image by Esi Grünhagen from Pixabay

The approach I generally take is to break the end to end data preparation down into blocks that look like this:

  1. Staging – getting the source data into the system (in this case dataflow, but could be data mart, data warehouse, data lake, etc.) with zero or minimal transformations
  2. Cleansing – correcting known data quality and format problems from the staged data
  3. Transformation 1 – getting the cleansed data into the shape required for intended downstream purposes
  4. Enrichment – adding data from other sources, which have ideally already gone through steps 1 through 3
  5. Transformation 2 – getting the cleansed and enriched data into the shape required for analysis, typically as dimensions and facts

The final step may also be performed in the queries that are used to create the final tabular model when creating a dataset in Power BI Desktop. If a given dimension is likely to be used in multiple datasets, implement it as a dataflow entity. If it isn’t, implement it as a table in your dataset.

These guidelines tend to create a moderate number of easily maintainable entities, but they’re obviously the bare minimum – take what works for you, and discard the rest.

I feel like I’m dating myself with this link[1], but I definitely recommend looking at the Kimball Group’s techniques for data warehousing and BI: resources link. Ralph Kimball and his amazing team know more about this stuff than I will ever forget (or something like that) and there’s a huge volume of guidance available. Do yourself a favor and check it out.


[1] I assume there are newer resources out there, but when I was your age it was the Kimball Method or the… synonym for highway that rhymes with method.

Are you building a BI house of cards?

Every few weeks I see someone asking about using Analysis Services as a data source for Power BI dataflows. Every time I hear this, I cringe, and then include advice something like this[1] in my response.

Using Analysis Services as a data source is an anti-pattern – a worst practice. It is not recommended, and any solution built using this pattern is likely to produce dissatisfied customers. Please strongly consider using other data sources, likely the data sources on which the AS model is built.

 

There are multiple reasons for this advice.

 

Some reasons are technical. Extraction of large volumes of data is not what an Analysis Services model is designed for. Performance for the ETL process is likely to be poor, and you’re likely end up with memory/caching issues on the Analysis Services server. Beyond this, AS models typically don’t include the IDs/surrogate keys that you need for data warehousing, so joining the AS data to other data sources will be problematic.[2]

 

For some specific examples and technical deep dives into how and why this is a bad idea, check out this excellent blog post from Shabnam Watson. The focus of the post is on SSAS memory settings, but it’s very applicable to the current discussion.

 

Some reasons for this advice are less technical, but no less important. Using analytics models as data sources for ETL processing are a strong code smell[3] (“any characteristic in the source code of a program that possibly indicates a deeper problem”) for business intelligence solutions.

 

Let’s look at a simple and familiar diagram:

 

01 good

 

There’s a reason this left-to-right flow is the standard representation of BI applications: it’s what works. Each component has specific roles and responsibilities that complement each other, and which are aligned with the technology used to implement the component. This diagram includes a set of logical “tiers” or “layers” that are common in analytics systems, and which mutually support each other to achieve the systems’ goals.
Although there are many successful variations on this theme, they all tend to have this general flow and these general layers. Consider this one, for example:

 

02 ok

This example has more complexity, but also has the same end-to-end flow as the simple one. This is pretty typical for  scenarios where a single data warehouse and analytics model won’t fulfill all requirements, so the individual data warehouses, data marts, and analytics models each contain a portion – often an overlapping portion – of the analytics data.

Let’s look at one more:

03 - trending badly

This design is starting to smell. The increased complexity and blurring of responsibilities will produce difficulties in data freshness and maintenance. The additional dependencies, and the redundant and overlapping nature of the dependencies means that any future changes will require additional investigation and care to ensure that there are no unintended side effects to the existing functionality.

As an aside, my decades of working in data and analytics suggest that this care will rarely actually be taken. Instead, this architecture will be fragile and prone to problems, and the teams that built it will not be the teams who solve those problems.

And then we have this one[4]:

04 - hard no

This is what you get when you use Analysis Services as the data source for ETL processing, whether that ETL and downstream storage is implemented in Power BI dataflows or different technologies. And this is probably the best case you’re likely to get when you go down this path. Even with just two data warehouses and two analytics models in the diagram, the complex and unnatural dependencies are obvious, and are painful to consider.

What would be better here?[5] As mentioned at the top of the post, the logical alternative is to avoid using the analytics model and to instead use the same sources that the analytics model already uses. This may require some refactoring to ensure that the duplication of logic is minimized. It may require some political or cross-team effort to get buy-in from the owners of the upstream systems. It may not be simple, or easy. But it is almost always the right thing to do.

Don’t take shortcuts to save days or weeks today that will cause you or your successors months or years to undo and repair. Don’t build a house of cards, because with each new card you add, the house is more and more likely to fall.

Update: The post above focused mainly on technical aspects of the anti-pattern, and suggests alternative recommended patterns to follow instead. It does not focus on the reasons why so many projects are pushed into the anti-pattern in the first place. Those reasons are almost always based on human – not technical – factors.

You should read this post next: http://workingwithdevs.com/its-always-a-people-problem/. It presents a delightful and succinct approach to deal with the root causes, and will put the post you just read in a different context.


[1] Something a lot like this. I copied this from a response I sent a few days ago.

[2] Many thanks to Chris Webb for some of the information I’ve paraphrased here. If you want to hear more from Chris on this subject, check out this session recording from PASS Summit 2017. The whole session is excellent; the information most relevant to this subject begins around the 26 minute mark in the recording. Chris also gets credit for pointing me to Shabnam Watson’s blog.

[3] I learned about code smells last year when I attended a session by Felienne Hermans at Craft Conference in Budapest. You can watch the session here. And you really should, because it’s really good.

[4] My eyes are itching just looking at it. It took an effort of will to create this diagram, much less share it.

[5] Yes, just about anything would be better.