BI is dead. Long live BI!

As I was riding the bus home from jury duty the other day,[1] I saw this tweet come in from Eric Vogelpohl.

 

There’s a lot to unpack here, and I don’t expect to do it all justice in this post, but Eric’s thought-provoking tweet made me want to reply, and I knew my reply wouldn’t fit into 280 characters… I can, however, tackle some of the more important and interesting elements.

First and foremost, Eric tags me before he tags Marco, Chris, or Curbal. I am officially number one, and I will never let Marco or Chris forget it[2].

With that massive ego boost out of the way, let’s get to the BI, which is definitely dead. And also definitely not dead.

Eric’s post starts off with a bold and simple assertion: If you have the reactive/historical insights you need today, you have enough business intelligence and should focus on other things instead. I’m paraphrasing, but I believe this effectively captures the essence of his claim. Let me pick apart some of the assumptions I believe underlie this assertion.

First, this claim seems to assume that all organizations are “good w/ BI.” Although this may be true of an increasing number of mature companies, in my experience it is definitely not something that can be taken for granted. The alignment of business and technology, and the cultural changes required to initiate and maintain this alignment, are not yet ubiquitous.

Should they be? Should we be able to take for granted that in 2019 companies have all the BI they need?[3]

The second major assumption behind Eric’s first point seems to be that “good w/ BI” today translates to “good w/ BI” tomorrow… as if BI capabilities are a blanket solution rather than something scoped and constrained to a specific set of business and data domains. In reality[4], BI capabilities are developed and deployed incrementally based on priorities and constraints, and are then maintained and extended as the priorities and constraints evolve over time.

My job gives me the opportunity to work with large enterprise companies to help them succeed in their efforts related to data, business intelligence, and analytics. Many of these companies have built successful BI architectures and are reaping the benefits of their work. These companies may well be characterized as being “good w/ BI” but none of them are resting on their laurels – they are instead looking for ways to extend the scope of their BI investments, and to optimize what they have.

I don’t believe BI is going anywhere in the near future. Not only are most companies not “good w/ BI” today, the concept of being “good w/ BI” simply doesn’t make sense in the context in which BI exists. So long as business requirements and environments change over time, and so long as businesses need to understand and react, there will be a continuing need for BI. Being “good w/ BI” isn’t a meaningful concept beyond a specific point in time… and time never slows down.

If your refrigerator is stocked with what your family likes to eat, are you “good w/ food”? This may be the case today, but what about when your children become teenagers and eat more? What about when someone in the family develops food allergies? What about when one of your children goes vegan? What about when the kids go off to college? Although this analogy won’t hold up to close inspection,[5] it hopefully shows how difficult it is to be “good” over the long term, even for a well-understood problem domain, when faced with easily foreseeable changes over time.

Does any of this mean that BI represents the full set of capabilities that successful organizations need? Definitely not. More and more, BI is becoming “table stakes” for businesses. Without BI it’s becoming more difficult for companies to simply survive, and BI is no longer a true differentiator that assures a competitive advantage. For that advantage, companies need to look at other ways to get value from their data, including predictive and prescriptive analytics, and the development of a data culture that empowers and encourages more people to do more things with more data in the execution of their duties.

And of course, this may well have been Eric’s point from the beginning…

 


[1] I’ve been serving on the jury for a moderately complex civil trial for most of August, and because the trial is in downtown Seattle during business hours I have been working early mornings and evenings in the office, and taking the bus to the courthouse to avoid the traffic and parking woes that plague Seattle. I am very, very tired.

[2] Please remind me to add “thought leader” to my LinkedIn profile. Also maybe something about blockchain.

[3] I’ll leave this as an exercise for the reader.

[4] At least in my reality. Your mileage may vary.

[5] Did this analogy hold up to even distant observation?

Are you building a BI house of cards?

Every few weeks I see someone asking about using Analysis Services as a data source for Power BI dataflows. Every time this comes up, I cringe, and then include advice like this[1] in my response.

Using Analysis Services as a data source is an anti-pattern – a worst practice. It is not recommended, and any solution built using this pattern is likely to produce dissatisfied customers. Please strongly consider using other data sources, likely the data sources on which the AS model is built.

 

There are multiple reasons for this advice.

 

Some reasons are technical. Extraction of large volumes of data is not what an Analysis Services model is designed for. Performance for the ETL process is likely to be poor, and you’re likely to end up with memory/caching issues on the Analysis Services server. Beyond this, AS models typically don’t include the IDs/surrogate keys that you need for data warehousing, so joining the AS data to other data sources will be problematic.[2]
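To make the anti-pattern concrete, here’s a rough Power Query M sketch of what a dataflow entity sourced from an Analysis Services model tends to look like. The server, model, and DAX query are illustrative placeholders, not taken from any real solution; the point is that every refresh forces the AS engine to materialize and hand over a large result set.

```
// Anti-pattern sketch: a dataflow entity that extracts data from an AS model.
// Server, database, and measure names are placeholders.
let
    Source = AnalysisServices.Database(
        "asazure://westus.asazure.windows.net/contoso",  // AS server (placeholder)
        "SalesModel",                                    // model database (placeholder)
        [Query = "EVALUATE SUMMARIZECOLUMNS ( 'Date'[Year], 'Product'[Category], ""Total Sales"", [Total Sales] )"]
    )
in
    Source
```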

 

For some specific examples and technical deep dives into how and why this is a bad idea, check out this excellent blog post from Shabnam Watson. The focus of the post is on SSAS memory settings, but it’s very applicable to the current discussion.

 

Some reasons for this advice are less technical, but no less important. Using an analytics model as a data source for ETL processing is a strong code smell[3] (“any characteristic in the source code of a program that possibly indicates a deeper problem”) for business intelligence solutions.

 

Let’s look at a simple and familiar diagram:

 

[Diagram 1: good]

 

There’s a reason this left-to-right flow is the standard representation of BI applications: it’s what works. Each component has specific roles and responsibilities that complement the others and that are aligned with the technology used to implement it. This diagram includes a set of logical “tiers” or “layers” that are common in analytics systems, and which mutually support each other to achieve the systems’ goals.
Although there are many successful variations on this theme, they all tend to have this general flow and these general layers. Consider this one, for example:

 

[Diagram 2: ok]

This example has more complexity, but also has the same end-to-end flow as the simple one. This is pretty typical for scenarios where a single data warehouse and analytics model won’t fulfill all requirements, so the individual data warehouses, data marts, and analytics models each contain a portion – often an overlapping portion – of the analytics data.

Let’s look at one more:

[Diagram 3: trending badly]

This design is starting to smell. The increased complexity and blurring of responsibilities will produce difficulties in data freshness and maintenance. The additional dependencies, and their redundant and overlapping nature, mean that any future changes will require additional investigation and care to ensure that there are no unintended side effects on the existing functionality.

As an aside, my decades of working in data and analytics suggest that this care will rarely actually be taken. Instead, this architecture will be fragile and prone to problems, and the teams that built it will not be the teams who solve those problems.

And then we have this one[4]:

[Diagram 4: hard no]

This is what you get when you use Analysis Services as the data source for ETL processing, whether that ETL and downstream storage is implemented in Power BI dataflows or in different technologies. And this is probably the best case you can hope for when you go down this path. Even with just two data warehouses and two analytics models in the diagram, the complex and unnatural dependencies are obvious, and are painful to consider.

What would be better here?[5] As mentioned at the top of the post, the logical alternative is to avoid using the analytics model and to instead use the same sources that the analytics model already uses. This may require some refactoring to ensure that the duplication of logic is minimized. It may require some political or cross-team effort to get buy-in from the owners of the upstream systems. It may not be simple, or easy. But it is almost always the right thing to do.
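For what it’s worth, here’s a minimal Power Query M sketch of that alternative: the dataflow entity goes straight to the relational source the AS model is built on, and keeps the surrogate keys the model would normally hide. The server, database, table, and column names are assumptions for illustration only.

```
// Recommended-pattern sketch: source the entity from the underlying warehouse.
// Server, database, table, and column names are placeholders.
let
    Source    = Sql.Database("dw.contoso.com", "EnterpriseDW"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Keep the surrogate keys so this data can still be joined
    // to other warehouse tables downstream.
    Result    = Table.SelectColumns(
                    FactSales,
                    {"DateKey", "CustomerKey", "ProductKey", "SalesAmount"})
in
    Result
```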

Don’t take shortcuts to save days or weeks today that will cause you or your successors months or years to undo and repair. Don’t build a house of cards, because with each new card you add, the house is more and more likely to fall.

Update: The post above focuses mainly on the technical aspects of this anti-pattern, and suggests recommended patterns to follow instead. It does not focus on the reasons why so many projects are pushed into the anti-pattern in the first place. Those reasons are almost always based on human – not technical – factors.

You should read this post next: http://workingwithdevs.com/its-always-a-people-problem/. It presents a delightful and succinct approach to dealing with the root causes, and will put the post you just read in a different context.


[1] Something a lot like this. I copied this from a response I sent a few days ago.

[2] Many thanks to Chris Webb for some of the information I’ve paraphrased here. If you want to hear more from Chris on this subject, check out this session recording from PASS Summit 2017. The whole session is excellent; the information most relevant to this subject begins around the 26 minute mark in the recording. Chris also gets credit for pointing me to Shabnam Watson’s blog.

[3] I learned about code smells last year when I attended a session by Felienne Hermans at Craft Conference in Budapest. You can watch the session here. And you really should, because it’s really good.

[4] My eyes are itching just looking at it. It took an effort of will to create this diagram, much less share it.

[5] Yes, just about anything would be better.

Quick Tip: Creating “data workspaces” for dataflows and shared datasets

Power BI is constantly evolving – there’s a new version of Power BI Desktop every month, and the Power BI service is updated every week. Many of the new capabilities in Power BI represent gradual refinements, but some are significant enough to make you rethink how you and your organization use Power BI.

Power BI dataflows and the new shared and certified datasets[1] fall into the latter category. Both of these capabilities enable sharing data across workspace boundaries. When building a data model in Power BI Desktop you can connect to entities from dataflows in multiple workspaces, and publish the dataset you create into a different workspace altogether. With shared datasets you can create reports and dashboards in one workspace using a dataset in another[2].
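If you’re curious what that looks like under the hood, here’s roughly the Power Query M that Power BI Desktop generates when a dataset in one workspace consumes a dataflow entity from another. The workspace and dataflow IDs and the entity name below are placeholders, and the navigation steps follow the generated pattern rather than anything specific to this post.

```
// Sketch of a dataset query that reads a dataflow entity from a different
// ("data") workspace. IDs and names are placeholders.
let
    Source    = PowerBI.Dataflows(null),
    Workspace = Source{[workspaceId = "<data-workspace-id>"]}[Data],
    Dataflow  = Workspace{[dataflowId = "<dataflow-id>"]}[Data],
    Customer  = Dataflow{[entity = "Customer"]}[Data]
in
    Customer
```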

The ability to have a single data resource – dataflow or dataset – shared across workspaces is a significant change in how the Power BI service has traditionally worked. Before these new capabilities, each workspace was largely self-contained. Dashboards could only get data from a dataset in the same workspace, and each table in the dataset contained the queries that extracted, transformed, and loaded its data. This workspace-centric design encouraged[3] approaches where assets were grouped into workspaces because of platform constraints, and not because that grouping was the best way to meet the business requirements.

Now that we’re no longer bound by these constraints, it’s time to start thinking about having workspaces in Power BI whose function is to contain data artifacts (dataflows and/or datasets) that are used by visualization artifacts (dashboards and reports) in other workspaces. It’s time to start thinking about approaches that may look something like this:

[Diagram: data-centric workspaces]

Please keep in mind these two things when looking at the diagram:

  1. This is an arbitrary collection of boxes and arrows that illustrate a concept, and not a reference architecture.
  2. I do not have any formal art training.

Partitioning workspaces in this way encourages reuse and can reduce redundancy. It can also help enable greater separation of duties during development and maintenance of Power BI solutions. If you have one team that is responsible for making data available, and another team that is responsible for visualizing and presenting that data to solve business problems[4], this approach can give each team a natural space for its work. Work space. Workspace. Yeah.

Many of the large enterprise customers I work with are already evaluating or adopting this approach. Like any big change, it’s safer to approach this effort incrementally. The customers I’ve spoken to are planning to apply this pattern to new solutions before they think about retrofitting any existing solutions.

Once you’ve had a chance to see how these new capabilities can change how your teams work with Power BI, I’d love to hear what you think.

Edit 2019-06-26: Adam Saxton from Guy In A Cube has published a video on Shared and Certified datasets. If you want another perspective on how this works, you should watch it.


[1] Currently in preview: blog | docs.

[2] If you’re wondering how these capabilities for data reuse relate to each other, you may want to check out this older post, as the post you’re currently reading won’t go into this topic: Lego Bricks and the Spectrum of Data Enrichment and Reuse.

[3] And in some cases, required.

[4] If you don’t, you probably want to think about it. This isn’t the only pattern for successful adoption of Power BI at scale, but it is a very common and tested pattern.