It all comes down to culture

I talk about data culture a lot, and in my presentations I often emphasize how the most important success factor when adopting a tool like Power BI[1] is the culture of the organization, not the tool itself.

I talk about this a lot, but I think Caitie McCaffrey may have just had the final word.[2]

[Screenshot of Caitie McCaffrey’s tweet]

I don’t think that Caitie was talking about the enterprise adoption of self-service business intelligence, but she could have been.

In my day job I get to talk to leaders from large companies around the world, and to see how they’re adopting and using Power BI and Azure. Before today, these conversations didn’t make me think of Moby Dick – they made me think of Leo Tolstoy’s Anna Karenina, which opens with this famous line:

All happy families are alike; each unhappy family is unhappy in its own way.

Although the details vary, large companies that have successfully adopted managed self-service BI at scale have cultures with important aspects in common:

  • Leaders empower business users to work with data
  • Leaders trust business users to use data to make better decisions
  • IT supports business users with platforms and tools and with curated data sources
  • Business users work with the tools provided by IT and the guidance provided by leaders, and stay within the guardrails and guidelines they’ve been given
  • Business and IT collaborate to deliver responsive solutions and mature/stable solutions, with clearly defined responsibilities between them

Companies that are successful with managed self-service BI do these things. Companies that are not successful do not. The details vary, but the pattern holds up again and again.

How do these roles and responsibilities relate to culture?

In many ways a culture is defined by the behaviors it rewards, the behaviors it allows, and the behaviors it punishes. A culture isn’t what you say – it’s what you do.

In the context of BI, the key is having a culture with shared goals that enables business and IT to work together with the support of company leaders. If you have this culture, you can be successful with any tool. Some tools may be more helpful than others, and the culture will enable the selection of better tools over time, but the tool is not the most important factor. The culture – not the tool – inevitably determines success.

This is not to say that BI tools should not improve to be a bigger part of the solution. But to paraphrase Caitie… maybe you should let that white whale swim past.

 


[1] But definitely not only Power BI.

[2] He says unironically, before writing many more words.

Fiore’s Virtues of Business Intelligence

In the late 1300s and early 1400s, Fiore de’i Liberi was a knight, a diplomat, and a fencing master. He also wrote one of the most comprehensive treatises on medieval combat, his Flower of Battle, of which four copies survive in museums and private collections today. Fiore started – or was a significant evolutionary step in – one of the most important and long-lasting traditions in armed and unarmed combat.

In addition to detailed instruction on fighting with dagger, longsword, spear, and other weapons, Fiore’s manuscript included a preface with information about the virtues that any fencer[1] would need to be successful in combat.

[Image: Fiore’s seven blows of the sword and four virtues, MS Ludwig XV 13, folio 32r]

In the image above, Fiore pictures the seven blows of the sword, and his four virtues, each represented by a different animal[2][3]:

This Master with these swords signifies the seven blows of the sword. And the four animals signify four virtues, that is prudence, celerity, fortitude, and audacity. And whoever wants to be good in this art should have part in these virtues.

Fiore then goes on to describe each virtue in turn:

Prudence
No creature sees better than me, the Lynx.
And I always set things in order with compass and measure.

Celerity
I, the tiger, am so swift to run and to wheel
That even the bolt from the sky cannot overtake me.

Audacity
None carries a more ardent heart than me, the lion,
But to everyone I make an invitation to battle.

Fortitude
I am the elephant and I carry a castle as cargo,
And I do not kneel nor lose my footing.[4]

Step back and read this again: “And whoever wants to be good in this art should have part in these virtues.”

That’s right – Fiore was documenting best practices, 600+ years ago. And although I suspect that Fiore wasn’t thinking about business intelligence projects at the time, I do believe that these virtues are just as relevant to the slicing and dicing[5] we’re still doing today. Let me explain.

Prudence – “…I always set things in order with compass and measure”: A successful BI practitioner knows what needs to be done before a project can begin, and when additional work is required before they can get started. Initiating a project requires careful setup and planning, and moving before the prerequisites for success are in place can be disastrous.[6]

Celerity – “I… am so swift to run and to wheel that even the bolt from the sky cannot overtake me”: Business requirements change day to day and hour to hour. To succeed, a BI practitioner must be prepared to move quickly and decisively, engaging without delay when an opportunity presents itself – and also be prepared to change direction as the needs of the project change.

Audacity – “…to everyone I make an invitation to battle”: Any project declined presents an opening for another practitioner, another team, another tool, and this is likely to reduce opportunities over time. Saying yes to difficult projects – and succeeding in their execution – is necessary to ensure that future projects don’t pass you by.

Fortitude – “And I do not kneel nor lose my footing”: When Fiore speaks of fortitude, he does not speak of the strength that comes from big muscles. He speaks of the strength that comes from structure and balance. His “elephant with a castle on its back” is a perfect metaphor for a BI solution delivered quickly and confidently because of the solid and stable platform on which it is built. Success doesn’t come from the extra effort put in when delivering a solution – it comes from the care and planning that went into the overall data estate.

You may look at these virtues and see contradiction – how can you have prudence and audacity and celerity? The answer for BI is the same answer that it is for the sword: practice, training, and preparation. In both situations, whether you’re battling with an armed foe or battling with a difficult client, you need to apply the right virtues at the right times, and to understand both the big picture and the day to day steps that produce larger successes. In both situations you’re also facing complex and dynamic challenges where you need to quickly take advantage of opportunities as they arise, and create opportunities when they don’t appear on their own[7]. Fortunately, as BI practitioners we can rely on the strengths of our teams – it’s not always a solo battle.

You may also look at these virtues and see Matthew stretching to make the most tenuous of analogies work, just because he loves swords as much as he loves BI. While this may be true, I do honestly believe that these virtues do apply here. Over the past 20-25 years I have seen many projects succeed because these virtues were embodied by the people and teams involved, and I’ve seen many projects fail where these virtues were absent. This isn’t the only way to look at success factors… but at the moment it’s my favorite.

In closing, I’d like to mention that this post marks one year since I started this blog. In the past year I’ve published almost 90 posts, and have had roughly 50,000 visitors and 100,000 page views. Here’s hoping that by applying Fiore’s virtues I’ll be able to make the next year even more productive and more successful than the year that has passed.

Thanks to all of you who read what I write, and who provide feedback here and on Twitter – I couldn’t do it without you.


[1] Fencer in this context meaning someone who fights with swords or other edged weapons, not the Olympic-style sport of fencing that a modern reader might picture when reading the word.

[2] As translated by Michael Chidester and Colin Hatcher.

[3] Although it may not be obvious to the modern reader, the animal at the bottom is an elephant with a tower or castle on its back. I suspect that Fiore never actually saw an elephant.

[4] In case these terms don’t immediately have meaning, prudence == wisdom, celerity == speed, audacity == daring, and fortitude == strength.

[5] See what I did there?

[6] I assume that Fiore’s use of the term “measure” here is pure coincidence.

[7] If you’ve worked on a high-stakes, high-visibility BI project where requirements changed during implementation, or where not all stakeholders were fully committed to the project goals, this will probably feel very familiar.

Self-Service BI: Asleep at the wheel?

I’ve long been a fan of the tech news site Ars Technica. They have consistently good writing, and they cover interesting topics that sit at the intersection of technology and life, including art, politics[1], and more.

When Ars published this article earlier this week, it caught my eye – but not necessarily for the reason you might think.

[Screenshot: Ars Technica article about a driver asleep at the wheel of a Tesla]

This story immediately got me thinking about how falling asleep at the wheel is a surprisingly good analogy[2] for self-service BI, and for shadow data in general. The parallels are highlighted in the screenshot above.

  1. Initial reaction: People are using a specific tool in a way we do not want them to use it, and this is definitely not ideal.
  2. Upon deeper inspection: People are already using many tools in this bad way, and were it not for the capabilities of this particular tool the consequences would likely be much worse.

If you’re falling asleep at the wheel, it’s good to have a car that will prevent you from injuring or killing yourself or others. It’s best to simply not fall asleep at the wheel at all, but that has been sadly shown to be an unattainable goal.

If you’re building a business intelligence solution without involvement from your central analytics or data team, it’s good to have a tool[3] that will help prevent you from misusing organizational data assets and harming your business. It’s best to simply not “go rogue” and build solutions without the awareness of your central team at all, but that has sadly been shown to be an unattainable goal.

Although this analogy probably doesn’t hold up to close inspection as well as the two-edged sword analogy, it’s still worth emphasizing. I talk with a lot of enterprise Power BI customers, and I’ve had many conversations where someone from IT talks about their desire to “lock down” some key self-service feature or set of features, not fully realizing the unintended consequences that this approach might have.

I don’t want to suggest that this is inherently bad – administrative controls are necessary, and each organization needs to choose the balance that works best for their goals, priorities, and resources. But turning off self-service features can be like turning off Autopilot in a Tesla. Keeping users from using a feature is not going to prevent them from achieving the goal that the feature enables. Instead, it will drive[4] users into using other features and other tools, often with even more damaging consequences.

Here’s a key quote from that Ars Technica article:

We should be crystal clear about one point here: the problem of drivers falling asleep isn’t limited to Tesla vehicles. To the contrary, government statistics show that drowsy driving leads to hundreds—perhaps even thousands—of deaths every year. Indeed, this kind of thing is so common that it isn’t considered national news—which is why most of us seldom hear about these incidents.

In an ideal world, everyone will always be awake and alert when driving, but that isn’t the world we live in. In an ideal world, every organization will have all of the data professionals necessary to engage with every business user in need. We don’t live in that world either.

There’s always room for improvement. Tools like Power BI[5] are getting better with each release. Organizations keep maturing and building more successful data cultures to use those tools. But until we live in an ideal world, we each need to understand the direct and indirect consequences of our choices…


[1] For example, any time I see stories in the non-technical press related to hacking or electronic voting, I visit Ars Technica for a deeper and more informed perspective. Like this one.

[2] Please let me explicitly state that I am in no way minimizing or downplaying the risks of distracted, intoxicated, or impaired driving. I have zero tolerance for these behaviors, and recognize the very real dangers they present. But I also couldn’t let this keep me from sharing the analogy…

[3] As well as the processes and culture that enable the tool to be used to greatest effect, as covered in a recent post: Is self-service business intelligence a two-edged sword?

[4] Pun not intended, believe it or not.

[5] As a member of the Power BI CAT team I would obviously be delighted if everyone used Power BI, but we also don’t live in that world. No matter what self-service BI tool you’ve chosen, these lessons will still apply – only the details will differ.

Is self-service business intelligence a two-edged sword?

I post about Power BI dataflows a lot, but that’s mainly because I love them. My background in data preparation and ETL, combined with dataflows’ general awesomeness, makes them a natural fit for my blog. This means that people often think of me as “the dataflows guy” even though dataflows are actually a small part of my role on the Power BI CAT team. Most of what I do at work is to help large enterprise customers successfully adopt Power BI, and to help make Power BI a better tool for their scenarios[1].

As part of my ongoing conversations with senior stakeholders from these large global companies, I’ve noticed an interesting trend emerging: customers describing self-service BI as a two-edged sword. This trend is interesting for two main reasons:

  1. It’s a work conversation involving swords
  2. Someone other than me is bringing swords into the work conversation[2]

As someone who has extensive experience with both self-service BI and with two-edged swords, I found myself thinking about these comments more and more – and the more I reflected, the more I believed this simile holds up, but not necessarily in the way you might suspect.

This week in London I delivered a new presentation for the London Power BI User Group – Lessons from the Enterprise: Managed Self-Service BI at Global Scale. In this hour-long presentation I explored the relationship between self-service BI and two-edged swords, and encouraged my audience to consider the following points[4]:

  • The two sharp edges of a sword each serve distinct and complementary purposes.
  • A competent swordsperson knows how and when to use each, and how to use them effectively in combination.
  • Having two sharp edges is only dangerous to the wielder if they are ignorant of their tool.
  • A BI tool like Power BI, which can be used for both “pro” IT-driven BI and self-service business-driven BI has the same characteristics, and to use it successfully at scale an organization needs to understand its capabilities and know how to use both “edges” effectively in combination.

As you can imagine, there’s more to it than this, so you should probably watch the session recording.

[Presentation slide: self-service BI and swords]

If you’re interested in the slides, please download them here: London PUG – 2019-06-03 – Lessons from the Enterprise.

If you’re interested in the videos shown during the presentation, they’re included in the PowerPoint slides, and you can also view them on YouTube here.

For those who are coming to the Microsoft Business Applications Summit next week, please consider joining the CAT team’s “Enterprise business intelligence with Power BI” full-day pre-conference session on Sunday. Much of the day will be deep technical content, but we’ll be wrapping up with a revised and refined version of this content, with a focus on building a center of excellence and a culture of data in your organization.

Update 2019-06-10: The slides from the MBAS pre-conference session can be found here: PRE08 – Enterprise business intelligence with Power BI – Building a CoE.

There is also a video of the final demo where Adam Saxton joined me to illustrate how business and IT can work together to effectively respond to unexpected challenges. If you ever wondered what trust looks like in a professional[5] environment, you definitely want to watch this video.

 


[1] This may be even more exciting for me than Power BI dataflows are, but it’s not as obvious how to share this in blog-sized pieces.

[2] Without this second point, it probably wouldn’t be noteworthy. I have a tendency to bring up swords more often in work conversations than you might expect[3].

[3] And if you’ve been paying attention for very long, you’ll probably expect this to come up pretty often.

[4] Pun intended. Obviously.

[5] For a given value of “professional.”

Dataflows, Datasets, and Models – Oh My!

How do Power BI datasets and dataflows relate to each other? Do you need one if you have the other?

Photo by Chris Liverani on Unsplash

I received this question as a comment on another post, and I think it warrants a full post as a reply:

Hi Matthew,  my organization is currently evaluating where to put BI data models for upcoming PBI projects. Central in the debates is the decision of whether to use PBI Datasets, SSAS or DataFlows. I know a lot of factors need considering. I’m interested in hearing your thoughts.

Rather than answering the question directly, I’m going to rephrase and re-frame it in a slightly different context.

I’m currently evaluating how to best chop and prepare a butternut squash. Central in the debates is the decision of whether to use a 6″ chef’s knife, a 10″ chef’s knife, or a cutting board.

(I’ll pause for a moment to let that sink in.)

It doesn’t really make sense to compare two knives and a cutting board in this way, does it? You can probably get the job done with either knife, and the cutting board will certainly make the job easier… but it’s not like you’d need to choose one of the three, right? Right?

Right!

Your choice of knife will depend on multiple factors including the size of the squash, the size of your hand, and whether or not you already have one or the other or both.

Your choice of using a cutting board will come down to your workflow and priorities. Do you already have a cutting board? Is it more important to you to have a safe place to chop the squash and not damage the edge of your knife, or is it more important to not have one more thing to clean?

Both of these are valid decisions that need to be made – but they’re not dependent on each other.

Let’s get back to the original question by setting some context for dataflows and datasets in Power BI.

[Diagram: where datasets and dataflows fit in the Power BI service]

This image is from one of the standard slides in my dataflows presentation deck, and I stole it from the dataflows team[1]. It shows where datasets and dataflows fit in Power BI from a high-level conceptual perspective.

Here’s what seems most important in the context of the original question:

  • Power BI visualizations are built using datasets as their sources
  • Power BI includes datasets, which are tabular BI models hosted in the Power BI service
  • Dataflows are a data preparation capability in Power BI for loading data into Azure Data Lake Storage gen2
  • Dataflows can be used as a data source when building datasets in Power BI (see the sketch just after this list), but cannot currently be used as a data source for models outside of Power BI, including SSAS and AAS
  • Dataflows and datasets solve different problems and serve different purposes, and cannot be directly compared to each other as the original question tries to do – that’s like comparing chef’s knives and cutting boards
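
To make the “dataflows as a data source” bullet concrete, here’s a minimal sketch of the kind of Power Query M you might see when a dataset connects to a dataflow entity. The workspace, dataflow, and entity names are hypothetical, and the exact navigation steps may differ from what the connector generates for you.

```
let
    // Connect to the dataflows the current user has access to
    Source = PowerBI.Dataflows(null),

    // Navigate to a workspace, dataflow, and entity.
    // These names are placeholders – in practice the connector
    // builds this navigation for you when you pick an entity.
    Workspace = Source{[workspaceName = "Sales Analytics"]}[Data],
    Dataflow  = Workspace{[dataflowName = "Curated Sales"]}[Data],
    Customers = Dataflow{[entity = "Customers"]}[Data]
in
    Customers
```

From there the entity behaves like any other Power Query source – you can add further transformations or load it into your dataset as-is.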

What’s not shown in this diagram is SQL Server Analysis Services (SSAS) or Azure Analysis Services (AAS) because the diagram is limited in scope to capabilities that are natively part of Power BI. SSAS and AAS are both analytics services that can host tabular BI models that are very similar to Power BI datasets, and which can be used as a data source for Power BI datasets. Each option – SSAS, AAS, or Power BI datasets – is implemented using the same underlying technology[2], but each has different characteristics that make it more or less desirable for specific scenarios.

This list isn’t exhaustive, and I make no claims to being an expert on this topic, but these are the factors that seem most significant when choosing between SSAS, AAS, or Power BI datasets as your analytics engine of choice:

  • Cost and pricing model – if you choose SSAS you’ll need to own and manage your own physical or virtual server. If you choose AAS or Power BI you’ll pay to use the managed cloud service. Dedicated Power BI Premium capacity and shared Power BI Pro capacity have different licensing models and costs to target different usage patterns.
  • Model size – you can scale SSAS to pretty much any workload if you throw big enough hardware at it[3]. AAS can scale to models that are hundreds of gigabytes in size. Power BI Premium can support PBIX files up to 10GB[4], and Power BI Pro supports PBIX files up to 1GB.
  • Deployment and control scenarios – with SSAS and AAS, you have a huge range of application lifecycle management (ALM) and deployment capabilities that are enabled by the services’ XMLA endpoint and a robust tool ecosystem. Power BI Premium will support this before too long[5] as well.

I’m sure I’m missing many things, but this is what feels most important to me. Like I said, I’m far from being an expert on this aspect of Power BI and the Microsoft BI stack.

So let’s close by circling back to the original question, and that delicious analogy. You need a knife, but the knife you choose will depend on your requirements. Having a cutting board will probably also help, but it’s not truly required.

Now I’m hungry.

 


[1] If you want to watch a conference presentation or two that includes this slide, head on over to the Dataflows in Power BI: Resources post.

[2] This feels like an oversimplification, but it’s technically correct at the level of abstraction at which I’m writing it. If anyone is interested in arguing this point, please reply with a comment that links to your article or blog post where the salient differences are listed.

[3] Remember I’m not an expert on this, so feel free to correct me by pointing me to documentation. Thanks!

[4] This is not a direct size-to-size comparison. The services measure things differently.

[5] As announced at Microsoft Ignite a few months back, no firm dates shared yet.

Lego Bricks and the Spectrum of Data Enrichment and Reuse

I love Lego bricks.

I love them because they were a favorite toy from my childhood. I love them because my children love them. I love them because they foster and encourage creativity.

I love them because they serve as an excellent metaphor for the enrichment and reuse of data in the enterprise.[1]

Consider this starting point.

[Image: a single Lego brick]

A single brick has almost unlimited potential. This brick could become the floor of a building, the chassis of a fire truck, the wing of an airplane or spaceship, or part of something that no Lego engineer had ever imagined. This potential comes with a cost – this single brick must be enriched with many other bricks before it can achieve any of these lofty goals.

Similarly, a single file, entity, or table has massive potential. The data it maintains could become part of many different applications in many different contexts. But as with the lone brick, it would need to first be enriched, to be combined with other data for that potential to be realized.

 

As each additional brick is combined with the first brick, its complexity increases, but so does its value. These more complex components are closer to the ultimate end state, so less work will be required to achieve that goal.

But at the same time, each additional brick reduces the potential for reuse. After only a few additions we’ve already ruled out creating a floor or an airplane wing. We might still create a car or truck, or some types of spaceships, but the enrichment we’ve performed is already constraining our potential.

Similarly, each time we enrich a data source to move closer to our goal, we also limit the potential scenarios for which the data can be used. If we filter the data horizontally or vertically to eliminate records or fields we don’t currently need, we are eliminating data that may be needed for some other purpose. If we merge our starter data set with additional data, we may also be adding records or fields that aren’t needed for future purposes, while increasing complexity and adversely affecting performance.
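
As a concrete sketch of what this enrichment looks like in Power Query M (all server, table, and column names here are hypothetical), notice how each step makes the result more useful for the current project and less reusable for anything else:

```
let
    // A broad, reusable starting point: the full Orders table (hypothetical source)
    Source = Sql.Database("sql.contoso.com", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],

    // Horizontal filter: keep only current-year orders,
    // discarding history that another project might need
    CurrentYear = Table.SelectRows(Orders, each Date.Year([OrderDate]) = 2019),

    // Vertical filter: keep only the columns this project needs
    Narrowed = Table.SelectColumns(
        CurrentYear, {"OrderId", "CustomerId", "OrderDate", "Amount"}),

    // Enrichment: merge in customer attributes for this project's analysis,
    // adding columns (and complexity) that other scenarios may not want
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    Merged = Table.NestedJoin(
        Narrowed, {"CustomerId"}, Customers, {"CustomerId"},
        "Customer", JoinKind.LeftOuter),
    Enriched = Table.ExpandTableColumn(Merged, "Customer", {"Segment", "Region"})
in
    Enriched
```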

 

As we continue building, we see this pattern hold. We also see it repeated at multiple levels, each contributing to the overall goal. At multiple points we combine a small number of individual bricks to build a component, and then add that component to the main model to make it more complex and more specialized. Each time – both for the component and for the whole – the process of enrichment adds value and complexity, and reduces the potential scope of reuse. When the final model is finished, we have exactly what we needed[2]. The value is very high, but the opportunities for reuse are very small.

The parallel is clear: when we finish building a BI application, the value is very high, but the potential for reuse is very low. The dashboards and reports, the data model with its measures, dimensions, and KPIs, the data warehouse and data lake, and all of the upstream logic and components that make up a BI solution need to be combined in specific ways to achieve specific goals. The application as a whole is specialized for a given purpose…

…but what about the components that make up the application? Where can we find the sweet spot, the perfect balance between achieved value and potential value?

Like these Lego components:

[Image: partially assembled Lego components]

When you’re building your solution using Power BI, this is where dataflows come in.

The answer will of course differ for each context, but when designing an application, it’s important to take larger and longer-term requirements into account.

Consider this diagram[3]:

[Diagram: dataflows as staged building blocks in the overall data processing and enrichment flow]

In this simple architecture, each dataflow (represented by the lighter squares) represents a stage in the overall data processing and enrichment flow. Each one adds value toward the application, and each serves as a building block that can be further reused both in this application and in other applications with overlapping data requirements.

The nature of Power BI dataflows lends itself well to this problem – each dataflow is a collection of reusable data entities managed by the Power BI service, easily discoverable and usable by technical and business users in BI applications. The computed entities feature in particular makes this type of scenario easy to set up and manage.
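
To sketch what this looks like in practice (the entity names and transformations below are hypothetical): within a dataflow, simply referencing another entity by name is what turns a query into a computed entity that builds on it.

```
// "Customers Cleaned" – a computed entity that builds on a "Raw Customers"
// entity defined in the same dataflow. Referencing that entity by name is
// all it takes; the service tracks the dependency between the two.
let
    Source = #"Raw Customers",
    Deduplicated = Table.Distinct(Source, {"CustomerId"}),
    Renamed = Table.RenameColumns(Deduplicated, {{"Cust_Nm", "CustomerName"}})
in
    Renamed
```

Because the entire definition is just this “M” expression, a central team can later lift it into an IT-managed dataflow with little or no rework – which is exactly the operationalization path described below.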

At the same time, the nature of Power BI dataflows introduces challenges for this Lego-style reuse. Dataflows in Power BI are optimized to enable non-IT users – business users and analysts who are typically focused on solving their immediate data challenges without relying on support from IT. These users are less likely to be focused on bigger-picture requirements like broad reuse of the entities they create.

This is where a little process and collaboration can come in, aided by the fact that dataflows are managed by the Power BI service. Power BI administrators can monitor the service to understand what dataflows are being used most frequently and most widely, and in what contexts. With this as a starting point, they can then operationalize[4] dataflows and entities created by business users, so that they are managed and maintained by IT. Since each dataflow entity is defined by the Power Query “M” code in the entity definition, this operationalization process is likely to be simpler and easier than similar processes with other technologies.

This approach also fits in well with how many larger enterprises implement Power BI. It is common[5] for larger organizations to use both shared capacity and dedicated Premium capacity for different purposes, and often those applications deployed to Premium capacity are those that are managed by a central IT/BI team. Since computed entities are only available when using Power BI Premium[6], this approach could lend itself well to the hand-off from business users to IT.

In any event, the next time you’re building dataflow entities, pause for a moment to think about Lego bricks, and what types of bricks or components your entities and dataflows represent. And then maybe take a break to go play with your kids.


[1] All images and examples in this post are taken from the building instructions for the Lego City Starter Set. I used to have a similar Lego set in my office that I would use to tell this story in person, but I gave it away during an office move shortly before I started this blog. The moral of the story: never get rid of your Lego sets.

[2] A fire truck!!

[3] This diagram is my reformatted version of a diagram included in the Power BI dataflows whitepaper. If you haven’t read this document, you really should.

[4] Or industrialize – which term do you use?

[5] Please note that this is not a blanket recommendation. I have the advantage of talking to many top Power BI customers around the world, so I can see this approach emerging as a common pattern, but the “right” approach in a given context will always depend on the context and the goals of the organization. I personally believe it is too early to start talking about best practices for Power BI dataflows (as I write this, dataflows have been in public preview for barely three weeks) but this is one of the areas where I am most excited to see best practices start to emerge.

[6] Even though Power BI dataflows do enable reuse in other ways that do not require Premium capacity.