Data Culture: Strategy is more important than tactics

You’ve probably heard of the Chinese military strategist and general Sun Tzu, even though he lived more than 2,000 years ago. He’s known for a lot of things, but these days he’s best known for his work The Art of War[1], which captures military wisdom that is still studied and applied today.

Even though Sun Tzu didn’t write about building a data culture[2], there’s still a lot we can learn from his writings. Perhaps the most relevant advice is this:

“Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.”

Building a data culture is hard. Keeping it going, and thriving, as the world and the organization change around you is harder. Perhaps the single most important thing[3] you can do to ensure long-term success is to define the strategic goals for your efforts.

Before diving into all the other important and valuable tactical things, pause and think about why you’re doing them, and where you want to be once they’re done. This strategic reflection will prove invaluable, as it will help you prioritize, scope, and tune those tactical efforts.

Having a shared strategic vision makes everything else easier. At every step of the journey, any contributor can evaluate their actions against that strategic vision. When conflicts arise – as they inevitably will – your pre-defined strategic north star can help resolve them and keep your efforts on track.


[1] Or possibly for the Sabaton album of the same name, which has a catchier bass line. And since Sabaton is a metal band led by history geeks, they also have this video, released just a few weeks ago, that looks at some of the history behind the album and song.

[2] Any more than Fiore wrote about business intelligence.

[3] I say “perhaps” because having an engaged executive sponsor is the other side of the strategy coin. Your executive sponsor will play a major role in defining your strategy, and in getting all necessary stakeholders on board with the strategy. Although I didn’t plan it this way, I’m quite pleased with the parallelism of having executive sponsorship be the first non-introductory video in the series, and having this one be the last non-summary video. It feels neat, and right, and satisfying.

Data Culture: Measuring Success

Building a data culture is hard. It involves technology and people, each of which is complicated enough on its own. When you combine them[1], things get even harder. Building a data culture takes time, effort, and money – and because it takes time, you don’t always know if the effort and money you’re investing will get you to where you need to go.

Measuring the success of your efforts can be as hard as the efforts themselves.

Very often the work involved in building a data culture doesn’t start neatly and cleanly with a clearly defined end state. It often starts messily, with organic bottom-up efforts combining with top-down efforts over a period of change that’s driven as much by external forces as by any single decision to begin. This means that measuring success – and defining what “success” means – happens while the work is being done.

Measuring usage is the easiest approach, but it’s not really measuring success. Does having more reports or more users actually produce more value?

For many organizations[2], success is measured in the bottom line – is the investment in building a data culture delivering the expected return from a financial perspective?

Having a data culture can make financial sense in two ways: it can reduce costs, and it can increase revenue.

Organizations often reduce costs by simplifying their data estate. This could involve standardizing on a single BI tool, or at least minimizing the number of tools used and migrating from older tools before they’re retired and decommissioned. This reduces costs directly by eliminating licensing expenses, and indirectly by reducing the effort required for training, support, and related tasks. Measuring cost reduction can be straightforward – odds are someone is already tracking the IT budget – and measuring the reduction in the utilization of legacy tools can take advantage of existing usage reporting.

Organizations can increase revenue by building more efficient, data-driven business processes. This is harder to measure. Typically this involves instrumenting the business processes in question, and proactively building the processes to correlate data culture efforts to business process outcomes.

In the video I mention the work of several enterprise Power BI customers who have built Power BI apps for their information workers and salespeople. These apps provide up-to-date data and insights for employees who would otherwise need to rely on days- or weeks-old batch data delivered via email or printout. By tracking which employees are using what aspects of the Power BI apps, the organizations can correlate this usage with the business outcomes of the employees’ work[3]. If a person or team’s efficiency increases as data usage increases, it’s hard to argue with that sort of success.
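
As an aside: if you’re wondering where the usage half of that correlation might come from, the Power BI activity log is a natural starting point. Here’s a minimal sketch using the MicrosoftPowerBIMgmt PowerShell module – the “ViewReport” filter and the per-user grouping are just illustrative, and joining this to your business outcome data is left to you.

    # Minimal sketch: pull one day of Power BI activity events and count
    # report views per user. Requires the MicrosoftPowerBIMgmt module and
    # Power BI admin permissions; the activity log covers one UTC day per call.
    Connect-PowerBIServiceAccount

    $day = (Get-Date).AddDays(-1).ToString("yyyy-MM-dd")
    $events = Get-PowerBIActivityEvent `
        -StartDateTime "$($day)T00:00:00" `
        -EndDateTime   "$($day)T23:59:59" |
        ConvertFrom-Json

    # Report views per user - one usage signal to correlate with outcomes
    $events |
        Where-Object { $_.Activity -eq "ViewReport" } |
        Group-Object UserId |
        Sort-Object Count -Descending |
        Select-Object Name, Count

Keep the earlier caveat in mind, though – these numbers measure usage, not value, so they’re only one half of the correlation.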

But… this post and video assume that you have actually set explicit goals. Have you? If you haven’t defined that strategy, you definitely want to check out next week’s video…


[1] Especially since you usually have organizational politics thrown into the mix for good measure, and that never makes things any simpler.

[2] I originally typed “most organizations” but I don’t have the data to support that assertion. This is true of most of the mature enterprise organizations that I’ve worked with, but I suspect that for a broader population, most organizations don’t actually measure – they just cross their fingers and do what they can.

[3] Odds are someone is already tracking things like sales, so the “business outcomes” part of this approach might be simpler than you might otherwise assume. Getting access to the data and incorporating it in a reporting solution may not be straightforward, but it’s likely the data itself already exists for key business processes.

Data Culture: Experts and Expertise

Power BI lets business users solve more and more problems without requiring deep BI and data expertise. This is what self-service business intelligence is all about, as we saw when we looked at a brief history of business intelligence.

At other points in this series we also looked at how each app needs to be treated like the unique snowflake that it is, how successful data cultures have well-defined roles and responsibilities, and how sometimes you need to pick your battles and realize that some apps and data don’t need the management and care that others do.

But some apps do.

Some BI solutions are too important to let grow organically through self-service development. Sometimes you need true BI experts who can design, implement, and support applications that will scale to significant data volumes and numbers of concurrent users.

In this video we look at a specific approach taken by the BI team at Microsoft that developed the analytic platform used by Microsoft finance[1].

This is one specific approach, but it demonstrates a few fundamental facts that can be overlooked too easily:

  • Building an enterprise BI solution is building enterprise software, and it requires the rigor and discipline that building enterprise software demands
  • The solution is delivered by dedicated teams of experts, each responsible for their part of the whole
  • Each business group with data and BI functionality included in the solution pays for what they get, with both money and personnel

Organizations that choose to ignore the need for experts tend to build sub-optimal solutions that fail to deliver on stakeholder expectations. These solutions are often replaced much sooner than planned, and the people responsible for their implementation are often replaced at the same time[2].

This isn’t the right place to go into the details of what sort of expertise you’ll need, because there’s too much to cover, and because the details will depend on your goals and your starting point. In my opinion the best place to go for more information is the Power BI whitepaper on Planning a Power BI Enterprise Deployment. This free resource delivers 250+ pages of wisdom from authors Melissa Coates and Chris Webb. You probably don’t need to read all of it, but odds are you’ll want to once you get started…


After this video and post were completed but before they were published, this story hit the global news wire: Botched Excel import may have caused loss of 15,841 UK COVID-19 cases | Ars Technica (arstechnica.com)

Wow. Although I am generally a big fan of Ars Technica’s journalism, I need to object to the sub-headline: “Lost data was reportedly the result of Excel’s limit of 1,048,576 rows.”

Yeah, no. The lost data was not the result of a limitation that has been well-known and documented for over a decade. The lost data was the result of using non-experts to do a job that experts should have done.

Choosing the wrong tool for a given job is often a symptom of not including experts and their hard-earned knowledge at the phases of a project where that expertise could have set everything up for success. This is just one example of many. Don’t let this happen to you.


[1] If you’re interested in a closer look at the Microsoft Finance COE approach, please check out this article in the Power BI guidance documentation.

[2] If you’ve been a consultant for very long, you’ve probably seen this pattern more than once. A client calls you in to replace or repair a system that never really worked, and all of the people who built it are no longer around.

The best Power BI roadmap

The Power BI team at Microsoft publishes a “release plan,” which is essentially the public product roadmap. Anyone can use it to understand what new capabilities and improvements are planned, and when they’re expected to be released[1].

One challenge with the official release plan comes from the fact that it is a set of online documents, and that for each “release wave[2]” there is a new set of docs – it’s not always clear where to look for the latest information on a given feature.

It’s clear now. You look here: https://aka.ms/pbireleaseplan

This link will take you to an interactive Power BI report built by Microsoft Technical Specialist and community superhero Alex Powers.

Using this report you can browse and explore by release month, or you can use keyword search to find specific features you’re interested in. The report spans multiple release waves (as I write this post it includes features from October 2019 to March 2021) and for each feature there’s a link to the documentation that includes all publicly available detail.

I use this report almost every day. It’s not actually new, but it is awesome. You should use it, and share it with everyone you know… or at least the people you know who care about Power BI.

Update January 2022: The link above no longer takes you to the public release plan report – it takes you to a Power BI app in the AppSource store that allows you to install the report in your own Power BI tenant. If you just want the public report, you can still find it here: https://aka.ms/ReleasePlanReport.


[1] Ohhh…. Now that I type that sentence, I understand why we call it a release plan. Suddenly that makes sense.

[2] There are two release waves per year, and they correspond to the planning and execution semesters used by the product teams.

DirectQuery and dataflows in Power BI

Important: This post was written and published in 2020, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be more of an historical record and less of a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

Last week I saw this scenario cross my screen:

I’m trying to use dataflows with DirectQuery. I’ve pushed all the data into a dataflow, around 1.5 billion rows, and am trying to DirectQuery all the data. Performance is quite slow.

There’s a lot here to unpack, but I want to start with this[1]:

As covered in this recent post on the dataflows enhanced compute engine, it is possible to use dataflows in Power BI as a DirectQuery data source. But… it’s probably not going to deliver awesome performance when you’re querying billions of rows, over and over, every time a user interacts with visuals on published reports.

Many of the conversations I have that involve DirectQuery expose a lack of awareness of what DirectQuery actually does. Here’s a quote from the Power BI docs (emphasis is mine):

DirectQuery mode is an alternative to Import mode. Models developed in DirectQuery mode don’t import data. Instead, they consist only of metadata defining the model structure. When the model is queried, native queries are used to retrieve data from the underlying data source.

Here’s something that isn’t in the article, but which I believe probably should be:

DirectQuery often seems like an attractive option because it leaves the data in the database where it already is, and doesn’t require importing the data into the Tabular engine in the Power BI service. It can feel like you’re avoiding the effort and complexity that can come with designing a high-performance tabular model, but what you’re actually doing is moving the complexity and effort somewhere else.

When you’re designing a tabular model that will have a small data volume and/or a small number of users, you can probably get away without paying too much attention to best practices. But as your data and user base grow, you need to invest effort and expertise into designing and implementing a model[2] that will scale and perform as needed.

The same factors are true for scaling with DirectQuery – it’s just a different set of effort and expertise that needs to be acquired and applied. My colleague Kasper de Jonge wrote at length about this a few years ago, and his post links to other, deeper, sources as well. At the risk of oversimplifying, there are two key factors that you need to take into consideration when using DirectQuery:

  1. Database design – the schema accessed via DirectQuery needs to be optimized for the types of queries it will process
  2. Index design – the database needs appropriate indexes to ensure that most common queries are fulfilled by data that’s already in memory, without the IO cost of hitting a disk

These factors fall broadly into the category of performance tuning and optimization, and the specific steps and details will depend on the data source. SQL Server, Oracle, SAP HANA… each database platform has its own nuances. There are lots of resources available online, like this awesome post from Brent Ozar… but if you pay attention you’ll see this 3-page post is mainly just a collection of links to other more detailed articles, and that it probably represents hundreds of pages of content. And this is just one resource for one database.
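
To make the index design point a little more concrete, here’s a minimal sketch of the kind of tuning involved, assuming a SQL Server source and the SqlServer PowerShell module. The server, table, and column names are all hypothetical – the right indexes depend entirely on the queries your reports actually generate, so profile first, index second.

    # Minimal sketch: create a covering index for a common DirectQuery
    # pattern (filter by date, aggregate a measure). All names are
    # hypothetical; capture your real query patterns before indexing.
    $query = @"
    CREATE NONCLUSTERED INDEX IX_FactSales_OrderDate
    ON dbo.FactSales (OrderDate)
    INCLUDE (CustomerKey, SalesAmount);
    "@

    # Invoke-Sqlcmd ships with the SqlServer PowerShell module
    Invoke-Sqlcmd -ServerInstance "sql01.contoso.com" -Database "SalesDW" -Query $query

An index like this only helps if it matches your actual query patterns – and, as we’ll see in a moment, it’s exactly the kind of knob that some scenarios won’t let you turn at all.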

Using DirectQuery at scale means you need to be an expert on the database where your data lives, and you need to put that expertise to work tuning and optimizing the database. If you don’t do this, you cannot expect DirectQuery to perform well as your data volume and number of users grow.

At this point you may be asking what any of this has to do with dataflows in Power BI. The answer, of course, is everything.

The dataflows enhanced compute engine improves performance for multiple scenarios by loading dataflow entity data into a SQL-based cache. Queries against the dataflow can then be fulfilled by reading from the cache instead of reading from CDM folders in Azure storage – this enables both query folding and DirectQuery.

One thing that the enhanced compute engine does not provide is direct access to the SQL database that’s doing the caching. Not only can you not define indexes[3], you cannot use any of the SQL Server tools or techniques that Brent’s performance tuning post calls out. The implementation of the compute engine is a black box, and it does not provide any mechanism for monitoring, tuning, or optimization.

This means that while the enhanced compute engine is a simple solution to common problems, expecting DirectQuery over dataflows with billions of rows to perform well is likely to result in the same disappointment as expecting DirectQuery to perform well at scale over any un-optimized database.

None of this is to say that DirectQuery – including DirectQuery over dataflows – can’t be part of a scalable BI solution. Power BI and Azure provide capabilities like composite models and aggregations, as well as the recently-announced Power BI performance accelerator for Azure Synapse Analytics[4], each of which is a valuable addition to your modern BI toolkit.

But if you expect to flip a switch and have awesome performance with DirectQuery against dataflows with billions of records…


[1] I know there’s a typo in the meme. It should be driving me crazy, but for some reason it fits.

[2] Yes, I just linked to the SQLBI home page, rather than to a specific article. This is the point – it’s complicated, and there is a body of knowledge that needs to be acquired and applied. If you and your team don’t have that expertise, you either need to develop it or bring in experts who have it.

[3] It’s worth pointing out that the compute engine does create columnstore indexes automatically. This will provide improved performance for many queries, but does not reach the level of targeted performance tuning for specific workloads.

[4] I hope this gets named PBIPA4ASA, but so far no one on the branding team is returning my calls.

Data Culture: Community champions

What would an epic battle be without champions?

Lost. The epic battle would be lost without champions.

Don’t let this happen to your battle to build a data culture. Instead, find your champions, recognize and thank them, and give them the tools they need to rally their forces and lead them to victory.

Let’s do this!!

Despite what the nice short video[1] may lead you to believe, it’s not absolutely necessary to provide your data culture champions with literal swords[2]. But it is vital that you arm[3] them with the resources and connections they need to be successful.

In any community there will be people who step up to go the extra mile, to learn more than they need to know, and to do more than they are asked. These people are your champions, but they can’t do it all on their own. In the long term champions will succeed or fail based on the support they get from the center of excellence.

With support from the BI COE, champions can help a small central team scale their reach and impact. Champions typically become the primary point of contact for their teams and business groups, sharing information and answering questions. They demonstrate the art of the possible, and put technical concepts into the context and language that their business peers understand.

This is just what they do – this is what makes them champions.

An organization that’s actively working to build a data culture will recognize and support these activities. And if an organization does not…


[1] This video is about 1/3 as long as the last video in the series. You’re very welcome.

[2] But why take chances, am I right?

[3] See what I did there? I shouldn’t be allowed to write blog posts this close to bedtime.

Automatically refresh dataset when dataflow refresh completes

Important: This post was written and published in 2020, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be more of an historical record and less of a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

Back in August I highlighted the new dataflows PowerShell script repo on GitHub. These scripts provide an accelerated starting point for working with the dataflows REST APIs and for automating common dataflows tasks.

This week the dataflows team released two new REST APIs for managing dataflow transactions (think “refresh”) and a new parameterized PowerShell script to make using the APIs easier.

You can find the APIs documented here and the PowerShell script here.

The most exciting thing about these APIs and scripts[1] is that they enable one of the most frequently requested capabilities: you can now trigger a dataset refresh (or the refresh of a dataflow in another workspace) when a dataflow refresh completes.

Boom.
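
To give you a flavor of what this enables, here’s a minimal sketch – not the official script from the repo – that polls the new transactions API and kicks off a dataset refresh when the dataflow refresh succeeds. The property names and status values are my reading of the API docs, and all IDs are placeholders.

    # Minimal sketch: wait for a dataflow refresh to finish, then refresh
    # a downstream dataset. Requires the MicrosoftPowerBIMgmt module.
    # All IDs are placeholders; error handling is left as an exercise.
    Connect-PowerBIServiceAccount

    $workspaceId = "<workspace guid>"
    $dataflowId  = "<dataflow guid>"
    $datasetId   = "<dataset guid>"

    do {
        Start-Sleep -Seconds 60
        # Get Dataflow Transactions returns the dataflow's refresh history;
        # assuming the most recent transaction is first in the list
        $latest = (Invoke-PowerBIRestMethod -Method Get `
            -Url "groups/$workspaceId/dataflows/$dataflowId/transactions" |
            ConvertFrom-Json).value | Select-Object -First 1
    } while ($latest.status -eq "InProgress")

    if ($latest.status -eq "Success") {
        # Refresh Dataset is a POST with no required body
        Invoke-PowerBIRestMethod -Method Post `
            -Url "groups/$workspaceId/datasets/$datasetId/refreshes"
    }

The parameterized script the team published does more than this sketch, so if you’re building something real, start there.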

If you’re reading this and are thinking “that’s all well and good, but I don’t want to use an API or PowerShell[2]” please don’t fret. The Power BI team is working on non-API-based experiences to make the end-to-end refresh experience simpler and more complete. I can’t share dates or details here before they’re publicly announced, but I did want to share proactively before anyone asked about a code-free UX for refresh orchestration.

Update December 2020: There’s now a Power Automate connector that will let you do this without writing any code.


[1] And the best thing to happen in 2020 so far, if James Ward is to be believed.

[2] Or even “this should be ‘clicky-clicky-draggy-droppy’ and not ‘typey-typey-scripty-codey’.”