BI is dead. Long live BI!

As I was riding the bus home from jury duty the other day[1], I saw this tweet come in from Eric Vogelpohl.

There’s a lot to unpack here, and I don’t expect to do it all justice in this post, but Eric’s thought-provoking tweet made me want to reply, and I knew my reply wouldn’t fit into 280 characters… so I’ll tackle some of the more important and interesting elements here.

First and foremost, Eric tags me before he tags Marco, Chris, or Curbal. I am officially number one, and I will never let Marco or Chris forget it[2].

With that massive ego boost out of the way, let’s get to the BI, which is definitely dead. And also definitely not dead.

Eric’s post starts off with a bold and simple assertion: If you have the reactive/historical insights you need today, you have enough business intelligence and should focus on other things instead. I’m paraphrasing, but I believe this effectively captures the essence of his claim. Let me pick apart some of the assumptions I believe underlie this assertion.

First, this claim seems to assume that all organizations are “good w/ BI.” Although this may be true of an increasing number of mature companies, in my experience it is definitely not something that can be taken for granted. The alignment of business and technology, and the cultural changes required to initiate and maintain this alignment, are not yet ubiquitous.

Should they be? Should we be able to take for granted that in 2019 companies have all the BI they need? [3]

The second major assumption behind Eric’s first point seems to be that “good w/ BI” today translates to “good w/ BI” tomorrow… as if BI capabilities are a blanket solution rather than something scoped and constrained to a specific set of business and data domains. In reality[4], BI capabilities are developed and deployed incrementally based on priorities and constraints, and are then maintained and extended as the priorities and constraints evolve over time.

My job gives me the opportunity to work with large enterprise companies to help them succeed in their efforts related to data, business intelligence, and analytics. Many of these companies have built successful BI architectures and are reaping the benefits of their work. These companies may well be characterized as being “good w/ BI” but none of them are resting on their laurels – they are instead looking for ways to extend the scope of their BI investments, and to optimize what they have.

I don’t believe BI is going anywhere in the near future. Not only are most companies not “good w/ BI” today, the concept of being “good w/ BI” simply doesn’t make sense in the context in which BI exists. So long as business requirements and environments change over time, and so long as businesses need to understand and react, there will be a continuing need for BI. Being “good w/ BI” isn’t a meaningful concept beyond a specific point in time… and time never slows down.

If your refrigerator is stocked with what your family likes to eat, are you “good w/ food”? This may be the case today, but what about when your children become teenagers and eat more? What about when someone in the family develops food allergies? What about when one of your children goes vegan? What about when the kids go off to college? Although this analogy won’t hold up to close inspection[5] it hopefully shows how difficult it is to be “good” over the long term, even for a well-understood problem domain, when faced with easily foreseeable changes over time.

Does any of this mean that BI represents the full set of capabilities that successful organizations need? Definitely not. More and more, BI is becoming “table stakes” for businesses. Without BI it’s becoming more difficult for companies to simply survive, and BI is no longer a true differentiator that assures a competitive advantage. For that advantage, companies need to look at other ways to get value from their data, including predictive and prescriptive analytics, and the development of a data culture that empowers and encourages more people to do more things with more data in the execution of their duties.

And of course, this may well have been Eric’s point from the beginning…

 


[1] I’ve been serving on the jury for a moderately complex civil trial for most of August, and because the trial is in downtown Seattle during business hours I have been working early mornings and evenings in the office, and taking the bus to the courthouse to avoid the traffic and parking woes that plague Seattle. I am very, very tired.

[2] Please remind me to add “thought leader” to my LinkedIn profile. Also maybe something about blockchain.

[3] I’ll leave this as an exercise for the reader.

[4] At least in my reality. Your mileage may vary.

[5] Did this analogy hold up to even distant observation?

Power BI dataflows, Premium, and ADLSg2

I received a question today via Twitter, and although I know the information needed to answer it is available online, I don’t believe there’s a single concise answer anywhere[1]. This is the question, along with a brief elaboration following my initial response:

Billing Twitter

Here’s the short answer: When you use an organizational ADLSg2 account to store dataflow data, your Azure subscription will be billed for any storage and egress based on however Azure billing works[2].

Here’s the longer answer:

  • Power BI dataflow data counts against the same storage limits as Power BI datasets. Each Pro license grants 10 GB of storage, and a Premium capacity node includes 100 TB of storage.
  • Integrating Power BI dataflows with ADLSg2 is not limited to Power BI Premium.
  • When you’re using Power BI dataflows in their default configuration, dataflow data is stored in this Power BI storage, and counts against the appropriate quota.
  • When dataflow data is saved to Power BI storage, it can only be accessed by Power BI – no other services or applications can read the data.
  • When you configure your dataflows to use an organizational ADLSg2 account, the dataflow data is saved to the Azure resource you specify, and not to the Power BI storage, so it doesn’t count against the Pro or Premium storage quota. This is particularly significant when you’re not using Power BI Premium, as ADLSg2 storage will scale to support any scenario, and not be limited by the 10 GB Pro storage limit.
  • When dataflow data is saved to ADLSg2, the CDM folders can be accessed by any authorized client via Azure APIs, and by Power BI as dataflow entities. This is particularly valuable for enabling collaboration between analysts and other Power BI users, and data scientists and other data professionals using Azure tools (see the sketch below).
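
To make that last bullet a little more concrete, here is a rough sketch in Power Query M of reading the model.json metadata file from a dataflow’s CDM folder directly out of ADLSg2. The storage account, filesystem, and folder names are placeholders, and you’ll need the right permissions on the storage account, so treat this as an illustration of the idea rather than a copy-and-paste recipe.

let
    // Placeholder ADLSg2 endpoint: substitute your own storage account and the
    // filesystem/workspace/dataflow folder path that holds the CDM folder
    CdmFolder = AzureStorage.DataLake("https://yourstorageaccount.dfs.core.windows.net/powerbi/YourWorkspace/YourDataflow"),
    // model.json describes the entities and data partitions in the CDM folder
    ModelFile = Table.SelectRows(CdmFolder, each [Name] = "model.json"),
    ModelJson = Json.Document(ModelFile{0}[Content]),
    // List the names of the entities defined in the dataflow
    EntityNames = List.Transform(ModelJson[entities], each [name])
in
    EntityNames

The same folder and files can be read by Azure Databricks, Azure Data Factory, or any other service that can authenticate to ADLSg2 and understands the CDM folder format.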

Hopefully this will help clear things up. If you have any questions, please let me know!


[1] Please note that I didn’t actually go looking to make sure, because I was feeling lazy and needed an excuse to blog about something vaguely technical.

[2] I add that final qualifier because I am not an authority on Azure or Power BI billing, or on licensing of any sort. For any specific information on licensing or billing, please look elsewhere for expert advice, because you won’t find it here.

 

Choosing the right tool for the job

A few weeks back[1] I got this comment on my Power BI dataflows overview post:

choosing tool comment

This morning I started to reply to the comment, and by the time I was done I realized that it should be a blog post on its own. So here we go… but before I continue I feel compelled to share this blurb from my blog bio:

I’m a program manager on the Power BI team at Microsoft. This is my personal blog, and all posts and opinions are mine and mine alone, and if you believe my teenage children they are not shared by anyone else in the entire world, because I’m weird and embarrassing.

So… There are a few capabilities that make dataflows “pop” for me. This shouldn’t be taken as a comprehensive list, and isn’t intended to say that dataflows are better than any other data prep tool, but hopefully it will be useful:

  • Power BI dataflows build on the Power Query experience many users are already familiar with. If you know how to use Power Query in one tool, you know how to use it in another.
  • Power BI dataflows are integrated into the end-to-end Power BI service and experience. You’re not pushing data into a CSV file. You don’t need to provision and manage a database. You get strongly typed “tables” of data that can be reused by you or by other users, and it’s all in one seamless experience.
  • Power BI dataflows and CDM folders provide capabilities for bridging the low-code/no-code world of self-service BI with managed central corporate BI in Azure.
  • Power BI dataflows enable Excel-like composition of ETL processes with linked and computed entities.
  • Power BI dataflows can scale beyond the desktop and leverage the power of the cloud to become part of an end-to-end BI application.

But… This is just a list of features.

The right tool for a job depends largely on the context of that job. If you’re trying to say that one tool is better than another, you need to have project/selection/evaluation criteria that everyone[2] agrees on. Only then can you compare multiple tools against those criteria. If you can’t do that, you’re probably just having a popularity contest.

This is one of the reasons why analyst firms like Gartner and Forrester play the role that they do – they define and document their criteria, and then do exhaustive research to evaluate tools against those criteria. They take great pains to make sure that their criteria align well with the needs of the industry. They evolve the criteria as the market evolves, and they update their analyses as products evolve.

If you take this type of approach, you’ll probably end up choosing the tool that’s right for the job at hand – even if it’s not the tool you had in mind when you started. It’s not always easy to convince everyone to step back and look at the big picture before making what may feel like a small decision, but when choosing tools and platforms it’s often a good way to save time and effort in the long run.


[1] July has been kicking my butt, even more so than usual this year. Not only have I not been blogging consistently, I’ve basically put everything on hold that wasn’t vitally important and critically urgent. Ugh.

[2] For a given value of “everyone.” Identifying the necessary and appropriate stakeholders is a big enough problem on its own…

Quick Tip: Implementing “IsNumeric” in Power Query

I’ve had many people tell me that they have a blog so they will have a place to look to find the information that they need, and answers to their own future questions. This isn’t typically the case for me… but this post is an exception.

Every few months I find myself searching for an IsNumeric function[1] in Power Query. Each time I end up spending a few frustrating hours before I find the solution I’m looking for, because the search terms I’m using don’t match the way the people who have already solved the problem have described their solutions.

Last night, when I was once again failing to find the solution I needed, I reached out to Twitter for help, and Imke Feldman delivered.

If you’re not sure how you’d use this solution, consider the problem I was working on last night. My source data includes a “release-date” column that can include a wide range of values, and because I do not have any influence over the source system, I need to work with what it contains. Specifically, I need to find records that contain a year, and I need to extract that year into a new numeric column.

01 data

There are a few more pattern variations in the data including two-digit years, but this sample shows the basic problem I need to solve. And in order to solve it, I need to be able to determine if the two characters at the end of the text represent a number.

Using Imke’s approach, this is what I ended up doing:

  1. Extract the last two characters from the source column
  2. When the extracted characters contain a number, convert them to numeric
  3. Standardize the resulting value to represent the correct year

This last step is specific to my particular “date as year encoded in different text formats” problem, but I’m including it here in case someone else may find it useful.

It looks like this:

02 solution
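
If the screenshot isn’t clear, here’s a rough sketch of the approach in M. This is my reconstruction rather than Imke’s exact code, the sample data is made up for illustration, and the century cutoff in the last step is an assumption specific to my data, so adjust as needed.

let
    // Made-up sample data standing in for my real source, which has a "release-date"
    // column containing years in a variety of text formats
    Source = #table(
        type table [#"release-date" = text],
        {
            {"Released in 1994"},
            {"Copyright 2003"},
            {"Sometime in '99"},
            {"Unknown"}
        }
    ),
    // Step 1: extract the last two characters from the source column
    AddedLastTwo = Table.AddColumn(Source, "LastTwo",
        each Text.End([#"release-date"], 2), type text),
    // Step 2: when the extracted characters contain a number, convert them to numeric;
    // the try ... otherwise pattern is doing the "IsNumeric" work here
    AddedYearFragment = Table.AddColumn(AddedLastTwo, "YearFragment",
        each try Number.FromText([LastTwo]) otherwise null, type nullable number),
    // Step 3: standardize the resulting value as a four-digit year
    // (the cutoff of 19 is an assumption based on my data)
    AddedYear = Table.AddColumn(AddedYearFragment, "Year",
        each if [YearFragment] = null then null
            else if [YearFragment] <= 19 then 2000 + [YearFragment]
            else 1900 + [YearFragment], type nullable number)
in
    AddedYear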

Thanks to Imke for her quick response. Hopefully there are people out there other than me who will find this useful… but I know that I will.


[1] This is how I think of a function that accepts a text value and returns true if that value contains a number, and false if it does not. This is probably because of my decades-long love affair with Transact-SQL, since T-SQL has an ISNUMERIC function built in.

Are you building a BI house of cards?

Every few weeks I see someone asking about using Analysis Services as a data source for Power BI dataflows. Every time I hear this, I cringe, and then include advice like this[1] in my response.

Using Analysis Services as a data source is an anti-pattern – a worst practice. It is not recommended, and any solution built using this pattern is likely to produce dissatisfied customers. Please strongly consider using other data sources, likely the data sources on which the AS model is built.


There are multiple reasons for this advice.


Some reasons are technical. Extraction of large volumes of data is not what an Analysis Services model is designed for. Performance for the ETL process is likely to be poor, and you’re likely to end up with memory/caching issues on the Analysis Services server. Beyond this, AS models typically don’t include the IDs/surrogate keys that you need for data warehousing, so joining the AS data to other data sources will be problematic.[2]


For some specific examples and technical deep dives into how and why this is a bad idea, check out this excellent blog post from Shabnam Watson. The focus of the post is on SSAS memory settings, but it’s very applicable to the current discussion.


Some reasons for this advice are less technical, but no less important. Using analytics models as data sources for ETL processing is a strong code smell[3] (“any characteristic in the source code of a program that possibly indicates a deeper problem”) for business intelligence solutions.


Let’s look at a simple and familiar diagram:


01 good


There’s a reason this left-to-right flow is the standard representation of BI applications: it’s what works. Each component has specific roles and responsibilities that complement each other, and which are aligned with the technology used to implement the component. This diagram includes a set of logical “tiers” or “layers” that are common in analytics systems, and which mutually support each other to achieve the systems’ goals.

Although there are many successful variations on this theme, they all tend to have this general flow and these general layers. Consider this one, for example:


02 ok

This example has more complexity, but also has the same end-to-end flow as the simple one. This is pretty typical for scenarios where a single data warehouse and analytics model won’t fulfill all requirements, so the individual data warehouses, data marts, and analytics models each contain a portion – often an overlapping portion – of the analytics data.

Let’s look at one more:

03 - trending badly

This design is starting to smell. The increased complexity and blurring of responsibilities will produce difficulties in data freshness and maintenance. The additional dependencies, and the redundant and overlapping nature of those dependencies, mean that any future changes will require additional investigation and care to ensure that there are no unintended side effects on the existing functionality.

As an aside, my decades of working in data and analytics suggest that this care will rarely actually be taken. Instead, this architecture will be fragile and prone to problems, and the teams that built it will not be the teams who solve those problems.

And then we have this one[4]:

04 - hard no

This is what you get when you use Analysis Services as the data source for ETL processing, whether that ETL and downstream storage is implemented in Power BI dataflows or in different technologies. And this is probably the best case you’re likely to get when you go down this path. Even with just two data warehouses and two analytics models in the diagram, the complex and unnatural dependencies are obvious, and are painful to consider.

What would be better here?[5] As mentioned at the top of the post, the logical alternative is to avoid using the analytics model and to instead use the same sources that the analytics model already uses. This may require some refactoring to ensure that the duplication of logic is minimized. It may require some political or cross-team effort to get buy-in from the owners of the upstream systems. It may not be simple, or easy. But it is almost always the right thing to do.

Don’t take shortcuts to save days or weeks today that will cause you or your successors months or years to undo and repair. Don’t build a house of cards, because with each new card you add, the house is more and more likely to fall.


[1] Something a lot like this. I copied this from a response I sent a few days ago.

[2] Many thanks to Chris Webb for some of the information I’ve paraphrased here. If you want to hear more from Chris on this subject, check out this session recording from PASS Summit 2017. The whole session is excellent; the information most relevant to this subject begins around the 26 minute mark in the recording. Chris also gets credit for pointing me to Shabnam Watson’s blog.

[3] I learned about code smells last year when I attended a session by Felienne Hermans at Craft Conference in Budapest. You can watch the session here. And you really should, because it’s really good.

[4] My eyes are itching just looking at it. It took an effort of will to create this diagram, much less share it.

[5] Yes, just about anything would be better.

Quick Tip: Power Query column by example

It seems like everyone knows about how Power Query in Power BI Desktop lets you use web pages as data sources, and lets you build queries by entering values from the web page. This is probably because so many people use the “from web page by example” feature for demos[1].

It also seems like no one knows about the “column from examples” feature in Power Query, even though this is just as exciting, and much more useful. They’re both built on the same underlying intelligence, but this one lets you work with data from any source.

Here’s the quick overview:

  1. In the Power Query editor in Power BI Desktop, choose “Column from Examples” from the “Add Column” tab.
  2. Enter the values that the new column should have for rows that are already in your data set.
  3. Review the values that Power Query is suggesting for the other rows, and when they are all correct, choose OK, and then say “Ooooooohhhhh” when Power Query does all the work for you.

Like this[2]:

column by example

In this example, I want to extract an ID value from a URL column where the URLs all follow the same pattern. I’ve copied the ID value from the first row, and when I paste it into the new column, Power Query immediately generates the M code to extract the ID and adds a new step to the query.

= Table.AddColumn(#"Renamed Columns", "Text After Delimiter", each Text.AfterDelimiter([Band Page URL], "/", 4), type text)

This code isn’t too complex. You could have written this code. I could have written it, but odds are it would have taken me a few minutes to look up the syntax for the Text.AfterDelimiter function, and to iterate a few times getting it right. Using this awesome feature I get the same code in seconds. For more complex scenarios you may need to enter more than one desired value before Power Query has enough information to get the function just right.

The best part is that the step that is added is just another query step. You can delete it, you can edit it, you can add more steps after it. This is just a fast and simple way to add the step without needing to write any code or even know what functions to use.
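
For example, here’s a hypothetical continuation of the query above, with made-up sample data, the generated column renamed to something friendlier than “Text After Delimiter”, and an extra step added after the generated one to turn the extracted ID into a number.

let
    // Made-up sample data standing in for the real source; the only requirement is that
    // the ID appears after the fifth "/" so the generated step below still applies
    #"Renamed Columns" = #table(
        type table [#"Band Page URL" = text],
        {
            {"https://www.example.com/bands/pages/1234"},
            {"https://www.example.com/bands/pages/5678"}
        }
    ),
    // The step generated by "Column from Examples", edited to use a friendlier column name
    #"Added Band ID" = Table.AddColumn(#"Renamed Columns", "Band ID",
        each Text.AfterDelimiter([Band Page URL], "/", 4), type text),
    // An additional step added after the generated one: convert the extracted ID to a number
    #"Changed Type" = Table.TransformColumnTypes(#"Added Band ID", {{"Band ID", Int64.Type}})
in
    #"Changed Type"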

There have been a few occasions where Power Query couldn’t quite figure out what I needed it to do, but it hasn’t happened often, or lately. This may be because I’ve been getting better, but it’s probably because the Power Query team has continued to improve the feature since it was introduced in April 2017. That’s right – this feature has been available for over two years, and you’re just learning about it now.

Or are you[3]?

Either way, check it out the next time you’re working in Power Query, and let me know what you think.


[1] To be fair, it does make for an awesome demo.

[2] There also used to be a GIF here, but apparently my GIF-making skills need some refinement before I start using GIFs in blog posts.

[3] When I demo Power Query in Power BI Desktop to customers, maybe 10% have already seen this feature. I suspect that the high-quality demographic that this blog is certain to attract will be more aware and more engaged than the general population, so I’m genuinely curious to know if you already knew about this gem.