Quick Tip: Implementing “IsNumeric” in Power Query

I’ve had many people tell me that they have a blog so they will have a place to look to find the information that they need, and answers to their own future questions. This isn’t typically the case for me… but this post is an exception.

Every few months I find myself searching for an IsNumeric function[1] in Power Query. Each time I end up spending a few frustrating hours before I find the solution I’m looking for, because the search terms I’m using aren’t how people who have already solved the problem have shared it.

Last night, when I was once again failing to find the solution I needed I reached out to Twitter for help, and Imke Feldman delivered:.

If you’re not sure how you’d use this solution, consider the problem I was working on last night. My source data includes a “release-date” column that can include a wide range of values, and because I do not have any influence over the source system, I need to work with what it contains. Specifically, I need to find records that contain a year, and I need to extract that year into a new numeric column.

01 data

There are a few more pattern variations in the data including two-digit years, but this sample shows the basic problem I need to solve. And in order to solve it, I need to be able to determine if the two characters at the end of the text represent a number.

Using Imke’s approach, this is what I ended up doing:

  1. Extract the last two characters from the source column
  2. When the extracted characters contain a number, convert them to numeric
  3. Standardize the resulting value to represent the correct year

This last step is specific to my particular “date as year encoded in different text formats” problem, but I’m including it here in case someone else may find it useful.

It looks like this:

02 solution

Thanks to Imke for her quick response. Hopefully there are people out there other than me who will find this useful… but I know that I will.

[1] This is how I think of a function that accepts a text value returns true if that value contains a number, and false if it does not. This is probably because of my decades-long love affair with Transact-SQL, since T-SQL has an ISNUMERIC function built in.

Are you building a BI house of cards?

Every few weeks I see someone asking about using Analysis Services as a data source for Power BI dataflows. Every time I hear this, I cringe, and then include advice something like this[1] in my response.

Using Analysis Services as a data source is an anti-pattern – a worst practice. It is not recommended, and any solution built using this pattern is likely to produce dissatisfied customers. Please strongly consider using other data sources, likely the data sources on which the AS model is built.

There are multiple reasons for this advice.

Some reasons are technical. Extraction of large volumes of data is not what an Analysis Services model is designed for. Performance for the ETL process is likely to be poor, and you’re likely end up with memory/caching issues on the Analysis Services server. Beyond this, AS models typically don’t include the IDs/surrogate keys that you need for data warehousing, so joining the AS data to other data sources will be problematic.[2]

For some specific examples and technical deep dives into how and why this is a bad idea, check out this excellent blog post from Shabnam Watson. The focus of the post is on SSAS memory settings, but it’s very applicable to the current discussion.

Some reasons for this advice are less technical, but no less important. Using analytics models as data sources for ETL processing are a strong code smell[3] (“any characteristic in the source code of a program that possibly indicates a deeper problem”) for business intelligence solutions.

Let’s look at a simple and familiar diagram:

01 good

There’s a reason this left-to-right flow is the standard representation of BI applications: it’s what works. Each component has specific roles and responsibilities that complement each other, and which are aligned with the technology used to implement the component. This diagram includes a set of logical “tiers” or “layers” that are common in analytics systems, and which mutually support each other to achieve the systems’ goals.
Although there are many successful variations on this theme, they all tend to have this general flow and these general layers. Consider this one, for example:

02 ok

This example has more complexity, but also has the same end-to-end flow as the simple one. This is pretty typical for  scenarios where a single data warehouse and analytics model won’t fulfill all requirements, so the individual data warehouses, data marts, and analytics models each contain a portion – often an overlapping portion – of the analytics data.

Let’s look at one more:

03 - trending badly

This design is starting to smell. The increased complexity and blurring of responsibilities will produce difficulties in data freshness and maintenance. The additional dependencies, and the redundant and overlapping nature of the dependencies means that any future changes will require additional investigation and care to ensure that there are no unintended side effects to the existing functionality.

As an aside, my decades of working in data and analytics suggest that this care will rarely actually be taken. Instead, this architecture will be fragile and prone to problems, and the teams that built it will not be the teams who solve those problems.

And then we have this one[4]:

04 - hard no

This is what you get when you use Analysis Services as the data source for ETL processing, whether that ETL and downstream storage is implemented in Power BI dataflows or different technologies. And this is probably the best case you’re likely to get when you go down this path. Even with just two data warehouses and two analytics models in the diagram, the complex and unnatural dependencies are obvious, and are painful to consider.

What would be better here?[5] As mentioned at the top of the post, the logical alternative is to avoid using the analytics model and to instead use the same sources that the analytics model already uses. This may require some refactoring to ensure that the duplication of logic is minimized. It may require some political or cross-team effort to get buy-in from the owners of the upstream systems. It may not be simple, or easy. But it is almost always the right thing to do.

Don’t take shortcuts to save days or weeks today that will cause you or your successors months or years to undo and repair. Don’t build a house of cards, because with each new card you add, the house is more and more likely to fall.

[1] Something a lot like this. I copied this from a response I sent a few days ago.

[2] Many thanks to Chris Webb for some of the information I’ve paraphrased here. If you want to hear more from Chris on this subject, check out this session recording from PASS Summit 2017. The whole session is excellent; the information most relevant to this subject begins around the 26 minute mark in the recording. Chris also gets credit for pointing me to Shabnam Watson’s blog.

[3] I learned about code smells last year when I attended a session by Felienne Hermans at Craft Conference in Budapest. You can watch the session here. And you really should, because it’s really good.

[4] My eyes are itching just looking at it. It took an effort of will to create this diagram, much less share it.

[5] Yes, just about anything would be better.