Every few weeks I see someone asking about using Analysis Services as a data source for Power BI dataflows. Every time I see this, I cringe, and then include advice like the following in my response.
Using Analysis Services as a data source is an anti-pattern – a worst practice. It is not recommended, and any solution built using this pattern is likely to produce dissatisfied customers. Please strongly consider using other data sources, likely the data sources on which the AS model is built.
This example has more complexity, but also has the same end-to-end flow as the simple one. This is pretty typical for scenarios where a single data warehouse and analytics model won’t fulfill all requirements, so the individual data warehouses, data marts, and analytics models each contain a portion – often an overlapping portion – of the analytics data.
Let’s look at one more:
This design is starting to smell. The increased complexity and blurring of responsibilities will produce difficulties in data freshness and maintenance. The additional dependencies, and their redundant and overlapping nature, mean that any future change will require additional investigation and care to ensure that there are no unintended side effects on existing functionality.
As an aside, my decades of working in data and analytics suggest that this care will rarely actually be taken. Instead, this architecture will be fragile and prone to problems, and the teams that built it will not be the teams who solve those problems.
And then we have this one:
This is what you get when you use Analysis Services as the data source for ETL processing, whether that ETL and downstream storage is implemented in Power BI dataflows or different technologies. And this is probably the best case you’re likely to get when you go down this path. Even with just two data warehouses and two analytics models in the diagram, the complex and unnatural dependencies are obvious, and are painful to consider.
What would be better here? As mentioned at the top of the post, the logical alternative is to avoid using the analytics model and to instead use the same sources that the analytics model already uses. This may require some refactoring to ensure that the duplication of logic is minimized. It may require some political or cross-team effort to get buy-in from the owners of the upstream systems. It may not be simple, or easy. But it is almost always the right thing to do.
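To make this concrete, a dataflow entity can query the warehouse tables directly rather than the tabular model built on top of them. Here is a minimal Power Query (M) sketch of that approach; the server, database, and table names are hypothetical placeholders, not references to anything in this post:

```powerquery
let
    // Connect to the underlying warehouse, not the SSAS/tabular model built on it.
    // "sqlserver01" and "SalesDW" are hypothetical placeholder names.
    Source = Sql.Database("sqlserver01", "SalesDW"),
    // Pull the same fact table the analytics model loads from.
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Reapply only the transformations this dataflow actually needs,
    // rather than re-extracting logic already baked into the model.
    Filtered = Table.SelectRows(FactSales, each [OrderDate] >= #date(2018, 1, 1))
in
    Filtered
```

Pointing the dataflow at the warehouse keeps the dependency graph a simple tree: the model and the dataflow are siblings over the same source, instead of the dataflow depending on the model.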
Don’t take shortcuts to save days or weeks today that will cause you or your successors months or years to undo and repair. Don’t build a house of cards, because with each new card you add, the house is more and more likely to fall.
Update: The post above focuses mainly on the technical aspects of the anti-pattern, and suggests alternative recommended patterns to follow instead. It does not focus on the reasons why so many projects are pushed into the anti-pattern in the first place. Those reasons are almost always based on human – not technical – factors.
You should read this post next: http://workingwithdevs.com/its-always-a-people-problem/. It presents a delightful and succinct approach to deal with the root causes, and will put the post you just read in a different context.
 Something a lot like this. I copied this from a response I sent a few days ago.
 Many thanks to Chris Webb for some of the information I’ve paraphrased here. If you want to hear more from Chris on this subject, check out this session recording from PASS Summit 2017. The whole session is excellent; the information most relevant to this subject begins around the 26 minute mark in the recording. Chris also gets credit for pointing me to Shabnam Watson’s blog.
 I learned about code smells last year when I attended a session by Felienne Hermans at Craft Conference in Budapest. You can watch the session here. And you really should, because it’s really good.
 My eyes are itching just looking at it. It took an effort of will to create this diagram, much less share it.
 Yes, just about anything would be better.
10 thoughts on “Are you building a BI house of cards?”
This is one to be read and bookmarked for future reference. I have had this same conversation several times. SSAS cubes or Tabular models as source for other cubes. There is a special place …. for people who play a part in the creation of such monstrosities.
Fabian: Love the comment. Reminds me of the Excel world, where I see PivotTables as a source for PivotTables. Unfortunately the users of solutions like these are more likely to end up in that ‘special place’ than the architects.
Hi Matthew, Excellent blog. I just downloaded the June version of Power BI on my Windows desktop, where I have XAMPP running with MariaDB. My Maria install was performed by the book, and it runs flawlessly. I installed the mysql-connector-net-6.6.5.msi connector, and logged in from Power BI to Maria using my normal credentials. The login was successful. Unfortunately, I then encountered this error
and am reluctant to run mysql_upgrade, for fear of hosing my dev server (I have heard the horror stories). Too bad: I have to try to use CSV files instead of a direct connection in order to check out Power BI! I was hoping to experience something more advanced in terms of live mapping to SQL tables and views, for example. Still, great blog. You should have an army of followers!
Thanks for the kind words. Please be aware that I have no experience with Maria and cannot provide any help for the problem you describe.
Thanks for the reply. I will investigate the issue with some thoroughness, and publish the fix on my blog. In the meantime, I will upload CSV test files (which I reckon is what is meant by “Data Flows”) and study your tutorials on Power BI. I am particularly interested in how BI might reconcile the data semantics of heterogeneous schemas, without first requiring that all flows be fed into some a priori unified reference model.
Please check out https://ssbipolar.com/2018/10/23/dataflows-in-power-bi/ as a starting point to learn about dataflows. This is a significant feature in Power BI, and you’ll be better served to understand its capabilities before you begin.
I used mysql_upgrade to fix Maria on XAMPP, and BI connected to it fine after that. I will check out your dataflow tutorials in the next month, as I build my WP plugin app on the server using php/js/html/css/json. My app exports reports and CSV data files, and I am hoping that BI can offer a pretty face and further analytical functionality to my end users. From what I can tell, BI stores large data sets in the Azure cloud that it then manipulates on the desktop via BI. There is mapping required to the MS common data model or (apparently) a custom data model (my preference; I already have one and it is well suited to the problem domain).
Not sure how much MS would charge Woocommerce users who might wish to store analytical data on the cloud and do interesting analysis. I could write all the needed templates for this, but if MS charges an arm and a leg, then my projected customer base (thrifty SMB merchants who run ecommerce stores via the Woocommerce platform) would most likely not use it. Plus it has to be super easy for them in terms of ETL, or BI would be relegated in their minds to being more of a corporate thing with dedicated data wranglers doing the heavy lifting.
Either way, I will see if the BI analytics go beyond eye candy (colorful charts etc.) and are actually sophisticated from a data quant perspective. I watched some of the canned presentations on YouTube by MS earlier this year re the new release of BI, and the younger MS presenters struck me as somewhat technically naive when it comes to understanding the complex challenges of large-scale DSS data modeling using heterogeneous data sources. My hope is that BI is more than just Excel on steroids, with Azure replacing Access on the back end.
I didn’t understand what you were saying until I read the article and saw the diagrams. That last diagram makes me start to twitch… and I’m probably going to have nightmares tonight.
I’m glad it’s not just me!
When Power BI first started to become enterprise ready it didn’t have advanced capabilities like dataflows or even sharing datasets between workspaces. This anti-pattern made a lot of sense then.