Using and reusing Power BI dataflows

I use this diagram a lot[1]:

[Diagram: source data flowing through a chain of dataflows into a dataset, as part of an end-to-end Power BI application]

This diagram neatly summarizes a canonical use case for Power BI dataflows, with source data being ingested and processed as part of an end-to-end BI application. It showcases the Lego-like composition that’s possible with dataflows. But it also has drawbacks – its simplicity omits common scenarios for using and reusing dataflows.

So, let’s look at what’s shown – and at what’s not shown – in my favorite diagram. Let’s look at some of the ways these dataflows and their entities can be used.

  1. Use the final entities as-is: This is the scenario implied by the diagram. The entities in the “Final Business View” dataflow represent a star schema, and are loaded as-is into a dataset.
  2. Use the final entities with modification: The entities in the “Final Business View” dataflow are loaded into a dataset, but with additional transformation or filtering applied in the dataset’s queries.
  3. Use the final entities with mashup: The entities in the “Final Business View” dataflow are loaded into a dataset, but with additional data from other sources added via the dataset’s queries.
  4. Use upstream entities: The entities in other dataflows are loaded into a dataset, likely with transformations and filtering applied, and with data from other sources added via the dataset’s queries.
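For scenarios 2 and 3, the additional logic lives in the dataset's own queries. As a sketch, here's what a dataset query that consumes a dataflow entity and applies extra filtering (scenario 2) might look like in Power BI Desktop. The workspace, dataflow, entity, and column names are all hypothetical, and the navigation steps shown are the kind the dataflows connector generates for you:

```m
// Sketch of a dataset query over a dataflow entity (all names hypothetical)
let
    // Navigate to the entity via the Power BI dataflows connector
    Source = PowerBI.Dataflows(null),
    Workspace = Source{[workspaceName = "Sales Analytics"]}[Data],
    Dataflow = Workspace{[dataflowName = "Final Business View"]}[Data],
    Entity = Dataflow{[entityName = "Sales"]}[Data],
    // Scenario 2: additional filtering applied in the dataset's query
    RecentSales = Table.SelectRows(Entity, each [OrderDate] >= #date(2019, 1, 1))
in
    RecentSales
```

Scenario 3 looks the same, except that instead of (or in addition to) filtering, the query merges or appends data from other sources.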

Please understand that this list is not exhaustive. There are likely dozens of variations on these themes that I have not called out explicitly. Use this list as a starting point and see where dataflows will take you. I’ll keep the diagram simple, but you can build solutions as complex as you need them to be.


[1] This is my diagram. There are many like it, but this one is mine.

 

Power BIte: Sharing and reuse with dataflows and datasets

Last week I kicked off the new BI Polar YouTube channel with a video and blog post comparing and contrasting Power BI dataflows and datasets. In the days that followed, I continued to hear questions that led me to believe I hadn’t done a great job answering one vital question:

When would I use dataflows, and when would I use shared datasets?

Here’s the short answer:

And here’s the long answer: Lego Bricks and the Spectrum of Data Enrichment and Reuse.

The video focuses on – and demonstrates – sharing and reuse with both dataflows and datasets. It’s short and to the point[1] and focuses on this one question.

The blog post takes a more conceptual view, using Lego bricks as a metaphor for dataflows and datasets and the types of reuse they enable.

If you’ve watched the videos and read the posts and you still have questions, please let me know.


[1] As short and to the point as anything you’re likely to get from me, anyway. Brevity has never been my forte.

Fiore’s Virtues of Business Intelligence

In the late 1300s and early 1400s, Fiore de’i Liberi was a knight, a diplomat, and a fencing master. He also wrote one of the most comprehensive treatises on medieval combat, his Flower of Battle, of which four copies survive in museums and private collections today. Fiore started – or was a significant evolutionary step in – one of the most important and long-lasting traditions in armed and unarmed combat.

In addition to detailed instruction on fighting with dagger, longsword, spear, and other weapons, Fiore’s manuscript included a preface with information about the virtues that any fencer[1] would need to be successful in combat.

[Image: MS Ludwig XV 13, folio 32r]

In the image above, Fiore pictures the seven blows of the sword, and his four virtues, each represented by a different animal[2][3]:

This Master with these swords signifies the seven blows of the sword. And the four animals signify four virtues, that is prudence, celerity, fortitude, and audacity. And whoever wants to be good in this art should have part in these virtues.

Fiore then goes on to describe each virtue in turn:

Prudence
No creature sees better than me, the Lynx.
And I always set things in order with compass and measure.

Celerity
I, the tiger, am so swift to run and to wheel
That even the bolt from the sky cannot overtake me.

Audacity
None carries a more ardent heart than me, the lion,
But to everyone I make an invitation to battle.

Fortitude
I am the elephant and I carry a castle as cargo,
And I do not kneel nor lose my footing.[4]

Step back and read this again: “And whoever wants to be good in this art should have part in these virtues.”

That’s right – Fiore was documenting best practices, 600+ years ago. And although I suspect that Fiore wasn’t thinking about business intelligence projects at the time, I do believe that these virtues are just as relevant to the slicing and dicing[5] we’re still doing today. Let me explain.

Prudence – “…I always set things in order with compass and measure”: A successful BI practitioner knows what needs to be done before a project can begin, and when additional work is required before they can get started. Initiating a project requires careful setup and planning, and moving before the prerequisites for success are in place can be disastrous.[6]

Celerity – “I… am so swift to run and to wheel that even the bolt from the sky cannot overtake me”: Business requirements change day to day and hour to hour. To succeed, a BI practitioner must be prepared to move quickly and decisively, engaging without delay when an opportunity presents itself – and also be prepared to change direction as the needs of the project change.

Audacity – “…to everyone I make an invitation to battle”: Any project declined presents an opening for another practitioner, another team, another tool, and this is likely to reduce opportunities over time. Saying yes to difficult projects – and succeeding in their execution – is necessary to ensure that future projects don’t pass you by.

Fortitude – “And I do not kneel nor lose my footing”: When Fiore speaks of fortitude, he does not speak of the strength that comes from big muscles. He speaks of the strength that comes from structure, and balance. His “elephant with a castle on its back” is a perfect metaphor for a BI solution delivered quickly and confidently because of the solid and stable platform on which it is built. Success doesn’t come from the extra effort put in when delivering a solution – it comes from the care and planning that went into the overall data estate.

You may look at these virtues and see contradiction – how can you have prudence and audacity and celerity? The answer for BI is the same answer that it is for the sword: practice, training, and preparation. In both situations, whether you’re battling with an armed foe or battling with a difficult client, you need to apply the right virtues at the right times, and to understand both the big picture and the day-to-day steps that produce larger successes. In both situations you’re also facing complex and dynamic challenges where you need to quickly take advantage of opportunities as they arise, and create opportunities when they don’t appear on their own[7]. Fortunately, as BI practitioners we can rely on the strengths of our teams – it’s not always a solo battle.

You may also look at these virtues and see Matthew stretching to make the most tenuous of analogies work, just because he loves swords as much as he loves BI. While this may be true, I do honestly believe that these virtues do apply here. Over the past 20-25 years I have seen many projects succeed because these virtues were embodied by the people and teams involved, and I’ve seen many projects fail where these virtues were absent. This isn’t the only way to look at success factors… but at the moment it’s my favorite.

In closing, I’d like to mention that this post marks one year since I started this blog. In the past year I’ve published almost 90 posts, and have had roughly 50,000 visitors and 100,000 page views. Here’s hoping that by applying Fiore’s virtues I’ll be able to make the next year even more productive and more successful than the year that has passed.

Thanks to all of you who read what I write, and who provide feedback here and on Twitter – I couldn’t do it without you.


[1] Fencer in this context meaning someone who fights with swords or other edged weapons, not the Olympic-style sport of fencing that a modern reader might picture when reading the word.

[2] As translated by Michael Chidester and Colin Hatcher.

[3] Although it may not be obvious to the modern reader, the animal at the bottom is an elephant with a tower or castle on its back. I suspect that Fiore never actually saw an elephant.

[4] In case these terms don’t immediately have meaning, prudence == wisdom, celerity == speed, audacity == daring, and fortitude == strength.

[5] See what I did there?

[6] I assume that Fiore’s use of the term “measure” here is pure coincidence.

[7] If you’ve worked on a high-stakes, high-visibility BI project where requirements changed during implementation, or where not all stakeholders were fully committed to the project goals, this will probably feel very familiar.

Power BI dataflows best practices

I do a lot that’s not related to dataflows. In fact, dataflows take up a surprisingly small part of my day – although you’d never know it if your only insight into my calendar came from this blog.

Despite this, I like to believe that I’m keeping my finger on the pulse of this feature, and when I learned today that the dataflows team had published best practice guidance almost a month ago, I was shocked and surprised.

Here are those best practices: https://docs.microsoft.com/en-us/power-bi/service-dataflows-best-practices

Image by rawpixel from Pixabay
This is apparently where Matthew thinks documentation comes from

In my defense, the blog post where this guidance was announced was the overall September update for dataflows, and it was the last link at the bottom of the post… but I still should have noticed.

These practices were produced by the dataflows team, and are based on questions and support tickets from customers around the world. Definitely check them out, and see how you can incorporate them into your Power BI solutions!

Power BIte: Dataflows vs. datasets

This is still one of the most common dataflows questions: what’s the difference between Power BI dataflows and Power BI datasets?

For the last year I have resisted tackling this question head-on. This isn’t because it’s a bad or “dumb” question. Just the opposite – this is a very simple question, and the simpler a question is, the more complex and nuanced the answer is likely to be.

See how complex this is?
A graphical representation of the answer’s likely complexity.

If you’re a regular reader of this blog, you probably already know the answer, because I’ve answered it already. Sort of. The existing answer is distributed across dozens of posts, and if you’ve read all of them, you’ve likely picked up the answer along the way. But I keep hearing this question, and I keep thinking that there must be a more direct answer I could share.

Here it is, in a single, simple table[1].

| | Power BI dataflows | Power BI datasets |
| --- | --- | --- |
| Implementation | CDM folder | Analysis Services tabular model |
| Storage[2] | CSV files | Vertipaq |
| Metadata[3] | Common Data Model – model.json | BISM |
| Development | Power Query Online | Power Query in Power BI Desktop |
| Primary purpose[4] | Data reuse | Data analysis |
| Reuse[5] | Acts as data source in multiple datasets | Shared datasets across workspaces |
| Scope of reuse[6] | Entity-level reuse | Dataset-level reuse |
| Mashup with other data sources[7] | Yes | No |
| Used for reporting[8] | Not directly | Yes |
| Reuse outside Power BI[9] | Yes, through ADLSg2 | Yes, through XMLA |
| Data access methods[10] | Import | Import, DirectQuery |
| Connection methods[11] | Import | Live Connection |
| Row-level security | No | Yes |
| Certification and promotion | Not yet | Yes |

What else am I missing? Please let me know! Seriously, you should let me know.

Update: I’ve added a few rows to the table after the post was originally published, to incorporate feedback from readers on differences I had missed. Thank you!

Each of the rows in this table could easily be an in-depth topic in and of itself, so if you’re looking at any of them and thinking “that’s not quite right” I might very well agree with you. There’s a lot of context and a lot of nuance here, and we’re trying to sum things up in a word or two… which is kind of the whole point.
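To make the “mashup” row a little more concrete: a dataset query can join a dataflow entity to data from an entirely different source, which is something a shared dataset can’t do. Here’s a minimal sketch in Power Query M, where “Customers” is assumed to be a query over a dataflow entity and “Targets” a query over some other source, such as an Excel workbook – all query, column, and key names are hypothetical:

```m
// Sketch: mashing up a dataflow entity with another source (names hypothetical)
let
    // Left outer join of the dataflow entity to the other source on CustomerID
    Merged = Table.NestedJoin(
        Customers, {"CustomerID"},
        Targets, {"CustomerID"},
        "Targets", JoinKind.LeftOuter
    ),
    // Bring the joined column(s) into the result table
    Expanded = Table.ExpandTableColumn(Merged, "Targets", {"AnnualTarget"})
in
    Expanded
```

With a shared dataset, by contrast, a Live Connection gives you the whole model as-is, with no opportunity for this kind of query-level mashup.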

Oh yeah, there’s a video too.[12]

I can’t wait to hear what you think!


[1] A simple table with ten footnotes.

[2] The storage aspect of dataflows and datasets is one of the most significant differences between the two. Datasets use the Vertipaq column store to load data into an optimized and highly compressed in-memory representation that is optimized for analysis. Dataflows use text files in folders, which are optimized for interoperability.

[3] The Analysis Services Tabular engine uses the BI Semantic Model (BISM) to represent its metadata. This is a metadata model originally included in SQL Server 2012 Analysis Services, and used by the Tabular engine ever since.

[4] Saying “this is the primary purpose” of any complex tool is fraught with risk, because no matter what you say, there are other valid things that remain unsaid. With this said… the big gap that dataflows close is that of self-service data preparation for the purpose of data sharing and reuse. Power BI has always had self-service data preparation through Power Query, but before dataflows the data that was prepared was “locked” in a dataset, for analysis, and not for sharing or reuse.

[5] Once you have loaded data into dataflows, authorized users can reuse entities from multiple dataflows, and use them as the building blocks for new dataflows or new datasets. Once you have loaded data into a dataset (and published it to the Power BI service) you can enable users to connect to it.

[6] With dataflows, users can pick and choose the entities they want, but a dataset can only be reused as-is.

[7] Dataflow entities can be used as data sources in the same Power BI Desktop file as other data sources, and can serve as part of a mashup or composite model, but a dataset can only be reused as-is.

[8] Although you can obviously use dataflows for reporting, you do so by first importing the data from the dataflow into a dataset.

[9] It’s interesting to point out that using your own organizational ADLSg2 account does not require Power BI Premium, but using the XMLA endpoint to connect to Power BI datasets from non-Power BI clients does.

[10] You can only import data into your dataflow entities, but tables in your dataset can import data or use DirectQuery, and a dataset can use a combination of the two.

[11] You can only import data from a dataflow into a dataset. When connecting to a shared dataset you can only use Live Connections.

[12] I’ve been thinking of making videos to supplement this blog for almost as long as I’ve been hearing the question that inspired this post. Please take a moment to share your thoughts on the video. This is something of a “soft launch” and although I have plans for a few dozen more videos already, your feedback will be a main factor in how the video series evolves.

Power BI dataflows and query folding

In a recent post I mentioned an approach for working around the import-only nature of Power BI dataflows as a data source in Power BI Desktop, and in an older post I shared information about the enhanced compute engine that’s currently available in preview.

Image by Chris Martin from Pixabay

Some recent conversations have led me to believe that I should summarize a few points about dataflows and query folding, because these existing posts don’t make them easy to find and understand.

  1. When accessing dataflow entities from Power BI Desktop, no query folding takes place, even if the enhanced compute engine is enabled.
  2. When accessing dataflow entities from other entities in the Power BI service, no query folding takes place unless the enhanced compute engine is enabled.
  3. When accessing dataflow entities from other entities in the Power BI service, query folding will take place when the enhanced compute engine is enabled, because the linked entity’s query will be executed against the cached data in SQL, rather than the underlying CDM folder.

These three statements summarize how query folding works – or does not work – in Power BI dataflows today.
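To make the third point concrete: inside a dataflow, a computed entity is simply a query that references another entity. In a sketch like the one below – with hypothetical entity and column names – when the enhanced compute engine is enabled, the query can fold to the SQL-cached copy of the referenced entity instead of re-reading the underlying CDM folder:

```m
// Computed entity referencing a linked entity (entity/column names hypothetical)
let
    // "Raw Sales" is assumed to be a linked entity from an upstream dataflow
    Source = #"Raw Sales",
    // Aggregation that can fold to SQL when the enhanced compute engine is on
    Totals = Table.Group(
        Source,
        {"Region"},
        {{"TotalAmount", each List.Sum([Amount]), type number}}
    )
in
    Totals
```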

The Power BI team has discussed some of their plans for the enhanced compute engine, so this should change in the future[1] – but as of today, the only dataflows scenario where query folding takes place is when a dataflow backed by the enhanced compute engine is referenced by a linked entity.

I hope this helps clarify things, at least a little…


[1] I think this will be around the time the engine goes GA, but I don’t remember for sure, and I’m too lazy to re-watch the MBAS session to double check. If you watch it and let me know, I’ll gladly update this post with the details.