Power BIte: Creating dataflows by importing model.json

This week’s Power BIte is the third in a series of videos[1] that present different ways to create new Power BI dataflows, and the results of each approach.

When creating a dataflow by importing a model.json file previously exported from Power BI, the dataflow will have the following characteristics:

| Attribute | Value |
| --- | --- |
| Data ingress path | Ingress via the mashup engine hosted in the Power BI service |
| Data location | Data stored in the CDM folder defined for the newly created dataflow |
| Data refresh | The dataflow is refreshed based on the schedule and policies defined in the workspace |

Let’s look at the dataflow’s model.json metadata to see some of the details.

[Screenshot: the top of the dataflow’s model.json]

At the top of the file we can see the dataflow name on line 2…

…and that’s pretty much all that’s important here. The rest of the model.json file will exactly match what was exported from the Power BI portal, and will look a lot like this or this. Boom.
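
For reference, here’s a minimal sketch of what that top-of-file metadata looks like. The property names come from the CDM folder metadata format; the values are invented for illustration rather than copied from the demo file:

```json
{
  "name": "My Imported Dataflow",
  "description": "",
  "version": "1.0",
  "culture": "en-US",
  "modifiedTime": "2019-11-06T14:50:53+00:00",
  "entities": []
}
```

The name is the only part that really matters for the import scenario; everything else round-trips unchanged.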

For a little more detail (and more pictures, in case you don’t want to watch a four-minute video), check out this post from last month, when this capability was introduced.

If this information doesn’t make sense yet, please hold on. We still have one more incoming Power BIte in this series, and then we’ll have the big picture view.

I guarantee[3] it will make as much sense as anything on this blog.


[1] New videos every Monday morning![2]

[2] Did you notice that I just copied the previous post and made some small edits to it? That seemed very appropriate given the topic…

[3] Or your money back.

Power BIte: Creating dataflows with linked and computed entities

This week’s Power BIte is the second in a series of videos[1] that present different ways to create new Power BI dataflows, and the results of each approach.

When creating a dataflow with linked and computed entities, the final dataflow will have the following characteristics:

| Attribute | Value |
| --- | --- |
| Data ingress path | Ingress via the mashup engine hosted in the Power BI service, using source data that is also managed by the Power BI service and taking advantage of data locality |
| Data location | Data for computed entities is stored in the CDM folder defined for the dataflow; data for linked entities remains in the source dataflow and is not moved or copied |
| Data refresh | The dataflow is refreshed based on the schedule and policies defined in the workspace |

Let’s look at the dataflow’s model.json metadata to see some of the details.

[Screenshot: the top of the dataflow’s model.json]

At the top of the file we can see the mashup definition, including the query names and load settings on lines 11 through 35 and the Power Query code for all of the entities on line 37. This will look awfully familiar from the last Power BIte post.

Things start to get interesting and different when we look at the entity definitions:

[Screenshot: the entity definitions in model.json]

On line 80 we can see that the Product entity is defined as a ReferenceEntity, which is how the CDM metadata format describes what Power BI calls linked entities. Rather than having its attribute metadata defined in the current dataflow’s model.json file, it instead identifies the source entity it references, and the CDM folder in which the source entity is stored, similar to what we saw in the last example. Each modelId value for a linked entity references the id value in the referenceModels section, as we’ll see below.

The Customers with Addresses entity, defined starting on line 93, is the computed entity built in the video demo. This entity is a LocalEntity, meaning that its data is stored in the current CDM folder, and its metadata includes both its location and its full list of attributes.

The end of the model.json file highlights the rest of the differences between local and linked entities.

[Screenshot: the partitions and referenceModels sections of model.json]

At line 184 we can see the partitions for the Customers with Addresses entity, including the URL for the data file backing this entity. Because the other entities are linked entities, their partitions are not defined in the current model.json.

Instead, the CDM folders where their data does reside are identified in the referenceModels section starting at line 193. The id values in this section match the modelId values on the linked entities defined earlier in the file, and the location values are the URLs of the model.json files that define the source CDM folders for the linked entities.
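
To make the shape concrete, here’s a hedged sketch of the relevant fragments of such a model.json. The entity names come from the demo, but the attributes, IDs, and URLs are invented for illustration:

```json
{
  "entities": [
    {
      "$type": "ReferenceEntity",
      "name": "Product",
      "source": "Product",
      "modelId": "11111111-2222-3333-4444-555555555555"
    },
    {
      "$type": "LocalEntity",
      "name": "Customers with Addresses",
      "attributes": [
        { "name": "CustomerKey", "dataType": "int64" },
        { "name": "AddressLine1", "dataType": "string" }
      ],
      "partitions": [
        {
          "name": "Part001",
          "location": "https://myaccount.dfs.core.windows.net/powerbi/MyWorkspace/MyDataflow/CustomersWithAddresses.csv"
        }
      ]
    }
  ],
  "referenceModels": [
    {
      "id": "11111111-2222-3333-4444-555555555555",
      "location": "https://myaccount.dfs.core.windows.net/powerbi/MyWorkspace/SourceDataflow/model.json"
    }
  ]
}
```

Note how the linked entity’s modelId matches the id in referenceModels, and how only the computed entity carries attributes and partitions of its own.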

If this information doesn’t make sense yet, please hold on. We’ll have different values for the same attributes for other dataflow creation methods, and then we can compare and contrast them.

I guarantee[2] it will make as much sense as anything on this blog.


[1] New videos every Monday morning!

[2] Or your money back.

Power BIte: Creating dataflows with Power Query Online

This week’s Power BIte is the first in a series of videos[1] that present different ways to create new Power BI dataflows, and the results of each approach.

When creating a dataflow by defining new entities in Power Query Online, the final dataflow will have the following characteristics:

| Attribute | Value |
| --- | --- |
| Data ingress path | Ingress via the mashup engine hosted in the Power BI service |
| Data location | Data stored in the CDM folder defined for the newly created dataflow |
| Data refresh | The dataflow is refreshed based on the schedule and policies defined in the workspace |

Let’s look at the dataflow’s model.json metadata to see some of the details.

[Screenshot: the top of the dataflow’s model.json]

At the top of the file we can see the mashup definition, including the query names and load settings on lines 11 through 19 and the Power Query code for all of the entities on line 22.
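
As an illustrative sketch of that mashup definition (the pbi:mashup property and its children reflect what I’ve seen in exported files, but the query name, ID, and M code below are invented):

```json
{
  "name": "My Dataflow",
  "version": "1.0",
  "culture": "en-US",
  "pbi:mashup": {
    "fastCombine": false,
    "allowNativeQueries": false,
    "queriesMetadata": {
      "Customers": {
        "queryId": "00000000-0000-0000-0000-000000000000",
        "loadEnabled": true
      }
    },
    "document": "section Section1;\r\nshared Customers = let Source = Sql.Database(\"myserver\", \"mydb\") in Source;"
  }
}
```

The document property holds the dataflow’s entire Power Query section document as a single escaped string, which is why the M code for all of the entities shows up on one line.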

[Screenshot: the bottom of the dataflow’s model.json]

At the bottom of the file we can see information about the refresh and storage.[2] Line 26 identifies the entity as a LocalEntity, which means that the entity’s data is physically stored in the current CDM folder.

Line 30 shows that the entity is fully refreshed rather than incrementally refreshed, and line 31 shows the file name where the entity data is stored. Lines 97 through 99 identify the single partition where the data for the current version of the entity is stored, including the full URI for the data file. If this entity used incremental refresh, there would be multiple partitions to match the incremental refresh policy.
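
Here’s a hedged sketch of that storage metadata for a single fully refreshed entity; the refresh policy and partition shapes follow what I’ve seen in exported files, and the names and URL are invented:

```json
{
  "$type": "LocalEntity",
  "name": "Customers",
  "pbi:refreshPolicy": {
    "$type": "FullRefreshPolicy",
    "location": "Customers.csv"
  },
  "attributes": [
    { "name": "CustomerKey", "dataType": "int64" },
    { "name": "CustomerName", "dataType": "string" }
  ],
  "partitions": [
    {
      "name": "Part001",
      "refreshTime": "2019-10-27T10:13:08+00:00",
      "location": "https://myaccount.dfs.core.windows.net/powerbi/MyWorkspace/MyDataflow/Customers.csv"
    }
  ]
}
```

With incremental refresh there would be a different policy here, and multiple partitions, one per refresh window, instead of this single full partition.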

If this information doesn’t all make sense just yet, please hold on. We’ll have different values for the same attributes for other dataflow creation methods, and then we can compare and contrast them.

I guarantee[3] it will make as much sense as anything on this blog.


[1] New videos every Monday morning!

[2] The same information is also included starting on line 103 for the Promotions entity, but is not pictured here.

[3] Or your money back.

Power BIte: Sharing and reuse with dataflows and datasets

Last week I kicked off the new BI Polar YouTube channel with a video and blog post comparing and contrasting Power BI dataflows and datasets. In the days that followed, I continued to hear questions that led me to believe I hadn’t done a great job answering one vital question:

When would I use dataflows, and when would I use shared datasets?

Here’s the short answer:

[Video: sharing and reuse with dataflows and datasets]

And here’s the long answer: Lego Bricks and the Spectrum of Data Enrichment and Reuse.

The video focuses on – and demonstrates – sharing and reuse with both dataflows and datasets. It’s short and to the point[1] and focuses on this one question.

The blog post takes a more conceptual view, using Lego bricks as a metaphor for dataflows and datasets and the types of reuse they enable.

If you’ve watched the videos and read the posts and you still have questions, please let me know.


[1] As short and to the point as anything you’re likely to get from me, anyway. Brevity has never been my forte.

Power BIte: Dataflows vs. datasets

This is still one of the most common dataflows questions: what’s the difference between Power BI dataflows and Power BI datasets?

For the last year I have resisted tackling this question head-on. This isn’t because it’s a bad or “dumb” question. Just the opposite – this is a very simple question, and the simpler a question is, the more complex and nuanced the answer is likely to be.

[Image: a graphical representation of the answer’s likely complexity]

If you’re a regular reader of this blog, you probably already know the answer, because I’ve answered it already. Sort of. The existing answer is distributed across dozens of posts, and if you’ve read all of them, you’ve probably picked up the answer along the way. But I keep hearing this question, and I keep thinking that there must be a more direct answer I could share.

Here it is, in a single, simple table[1].

|   | Power BI dataflows | Power BI datasets |
| --- | --- | --- |
| Implementation | CDM folder | Analysis Services tabular model |
| Storage[2] | CSV files | VertiPaq |
| Metadata[3] | Common Data Model – model.json | BISM |
| Development | Power Query Online | Power Query in Power BI Desktop |
| Primary purpose[4] | Data reuse | Data analysis |
| Reuse[5] | Acts as data source in multiple datasets | Shared datasets across workspaces |
| Scope of reuse[6] | Entity-level reuse | Dataset-level reuse |
| Mashup with other data sources[7] | Yes | No |
| Used for reporting[8] | Not directly | Yes |
| Reuse outside Power BI[9] | Yes, through ADLSg2 | Yes, through XMLA |
| Data access methods[10] | Import | Import, DirectQuery |
| Connection methods[11] | Import | Live Connection |
| Row-level security | No | Yes |
| Certification and promotion | Not yet | Yes |

What else am I missing? Please let me know! Seriously, you should let me know.

Update: I’ve added a few rows to the table after the post was originally published, to incorporate feedback from readers on differences I had missed. Thank you!

Each of the rows in this table could easily be an in-depth topic in and of itself, so if you’re looking at any of them and thinking “that’s not quite right” I might very well agree with you. There’s a lot of context and a lot of nuance here, and we’re trying to sum things up in a word or two… which is kind of the whole point.

Oh yeah, there’s a video too.[12]

I can’t wait to hear what you think!


[1] A simple table with ten footnotes.

[2] The storage aspect of dataflows and datasets is one of the most significant differences between the two. Datasets use the VertiPaq column store to load data into an optimized and highly compressed in-memory representation that is optimized for analysis. Dataflows use text files in folders, which are optimized for interoperability.

[3] The Analysis Services Tabular engine uses the BI Semantic Model (BISM) to represent its metadata. This is a metadata model originally included in SQL Server 2012 Analysis Services, and used by the Tabular engine ever since.

[4] Saying “this is the primary purpose” of any complex tool is fraught with risk, because no matter what you say, there are other valid things that remain unsaid. With this said… the big gap that dataflows close is that of self-service data preparation for the purpose of data sharing and reuse. Power BI has always had self-service data preparation through Power Query, but before dataflows the data that was prepared was “locked” in a dataset, for analysis, and not for sharing or reuse.

[5] Once you have loaded data into dataflows, authorized users can reuse entities from multiple dataflows, and use them as the building blocks for new dataflows or new datasets. Once you have loaded data into a dataset (and published it to the Power BI service) you can enable users to connect to it.

[6] With dataflows, users can pick and choose the entities they want, but a dataset can only be reused as-is.

[7] Dataflow entities can be used as data sources in the same Power BI Desktop file as other data sources, and can serve as part of a mashup or composite model, but a dataset can only be reused as-is.

[8] Although you can obviously use dataflows for reporting, you do so by first importing the data from the dataflow into a dataset.

[9] It’s interesting to point out that using your own organizational ADLSg2 account does not require Power BI Premium, but using the XMLA endpoint to connect to Power BI datasets from non-Power BI clients does.

[10] You can only import data into your dataflow entities, but tables in your dataset can import data or use DirectQuery, and a dataset can use a combination of the two.

[11] You can only import data from a dataflow into a dataset. When connecting to a shared dataset you can only use Live Connections.

[12] I’ve been thinking of making videos to supplement this blog for almost as long as I’ve been hearing the question that inspired this post. Please take a moment to share your thoughts on the video. This is something of a “soft launch” and although I have plans for a few dozen more videos already, your feedback will be a main factor in how the video series evolves.

Is self-service business intelligence a two-edged sword?

I post about Power BI dataflows a lot, but that’s mainly because I love them. My background in data preparation and ETL, combined with dataflows’ general awesomeness makes them a natural fit for my blog. This means that people often think of me as “the dataflows guy” even though dataflows are actually a small part of my role on the Power BI CAT team. Most of what I do at work is help large enterprise customers successfully adopt Power BI, and to help make Power BI a better tool for their scenarios[1].

As part of my ongoing conversations with senior stakeholders from these large global companies, I’ve noticed an interesting trend emerging: customers describing self-service BI as a two-edged sword. This trend is interesting for two main reasons:

  1. It’s a work conversation involving swords
  2. Someone other than me is bringing swords into the work conversation[2]

As someone who has extensive experience with both self-service BI and with two-edged swords, I found myself thinking about these comments more and more – and the more I reflected, the more I believed this simile holds up, but not necessarily in the way you might suspect.

This week in London I delivered a new presentation for the London Power BI User Group – Lessons from the Enterprise: Managed Self-Service BI at Global Scale. In this hour-long presentation I explored the relationship between self-service BI and two-edged swords, and encouraged my audience to consider the following points[4]:

  • The two sharp edges of a sword each serve distinct and complementary purposes.
  • A competent swordsperson knows how and when to use each, and how to use them effectively in combination.
  • Having two sharp edges is only dangerous to the wielder if they are ignorant of their tool.
  • A BI tool like Power BI, which can be used for both “pro” IT-driven BI and self-service business-driven BI, has the same characteristics, and to use it successfully at scale an organization needs to understand its capabilities and know how to use both “edges” effectively in combination.

As you can imagine, there’s more to it than this, so you should probably watch the session recording.

[Image: self-service BI and swords]

If you’re interested in the slides, please download them here: London PUG – 2019-06-03 – Lessons from the Enterprise.

If you’re interested in the videos shown during the presentation, they’re included in the PowerPoint slides, and you can view them on YouTube here:

For those who are coming to the Microsoft Business Applications Summit next week, please consider joining the CAT team’s “Enterprise business intelligence with Power BI” full-day pre-conference session on Sunday. Much of the day will be deep technical content, but we’ll be wrapping up with a revised and refined version of this content, with a focus on building a center of excellence and a culture of data in your organization.

Update 2019-06-10: The slides from the MBAS pre-conference session can be found here: PRE08 – Enterprise business intelligence with Power BI – Building a CoE.

There is also a video of the final demo where Adam Saxton joined me to illustrate how business and IT can work together to effectively respond to unexpected challenges. If you ever wondered what trust looks like in a professional[5] environment, you definitely want to watch this video.



[1] This may be even more exciting for me than Power BI dataflows are, but it’s not as obvious how to share this in blog-sized pieces.

[2] Without this second point, it probably wouldn’t be noteworthy. I have a tendency to bring up swords more often in work conversations than you might expect[3].

[3] And if you’ve been paying attention for very long, you’ll probably expect this to come up pretty often.

[4] Pun intended. Obviously.

[5] For a given value of “professional.”