Life and work have been getting the best of me this month, and I haven’t found the time to keep up on blogging and video now that my series on building a data culture has wrapped up. I’ve been watching the dataflows and Power Query teams releasing all sorts of exciting new capabilities, and realizing that I’m not going to be writing about them in a timely manner.
Laura is a Microsoft MVP whose “Hat Full of Data” blog has become one of my favorite morning reads, and whose YouTube channel seems to include all of the videos I’ve been thinking about making, but not actually finding the time to make them.
Like this one on the new Power Query Online diagram view that is now available in public preview for dataflows in Power BI:
If you’ve been waiting for new dataflows content, you should definitely head over to Laura’s blog today to check out the awesome work she’s been doing.
I hope to be writing more regularly in December after my work-related “crunch mode” has passed, but if 2020 has taught me anything it’s that I have no idea what’s waiting around the corner. In the meantime, you should follow Laura, because she’s doing awesome work.
 Or the spare creative mental energies, which seem to be in sparser supply than spare minutes and hours.
 If 2020 has taught me anything, it’s that 2020 has taught me nothing.
There are over 20 articles in this new resource, and the content they contain answers many of the questions I hear most frequently. For some reason WordPress isn’t letting me upload the screen shot I took of the content outline, so instead here is a picture of me when I first learned about this new content:
Please check out this new information, and please submit feedback to the documentation team using the feedback form on each page. When you let the docs team know what you like, what you don’t like, or what is missing, they read every comment!
But how do you do this in as secure a manner as possible, so that the right users have the minimum necessary permissions on the right data?
The short answer is that you let the data source handle secure access to the data it manages. ADLSg2 has a robust security model, which supports both Azure role-based access control (RBAC) and POSIX-like access control lists (ACLs).
The longer answer is that this robust security model may make it more difficult to know how to set up permissions in the data lake to meet your analytics and security requirements.
Earlier this week I received a question from a customer on how to get Power BI to work with data in ADLSg2 that is secured using ACLs. I didn’t know the answer, but I knew who would know, and I looped in Ben Sack from the dataflows team.Ben answered the customer’s questions and unblocked their efforts, and he said that I could turn them into a blog post. Thank you, Ben!
Here’s what you should know:
1 – If you’re using ACLs, you must at least specify a filesystem name in the URL to load in the connector (or if you access ADLS Gen2 via API or any other client).
Update: James Baker, a Program Manager on the Azure Storage team has published on GitHub a PowerShell script to recursively set ACLs. Thanks to Simon for commenting on this post to make me aware of it, Josh from the Azure support team for pointing me to the GitHub repo, and of course to James for writing the actual script!
 This description is copied directly from the ADLSg2 documentation, which you should also read before acting on the information in this post.
 Disclaimer: This post is basically me using my blog as a way to get Ben’s assistance online so more people can get insights from it. If the information is helpful, all credit goes to Ben. If anything doesn’t work, it’s my fault. Ok, it may also be your fault, but it’s probably mine.
 This one is very important to know before you begin, even though it may be #3 on the list.
 This is a best practice pretty much everywhere, not just here.
INTERIOR: pan over cubicles of happy, productive office workers
CLOSE-UP: office worker at desk
NARRATOR: Is that Susie I see, giving Power Platform dataflows a try?
SUSIE: That’s right! With dataflows I can have all of the data I need, right where I need it!!
NARRATOR: Dataflows. They’re not just for Power BI anymore.
OK, you may not remember that orange juice ad campaign from the late 1970s and early 80s, but I’ve had it stuck in my head since I started working on this post and video. I couldn’t figure out how to work it into the video itself, so here it is in written form.
Anyway, with that awkward moment behind us, you probably want to watch the video. Here is it:
Power Platform dataflows use Power Query Online – and the same set of connectors, gateways, and transformation capabilities as Power BI dataflows. But there are a few key differences that are worth emphasizing.
Power Platform dataflows can load data into the Common Data Service, either into the standard Common Data Model entities used by Dynamics 365 apps, or into custom entities used by custom Power Apps. This is important – this makes dataflows more like a traditional ETL tool like SSIS data flows in that at the end of the dataflow creation process you can map the columns in your queries to the columns in these existing tables.
Power Platform dataflows can load data into ADLSg2 for analytical scenarios, but Power Apps doesn’t have the same concept of “built-in storage” that Power BI does. That means if you want to use Power Platform dataflows to create CDM folders, you must configure your Power Apps environment to use an ADLSg2 resource in your Azure subscription.
The “link to data lake” feature in Power Apps feels to me like a better integration experience than what’s currently available in Power BI. In Power Apps you define the link at the environment level, not the tenant level – this provides more flexibility, and enables non-tenant admins to configure and use the data lake integration.
The first time you create a Power Platform dataflow and select the “analytical entities” option, you’ll be prompted – and required – to link the Power Apps environment to an Azure Data Lake Storage resource. You’ll need to have an Azure subscription to use, but the process itself if pretty straightforward.
I can’t wait to hear what you think of this new capability. Please let me know in the comments or via Twitter.
See you in the new year!
 I just realized that this was 40 years ago. Were you even born yet?
 CDS entities aren’t tables by the strictest definition, but it’s close enough for our purposes today.
 I honestly don’t know enough about Power Apps security to go into too much depth on this point, but I am not a Power Apps admin and I was able to create a trial environment and link it to my own ADLSg2 resource in my own Azure subscription without affecting other users.
The enhanced compute engine in Power BI dataflows has been in preview since June. It’s not really new, and I’ve posted about it before. But I still keep hearing questions about it, so I thought it might make sense to record a video.
I won’t go into too much more depth here – just watch the video, and if you want more details check out one of these existing posts:
 Also, I’m behind on my video schedule – this was a motivating factor as well. November was an unexpectedly busy month, and between work, life, and not really having the video editing skills I need to keep to a schedule… Yeah.
At this point I’ve said “Power BI dataflows enable reuse” enough times that I feel like a broken record. What does this phrase actually mean, and how can you take advantage of dataflows to enable reuse in your Power BI applications?
This Power BIte video is a bit longer than its predecessors, and part of this is because it covers both the problem and the solution.
The problem is that self-service BI applications often start out as one-off efforts, but don’t stay that way. At least in theory, if the problem solved by the application was widespread and well understood, there would be an existing solution already developed and maintained by IT, and business users wouldn’t need to develop their own solutions.
Successful applications have a tendency to grow. For self-service BI, this could mean that more and more functionality gets added to the application, or it could mean that someone copies the relevant portions of the application and uses them as the starting point for a new, different-but-related, application.
Once this happens, there is a natural and gradual process of drift that occurs, as each branch of the application tree grows in its own direction. A predictable consequence of this drift in Power BI applications is that query definitions that start off as common will gradually become out of sync, meaning that “the same data” in two datasets will actually contain different values.
Moving queries that need to be shared across multiple applications from multiple datasets into a single dataflow is a simple and effective solution to this problem. There’s no dedicated tooling for this solution in Power BI today, but the steps are still simple and straightforward.
P.S. This is the first Power BIte video recorded in my home office. After struggling unsuccessfully to get decent audio quality in my office at work, I’m trying out a new environment and some new tools. I know there’s still work to be done, but hopefully this is a step in the right direction. As always, I’d love to know what you think…
 For my younger readers, this phrase is a reference to when Spotify used to be called “records” and the most common service outage symptom was a repeat of the audio buffer until the user performed a hard reset of the client application. True story.
 Is there a better term for this? I feel like there should be an existing body of knowledge that I could reference, but my searching did not yield any promising results. The fact that “Logical Drift” is the name of a band probably isn’t helping.
Miguel’s profile picture is even older than Matthew’s profile picture
All snark aside, Miguel and the whole dataflows team have been awfully busy, and it’s great to see their work available to Power BI authors. I won’t attempt to repeat what’s in the announcement, but I will highlight the new capabilities that have me most excited:
Support for data profiling in Power Query Online – we’ve had this in Power BI Desktop for a while, but it’s just as important for dataflows as it is for datasets.
Better support for files and folders – a lot of the data I play with these days is in folders full of text files, and Power Query Online hasn’t had the best experience for working with this type of data.
Better support for query parameters – there are lots of scenarios where having parameterized queries makes working with dataflows easier, and now Power Query Online makes it easier to work with query parameters.
Do yourself a favor and check out the whole list. Odds are there’s something you’ve been waiting for that will excite you as much as these new capabilities excite me.
And I can’t wait to hear what they are…
 No, I don’t believe that’s possible either, but it is nice to see that you’ve been paying attention.
 Very little of my actual work involves data prep these days, so I need to find data to play with to avoid getting too bored.
Back in July the Power BI team announced the availability of a new connector for Azure Data Lake Storage Gen2.
In recent weeks I’ve been starting to hear questions that sound like these:
Question: Is this ADLSg2 connector how you get to the data behind dataflows?
Answer: No. Dataflows are how you get to the data behind dataflows.
Question: Is this how I can access dataflows if I don’t use Power BI Premium?
Answer: No. Dataflows are not a Premium-only feature.
Question: Can I use the ADLSg2 connector to work with CDM folder data?
Answer: Yes, but why would you?
If your data is already in CDM folders, using the ADLSg2 connector simply adds effort to consuming it in Power BI. You’ll be working with raw, untyped text files instead of working with strongly typed entities.
If your ADLSg2 data is already in CDM folders, strongly consider attaching the CDM folder as a dataflow. This means less up-front work for you, and less ongoing work for the users who need to get insights from the data.
Question: Why do we need an ADLSg2 connector if we have dataflows?
Answer: Now that is a good question!
Power BI dataflows store their data in CDM folder format, and they can be configured to store those CDM folders in your organization’s ADLSg2 data lake. In addition to this, you can attach a CDM folder in ADLSg2 as an external dataflow, making its data available to Power BI users even though the data ingress is taking place through another tool like Azure Data Factory.
But ADLSg2 is much, much more than a repository for dataflows or CDM folders. ADLSg2 supports all sorts of file and blob data, not just CDM folders. And sometimes you need to work with that data in Power BI.
The ADLSg2 connector exists for these scenarios, when your data is not stored in CDM folders. With this connector, users in Power BI Desktop can connect to ADLSg2 resources and work with the files they contain, similar to the existing HDFS and Folder connectors.
 Yes, this is another catch-up post that has been waiting to be finished. No, I do not have any reason to believe that 2020 will be any more forgiving than 2019 has been.
 I could have linked to the product documentation or the official product page, but I believe that Melissa‘s blog does the best job summing up ADLSg2 in a single post.
This week’s Power BIte is the first in a series of videos that present different ways to create new Power BI dataflows, and the results of each approach.
When creating a dataflow by defining new entities in Power Query Online, the final dataflow will have the following characteristics:
Data ingress path
Ingress via the mashup engine hosted in the Power BI service
Data stored in the CDM folder defined for the newly created dataflow
The dataflow is refreshed based on the schedule and policies defined in the workspace
Let’s look at the dataflow’s model.json metadata to see some of the details.
At the top of the file we can see the mashup definition, including the query names and load settings on lines 11 through 19 and the Power Query code for all of the entities on line 22.
At the bottom of the file we can see information about the refresh and storage. Line 26 identifies the entity as a LocalEntity, which means that the entity’s data is physically stored in the current CDM folder.
Line 30 shows that the entity is fully refreshed rather than incrementally refreshed, and line 31 shows the file name where the entity data is stored. Lines 97 through 99 identify the single partition where the data for the current version of the entity is stored, including the full URI for the data file. If this entity used incremental refresh, there would be multiple partitions to match the incremental refresh policy.
If this information doesn’t all make sense just yet, please hold on. We’ll have different values for the same attributes for other dataflow creation methods, and then we can compare and contrast them.
I guarantee it will make as much sense as anything on this blog.
 New videos every Monday morning!
 The same information is also included starting on line 103 for the Promotions entity, but is not pictured here.
This is still one of the most common dataflows questions: what’s the difference between Power BI dataflows and Power BI datasets?
For the last year I have resisted tackling this question head-on. This isn’t because it’s a bad or “dumb” question. Just the opposite – this is a very simple question, and the simpler a question is, the more complex and nuanced the answer is likely to be.
If you’re a regular reader of this blog, you probably already know the answer, because I’ve answered it already. Sort of. The existing answer is distributed across dozens of posts, and if you’ve read all of them and picked up the answer along the way. But I keep hearing this question, and I keep thinking that there must be a more direct answer I could share.
Here it is, in a single, simple table.
Power BI dataflows
Power BI datasets
Analysis Services tabular model
Common Data Model – model.json
Power Query Online
Power Query in Power BI Desktop
Acts as data source in multiple datasets
Shared datasets across workspaces
Scope of reuse
Entity level reuse
Dataset level reuse
Mashup with other data sources
Used for reporting
Reuse outside Power BI
Yes, through ADLSg2
Yes, through XMLA
Data access methods
Certification and promotion
What else am I missing?
Please let me know!
Seriously, you should let me know.
Update: I’ve added a few rows to the table after the post was originally published, to incorporate feedback from readers on differences I had missed. Thank you!
Each of the rows in this table could easily be an in-depth topic in and of itself, so if you’re looking at any of them and thinking “that’s not quite right” I might very well agree with you. There’s a lot of context and a lot of nuance here, and we’re trying to sum things up in a word or two… which is kind of the whole point.
Oh yeah, there’s a video too.
I can’t wait to hear what you think!
 A simple table with ten footnotes.
 The storage aspect of dataflows and datasets is one of the most significant differences between the two. Datasets use the Vertipaq column store to load data into an optimized and highly compressed in-memory representation that is optimized for analysis. Dataflows use text files in folders, which are optimized for interoperability.
 The Analysis Services Tabular engine uses the BI Semantic Model (BISM) to represent its metadata. This is a metadata model originally included in SQL Server 2012 Analysis Services, and used by the Tabular engine ever since.
 Saying “this is the primary purpose” of any complex tool is fraught with risk, because no matter what you say, there are other valid things that remain unsaid. With this said… the big gap that dataflows close is that of self-service data preparation for the purpose of data sharing and reuse. Power BI has always had self-service data preparation through Power Query, but before dataflows the data that was prepared was “locked” in a dataset, for analysis, and not for sharing or reuse.
 Once you have loaded data into dataflows, authorized users can reuse entities from multiple dataflows, and use them as the building blocks for new dataflows or new datasets. Once you have loaded data into a dataset (and published it to the Power BI service) you can enable users to connect to it.
 With dataflows, users can pick and choose the entities they want, but a dataset can only be reused as-is.
 Dataflow entities can be used as data sources in the same Power BI Desktop file as other data sources, and can serve as part of a mashup or composite model, but a dataset can only be reused as-is.
 Although you can obviously use dataflows for reporting, you do so by first importing the data from the dataflow into a dataset.
 It’s interesting to point out that using your own organizational ADLSg2 account does not require Power BI Premium, but using the XMLA endpoint to connect to Power BI datasets from non-Power BI clients does.
 You can only import data into your dataflow entities, but tables in your dataset can import data or use DirectQuery, and a dataset can use a combination of the two.
 You can only import data from a dataflow into a dataset. When connecting to a shared dataset you can only use Live Connections.
 I’ve been thinking of making videos to supplement this blog for almost as long as I’ve been hearing the question that inspired this post. Please take a moment to share your thoughts on the video. This is something of a “soft launch” and although I have plans for a few dozen more videos already, your feedback will be a main factor in how the video series evolves.