Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.
One of the exciting new preview capabilities in the October 2018 release of Power BI Desktop is support for data profiling in the Power Query editor. Having per-column data profile information available in the query editor is very useful to help understand the data you’re working with…
…but what about understanding data in a broader context?
The Power Query function language “M” contains a Table.Profile function that accepts a table as input and returns a table containing the data profile for the input table.[1] You can use this in Power BI Desktop, but the value of this, at least now that there is a data profiling UI, is limited in scope.
This is where dataflows can help.
Remember the Excel-like, automatically-updating capabilities of linked and computed dataflow entities?[2] The most common use case for linked entities is for data transformation, but with Table.Profile you can also use linked entities to collect, consolidate, and maintain data profile information for the data stored in dataflow entities.
And it’s surprisingly simple.
Start with a workspace[3] and a dataflow, and add linked entities to it for each of the entities you want to profile.
For each linked entity in the dataflow, perform the following steps:
- Right-click on the entity in the query editor and select “Reference” from the context menu to create a computed entity
- Rename the new computed entity to include the word “profile”
- Right-click on the renamed entity and select “Advanced editor” from the context menu
- In the advanced editor, add a new query step that uses the Table.Profile function
Like this:
The edited query is very simple, and because all of the edits made to one query will apply without modification to each of the other queries, once the first one is done it’s just a copy and paste for each new profile entity. You can make this easier by putting the comma at the beginning of the profile line, rather than at the end of the source line, but it will work either way.
let Source = Site ,Profile = Table.Profile(Source) // note the placement of the comma in Profile
When you’re done, you’ll have a dataflow that contains data profiles for each linked entity, regardless of the workspace in which the linked entity originated.
Best of all, because the data profiles are stored in Power BI dataflow entities, which are in turn persisted in CDM Folders in Azure Data Lake Storage gen2, they can be consumed and processed in any tool for further analysis.
One of the biggest challenges for data governance is having current and accurate metadata available for enterprise data assets. Data profiles are only one part of this, but they’re a significant part. Because of the nature of linked entities in Power BI, we can now have up-to-date column-level profiles for our data, and can have it without a major engineering effort, and without any complex orchestration or management.
Life is good.
[1] If I understand correctly, the new feature in Power BI Desktop uses this function.
[2] If you don’t, you should probably read this post before you continue.
[3] Remember: to use linked entities this needs to be a new “v2” workspace, and it needs to be backed by Power BI Premium dedicated capacity.
Pingback: Dataflows in Power BI – BI Polar
Pingback: Power BI Dataflows – Data Profiling Without Premium – BI Polar
Pingback: Quick Tip: Restricting access to linked entities in Power BI dataflows – BI Polar
Pingback: Exam DA-100: Analyzing Data with Microsoft Power BI – Sergio and Sabrina Zenatti