Power BIte: Creating dataflows by attaching external CDM folders

This week’s Power BIte is the fourth and final entry in a series of videos[1] that present different ways to create new Power BI dataflows, and the results of each approach.

When creating a dataflow by attaching an external CDM folder, the dataflow will have the following characteristics:

Data ingress path: Ingress via Azure Data Factory, Databricks, or whatever Azure service or application created the CDM folder.
Data location: Data is stored in ADLSg2, in the CDM folder created by the data ingress process.
Data refresh: Data is refreshed based on the execution schedule and properties of the data ingress process, not by any setting in Power BI.

The key to this scenario is the CDM folder storage format. CDM folders provide a simple, open way to persist data in a data lake. Because CDM folders are implemented as CSV data files plus JSON metadata, any application can read from and write to them. This includes multiple Azure services that provide libraries for reading and writing CDM folders, as well as third-party data tools like Informatica that have implemented their own CDM folder connectors.
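To make the "CSV data files and JSON metadata" point concrete, here is a minimal sketch of what a CDM folder looks like on disk and how any application can read one: header-less CSV partition files, plus a model.json document that names each entity and points at its partitions. The entity name, attributes, and paths below are invented for illustration, and real model.json files produced by Power BI or Databricks carry considerably more metadata (annotations, data types, partition patterns), so treat this as a shape sketch rather than the full spec.

```python
import csv
import json
import tempfile
from pathlib import Path

def write_cdm_folder(folder: Path) -> None:
    """Write a toy CDM folder: one header-less CSV partition plus model.json."""
    rows = [["1", "Contoso"], ["2", "Fabrikam"]]
    csv_path = folder / "Customers.csv"
    with csv_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)

    # Illustrative subset of the model.json metadata document.
    model = {
        "name": "OrdersModel",
        "version": "1.0",
        "entities": [
            {
                "$type": "LocalEntity",
                "name": "Customers",
                "attributes": [
                    {"name": "CustomerId", "dataType": "int64"},
                    {"name": "CustomerName", "dataType": "string"},
                ],
                "partitions": [
                    {"name": "Customers-part-0", "location": str(csv_path)}
                ],
            }
        ],
    }
    (folder / "model.json").write_text(json.dumps(model, indent=2))

def read_entity(folder: Path, entity_name: str):
    """Return (column_names, rows) for one entity described in model.json."""
    model = json.loads((folder / "model.json").read_text())
    entity = next(e for e in model["entities"] if e["name"] == entity_name)
    columns = [a["name"] for a in entity["attributes"]]
    rows = []
    for part in entity["partitions"]:
        # CDM partition files are plain CSV with no header row; the
        # column names come from the metadata, not the data files.
        with open(part["location"], newline="") as f:
            rows.extend(csv.reader(f))
    return columns, rows

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        write_cdm_folder(folder)
        print(read_entity(folder, "Customers"))
```

Because the metadata lives beside the data in plain JSON, a Databricks notebook, a Data Factory pipeline, and Power BI can all agree on the same schema without sharing any proprietary format.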

CDM folders enable scenarios like this one, which is implemented in a sample and tutorial published on GitHub by the Azure data team:

  • Create a Power BI dataflow by ingesting order data from the Wide World Importers sample database and save it as a CDM folder
  • Use an Azure Databricks notebook that prepares and cleanses the data in the CDM folder, and then writes the updated data to a new CDM folder in ADLS Gen2
  • Attach the CDM folder created by Databricks as an external dataflow in Power BI[2]
  • Use Azure Machine Learning to train and publish a model using data from the CDM folder
  • Use an Azure Data Factory pipeline to load data from the CDM folder into staging tables in Azure SQL Data Warehouse and then invoke stored procedures that transform the data into a dimensional model
  • Use Azure Data Factory to orchestrate the overall process and monitor execution

That’s it for this mini-series!

If any of this still doesn’t make sense, now is the time to ask questions.


[1] New videos every Monday morning!

[2] I added this bullet to the list because it fits in with the rest of the post – the other bullets are copied from the sample description.

2 thoughts on “Power BIte: Creating dataflows by attaching external CDM folders”

  1. Nick Anselmo

Hi Matthew! Thank you for this mini-series – great job! One point of clarification, if you don’t mind: I believe that in this last part, connecting an existing CDM folder requires a tenant administrator to designate a single ADLSg2 environment as the primary source before setup.

In other words, we are unable to take just ANY existing CDM folder and link it here as a dataflow – the aforementioned step must be completed first.

Am I correct here? Let me know. Thanks again.


    1. Thanks for the feedback, and thanks for the question as well.

      Your understanding is NOT correct – the requirements you’re describing are what you need to do to “go in the opposite direction” and save dataflow data into an ADLSg2 resource for consumption by Azure services. If you have an Azure service saving data into ADLSg2 for use by Power BI, the same requirements do not apply.

      Check here for details – there’s a section on requirements: https://docs.microsoft.com/en-us/power-bi/service-dataflows-add-cdm-folder

