Important: This post was written and published in 2019, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.
This week’s Power BIte is the fourth and final entry in a series of videos that present different ways to create new Power BI dataflows, and the results of each approach.
When creating a dataflow by attaching an external CDM folder, the dataflow will have the following characteristics:
| Characteristic | Behavior |
| --- | --- |
| Data ingress path | Ingress via Azure Data Factory, Databricks, or whatever Azure service or app has created the CDM folder. |
| Data location | Data stored in ADLSg2 in the CDM folder created by the data ingress process. |
| Data refresh | The data is refreshed based on the execution schedule and properties of the data ingress process, not by any setting in Power BI. |
The key to this scenario is the CDM folder storage format. CDM folders provide a simple and open way to persist data in a data lake. Because CDM folders are implemented using CSV data files and JSON metadata, any application can read from and write to them. This includes multiple Azure services that have libraries for reading and writing CDM folders, as well as third-party data tools like Informatica that have implemented their own CDM folder connectors.
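To make the "CSV plus JSON metadata" point concrete, here is a minimal sketch of reading a CDM folder's `model.json` to discover its entities and their data partitions. The folder layout and field names are illustrative (based on the general shape of the `model.json` metadata file, not on any specific tutorial output), and a real `model.json` carries considerably more metadata per entity.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def list_cdm_entities(folder):
    """Map each entity in a CDM folder's model.json to the relative
    locations of its CSV data partitions."""
    model = json.loads((Path(folder) / "model.json").read_text(encoding="utf-8"))
    return {
        entity["name"]: [p["location"] for p in entity.get("partitions", [])]
        for entity in model.get("entities", [])
    }

# Demo against a hand-built model.json; entity and file names are
# hypothetical, chosen only to show the structure being parsed.
with TemporaryDirectory() as tmp:
    model = {
        "name": "ExampleModel",
        "entities": [
            {
                "$type": "LocalEntity",
                "name": "Orders",
                "attributes": [{"name": "OrderID", "dataType": "int64"}],
                "partitions": [{"name": "Orders", "location": "Orders/part-000.csv"}],
            }
        ],
    }
    (Path(tmp) / "model.json").write_text(json.dumps(model), encoding="utf-8")
    print(list_cdm_entities(tmp))  # {'Orders': ['Orders/part-000.csv']}
```

Because the metadata is plain JSON, any tool with a JSON parser can do this much; the Azure libraries and connectors mentioned above add schema handling, partition management, and lake access on top.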
CDM folders enable scenarios like this one, which is implemented in a sample and tutorial published on GitHub by the Azure data team:
- Create a Power BI dataflow by ingesting order data from the Wide World Importers sample database and save it as a CDM folder
- Use an Azure Databricks notebook that prepares and cleanses the data in the CDM folder, and then writes the updated data to a new CDM folder in ADLS Gen2
- Attach the CDM folder created by Databricks as an external dataflow in Power BI
- Use Azure Machine Learning to train and publish a model using data from the CDM folder
- Use an Azure Data Factory pipeline to load data from the CDM folder into staging tables in Azure SQL Data Warehouse and then invoke stored procedures that transform the data into a dimensional model
- Use Azure Data Factory to orchestrate the overall process and monitor execution
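The write side of the steps above (a process producing a CDM folder for Power BI to attach) can be sketched in the same spirit: emit headerless CSV partitions and a `model.json` describing them. This is a simplified illustration, not the Databricks or Data Factory implementation from the tutorial; the `$type` and `dataType` fields follow the general `model.json` shape, and all names here are invented for the example.

```python
import csv
import json
from pathlib import Path

def write_cdm_folder(folder, entity_name, columns, rows):
    """Write one entity's data as a CSV partition plus a minimal
    model.json, producing a folder shaped like a CDM folder."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    data_file = f"{entity_name}.csv"
    with open(folder / data_file, "w", newline="", encoding="utf-8") as f:
        # CDM partition files are conventionally headerless; the schema
        # lives in model.json, not in the CSV itself.
        csv.writer(f).writerows(rows)
    model = {
        "name": "ExampleModel",  # illustrative model name
        "version": "1.0",
        "entities": [{
            "$type": "LocalEntity",
            "name": entity_name,
            "attributes": [{"name": c, "dataType": "string"} for c in columns],
            "partitions": [{"name": entity_name, "location": data_file}],
        }],
    }
    (folder / "model.json").write_text(json.dumps(model, indent=2), encoding="utf-8")

write_cdm_folder("orders_cdm", "Orders", ["OrderID", "Customer"],
                 [["1", "Contoso"], ["2", "Fabrikam"]])
```

A folder written this way is what the "attach an external CDM folder" step consumes: Power BI reads the `model.json` for schema and the CSV partitions for data, so the producing service owns the refresh schedule, exactly as described in the table at the top of this post.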
That’s it for this mini-series!
If this information still doesn’t make sense, now is the time to ask questions.

New videos every Monday morning!
Note: I added one bullet to the list above because it fits in with the rest of the post; the other bullets are copied from the sample description.
3 thoughts on “Power BIte: Creating dataflows by attaching external CDM folders”
Hi Matthew! Thank you for this mini-series, and great job! One point of clarification, if you don’t mind: I believe that in this last part, connecting an existing CDM folder requires a tenant administrator to designate a single ADLSg2 environment as the primary source prior to setup.
Point of clarification: we are unable to take ANY existing CDM folder and link it here as a dataflow; the aforementioned step must be completed first.
Am I correct here? Let me know. Thanks again.
Thanks for the feedback, and thanks for the question as well.
Your understanding is NOT correct – the requirements you’re describing are what you need to do to “go in the opposite direction” and save dataflow data into an ADLSg2 resource for consumption by Azure services. If you have an Azure service saving data into ADLSg2 for use by Power BI, the same requirements do not apply.
Check here for details – there’s a section on requirements: https://docs.microsoft.com/en-us/power-bi/service-dataflows-add-cdm-folder