One key aspect of Power BI dataflows is that they use Azure Data Lake Storage Gen2 for their data storage. As mentioned in part 1, this technology is not exposed to Power BI users. If you’re working in Power BI, a dataflow is just a collection of entities in a workspace, with data that can be reused. But if you’re trying to understand dataflows, it’s worth looking under the hood at some of the details.
Power BI stores dataflow data in a format known as CDM Folders. The “CDM” part stands for Common Data Model and the “Folder” part… is because they’re folders, with files in them.
Each CDM folder is a simple and self-describing structure. The folder contains one or more CSV files for each entity, plus a JSON metadata file. Having the data in a simple and standard format like CSV means that it is easy for any application or service to read the data. Having a JSON metadata file in the folder to describe its contents means that any consumer can read the JSON to easily understand the contents and their structure.
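To make the "self-describing" idea concrete, here is a minimal Python sketch of how a consumer might read an entity using nothing but the folder's contents. Everything in it is an assumption for illustration: the metadata keys (`entities`, `attributes`, `partitions`, `location`), the file names, the sample rows, and the idea that the CSV files carry no header row. In-memory strings stand in for real files so the example runs on its own.

```python
import csv
import io
import json

# Hypothetical metadata, loosely modeled on a CDM folder's JSON file.
# The key names here are assumptions, not a statement of the real schema.
METADATA_JSON = """
{
  "name": "SalesDataflow",
  "entities": [
    {
      "name": "Customer",
      "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "Name", "dataType": "string"}
      ],
      "partitions": [
        {"name": "Part001", "location": "Customer/Part001.csv"},
        {"name": "Part002", "location": "Customer/Part002.csv"}
      ]
    }
  ]
}
"""

# Stand-in for the CSV files in the folder (assumed to have no header row).
FILES = {
    "Customer/Part001.csv": "1,Contoso\n2,Fabrikam\n",
    "Customer/Part002.csv": "3,Adventure Works\n",
}


def read_entity(metadata, files, entity_name):
    """Return the entity's rows as dicts, using column names from the metadata."""
    entity = next(e for e in metadata["entities"] if e["name"] == entity_name)
    columns = [attribute["name"] for attribute in entity["attributes"]]
    rows = []
    # Walk every data file listed for the entity, in the order given.
    for partition in entity["partitions"]:
        reader = csv.reader(io.StringIO(files[partition["location"]]))
        for record in reader:
            rows.append(dict(zip(columns, record)))
    return rows


customers = read_entity(json.loads(METADATA_JSON), FILES, "Customer")
for customer in customers:
    print(customer)
```

The point of the sketch is the division of labor: the CSV files hold only values, and the JSON file supplies the column names and the list of files to read. Values come back as strings here; a real consumer would also use the declared data types to convert them.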
The JSON metadata file contains:
- The names and locations of all files in the folder.
- Entity metadata, including names, descriptions, attributes, data types, last modified dates, and so on.
- Lineage information for the entities – specifically, the Power Query “M” query that defines the entity.
- Information about how each entity conforms (or does not conform) to Common Data Model standard entity schemas.
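As a rough illustration of the list above, the following Python sketch pulls a few of those pieces out of a metadata document. The sample JSON is invented, and the key names (`modifiedTime`, `description`, `dataType`, `pbi:mashup`, `document`) are assumptions about how such a file might be laid out, so compare them with a real export before relying on them. The M query is a placeholder, and conformance to standard entity schemas is not covered.

```python
import json

# Invented sample; key names are assumptions and the M query is a placeholder.
SAMPLE_JSON = """
{
  "name": "SalesDataflow",
  "modifiedTime": "2018-11-01T12:00:00+00:00",
  "pbi:mashup": {"document": "section Section1; shared Customer = 1;"},
  "entities": [
    {
      "name": "Customer",
      "description": "One row per customer",
      "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "Name", "dataType": "string"}
      ],
      "partitions": [
        {"name": "Part001", "location": "Customer/Part001.csv"}
      ]
    }
  ]
}
"""


def summarize(metadata):
    """Collect per-entity descriptions, attribute types, and file locations."""
    summary = {}
    for entity in metadata.get("entities", []):
        summary[entity["name"]] = {
            "description": entity.get("description", ""),
            "attributes": {
                a["name"]: a["dataType"] for a in entity.get("attributes", [])
            },
            "files": [p["location"] for p in entity.get("partitions", [])],
        }
    return summary


def has_query(metadata):
    """Report whether the (assumed) lineage section carries a query document."""
    return "document" in metadata.get("pbi:mashup", {})


metadata = json.loads(SAMPLE_JSON)
summary = summarize(metadata)
print(summary["Customer"]["attributes"])
print("Query included:", has_query(metadata))
```

A summary like this is a quick way to see which entities a folder holds and where their data files live, without opening any of the data files.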
If you’re interested in seeing this for yourself, the JSON metadata for a dataflow can be exported from the Power BI portal. Just select “export JSON” from the menu in the dataflows list.
You don’t need to know any of this to use dataflows in Power BI. But if you’re interested in getting the most from dataflows in your end-to-end data architecture, there’s no time like the present to see how things work.
 The Common Data Model is a bigger topic than we’ll cover here, but if you’re interested in an introduction, you can check out this excellent session from Microsoft Ignite 2018.
 For simple scenarios, each entity will be backed by a single text file. For more complex scenarios involving partitioned data or incremental refresh, there will be one file per partition.
 Please note that in the default configuration, Power BI manages the underlying storage, and it is not available to other applications or services, so this won’t do you all that much good to start off. Power BI will, however, provide integration with Azure that will make this important and valuable.