You probably already know that Power BI dataflows store their data in CDM folders. But what does this actually mean?

This is a quick post to share information that I hope will answer some of the most common questions that I hear from time to time, and which I discuss when I present on Power BI dataflows integration with Azure. I don’t believe any of the information in this post is new or unique[1], but I do believe it is delivered in a more targeted manner that might help.
Point #1: CDM is a metadata system
The Common Data Model is a metadata system that simplifies data management and application development by unifying data into a known form and applying structural and semantic consistency across multiple apps and deployments. If you’re coming from a SQL Server background, it may help to think about CDM as the “system tables” for data that’s stored in multiple locations and formats. This analogy doesn’t hold up to particularly close inspection, but it’s a decent place to start.
Point #2: CDM includes standard entity schemas
In addition to being a metadata system, the Common Data Model includes a set of standardized, extensible data schemas that Microsoft and its partners have published. This collection of predefined schemas includes entities, attributes, semantic metadata, and relationships. The schemas represent commonly used concepts and activities, such as Account and Campaign, to simplify the creation, aggregation, and analysis of data.
Point #3: CDM folders are data storage that use CDM metadata
A CDM folder is a folder in a data lake that conforms to specific, well-defined, and standardized metadata structures and self-describing data. These folders facilitate metadata discovery and interoperability between data producers and data consumers.
CDM folders store metadata in a model.json file; this is what makes them self-describing. This metadata conforms to the CDM metadata format, and can be read by any client application or code that knows how to work with CDM.
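To make "self-describing" concrete, here is a minimal sketch of how client code might read a model.json and list the entities and attributes it declares. It assumes the commonly documented model.json layout (a top-level "entities" array whose items carry "name" and "attributes", with each attribute having "name" and "dataType"). The file contents, entity names, and storage URL are hypothetical and trimmed for illustration; a real dataflow's model.json contains more properties.

```python
import json

# Hypothetical, trimmed model.json text. Only the overall shape is meant to
# resemble a CDM folder's metadata file; names and the URL are invented.
MODEL_JSON = """
{
  "name": "SalesDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customer",
      "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "CustomerName", "dataType": "string"}
      ],
      "partitions": [
        {"name": "Part001",
         "location": "https://example.dfs.core.windows.net/powerbi/SalesDataflow/Customer/Part001.csv"}
      ]
    }
  ]
}
"""


def describe_model(model_text):
    """Map each entity name to a list of (attribute name, data type) pairs."""
    model = json.loads(model_text)
    return {
        entity["name"]: [(a["name"], a["dataType"]) for a in entity.get("attributes", [])]
        for entity in model.get("entities", [])
    }


def partition_locations(model_text):
    """Map each entity name to the data file locations listed in its partitions."""
    model = json.loads(model_text)
    return {
        entity["name"]: [p["location"] for p in entity.get("partitions", [])]
        for entity in model.get("entities", [])
    }


if __name__ == "__main__":
    for name, attributes in describe_model(MODEL_JSON).items():
        print(name, attributes)
```

The point is that a consumer needs nothing but the folder itself: the schema and the location of the data files both come from the metadata, not from the producing application.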
Point #4: You don’t need to use any standard entities
The most common misconception I hear about CDM and CDM folders is that you only use them when you’re storing “standard data.” This is not correct. The data in a CDM entity may map to a standard entity schema, but for 99% of the entities I have built or used, this is not the case. There is nothing in CDM or CDM folders that requires you to use a standard schema.
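As an illustration of that point, the sketch below builds model.json-style metadata for an entirely custom entity whose attributes are defined by the author rather than taken from a published schema. The helper name, entity, and columns are hypothetical, and the output is a trimmed sketch of the metadata shape, not a complete file that Power BI would necessarily accept.

```python
import json


def custom_entity(name, columns):
    """Build a hypothetical LocalEntity entry whose attributes come only from the caller."""
    return {
        "$type": "LocalEntity",
        "name": name,
        "attributes": [{"name": col, "dataType": dtype} for col, dtype in columns],
    }


# A custom entity that does not map to any standard CDM entity schema.
model = {
    "name": "WarehouseTelemetry",
    "version": "1.0",
    "entities": [
        custom_entity(
            "SensorReading",
            [("SensorId", "string"), ("ReadingTime", "dateTime"), ("Temperature", "double")],
        ),
    ],
}

print(json.dumps(model, indent=2))
```

Nothing in this metadata references a standard entity; the attribute list is simply whatever the data actually contains.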
I hope this helps – please let me know if you have questions!
[1] Check out the documentation for CDM and CDM folders here and here, and here for more detail. You’ll probably notice that some chunks of text in this post were simply copied from that documentation.
Hello Matthew.
Thank you for the great article.
I have one question that seems important to me.
You mention that the “collection of predefined schemas includes entities, attributes, semantic metadata, and relationships”. Everything is clear to me except relationships. What role do relationships play in CDM? They are not visible the way entities and attributes are… Are those relationships standard and static? As I understand it, they connect all entities in the internal CDM data warehouse? If so, what happens if some of my custom data source tables don’t fit any of the standard CDM entities? In that case, as I understand it, such a table is separated from the CDM data warehouse, since it doesn’t have any relationships?
Thank you in advance!
Short answer: I think the key here is “standardized, extensible data schemas.” The relationships are part of the “reference metadata” and can be used or disregarded as needed. Since this is metadata only, each data store (CDS, ADLSg2, etc.) will have its own way of implementing the metadata, including the relationships.
Longer answer: I think that Point #4 (“You don’t need to use any standard entities”) is the point you want to focus on, not Point #2. The use of any of these reference entities is 100% optional.
Then what are the benefits of using standard entities at all? Are there any cases where those standard entities are useful?
I know that “These folders facilitate metadata discovery and interoperability between data producers and data consumers,” but this is only a mechanism. What does it deliver for developers and users?
For that I would look in the docs, not in this specific post.
https://docs.microsoft.com/en-us/common-data-model/#why-use-the-common-data-model
Thank you, Matthew!