When you build a Power BI dataflow, lineage is built in. It’s not an added feature – it’s a fundamental aspect of how dataflows work. Instead of having an ETL process that performs the data movement and compute in one place and the data storage in another, dataflow entities are defined by a Power Query “M” query. The ETL logic in the query and the data storage are defined as a single unit. The result is that not only does a dataflow contain the data, it contains the full lineage about where the data came from and how it was transformed.
This lineage information is what makes possible the diagram view that’s available today. As shown above, the diagram view provides a workspace-level lineage view of all dataflows in the workspace, their relationships, and the data sources from which they extract data, including data sources that are dataflows in other workspaces.
It’s worth emphasizing that this diagram view isn’t an editor per se – it’s not where you define lineage relationships, it’s where you can view, explore, and understand the lineage relationships that are automatically created when you build your dataflows. At the same time, the diagram view does let you edit and manage the dataflows in your workspace in the same way as the list view.
The dataflows API lets developers build similar experiences and automated processes. Specifically, the Get Dataflow API returns the model.json file that contains all the dataflow metadata, including the M scripts for the queries that define the dataflow entities.
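To make that a little more concrete, here is a minimal Python sketch of the kind of thing you could do once you have a model.json in hand. It assumes you’ve already downloaded the file through the Get Dataflow API, and it assumes the M scripts live in a single section document under a `pbi:mashup` / `document` property. That layout, the sample content, and the helper names `entity_queries` and `entity_dependencies` are all illustrative assumptions on my part, so check them against a real file before relying on them. The parsing is also deliberately naive: it only handles simple query names and would be confused by a semicolon followed by the word `shared` inside a string literal.

```python
import re

# Hypothetical, heavily trimmed model.json content. The "pbi:mashup" /
# "document" layout is an assumption; verify it against a real file.
SAMPLE_MODEL = {
    "name": "Sales",
    "entities": [{"name": "Customers"}, {"name": "Orders"}],
    "pbi:mashup": {
        "document": (
            "section Section1;\n"
            'shared Customers = let Source = Sql.Database("server", "db") in Source;\n'
            "shared Orders = let Source = Customers in Source;"
        )
    },
}

# Naive pattern: "shared <Name> = <body>;" up to the next "shared" or the end.
SHARED_QUERY = re.compile(
    r"shared\s+(\w+)\s*=\s*(.*?);\s*(?=shared\s|\Z)", re.DOTALL
)


def entity_queries(model):
    """Return {query name: M expression text} from the section document."""
    document = model.get("pbi:mashup", {}).get("document", "")
    return {name: body.strip() for name, body in SHARED_QUERY.findall(document)}


def entity_dependencies(model):
    """Return {query name: sorted names of other queries its M text mentions}."""
    queries = entity_queries(model)
    deps = {}
    for name, body in queries.items():
        deps[name] = sorted(
            other
            for other in queries
            if other != name and re.search(rf"\b{re.escape(other)}\b", body)
        )
    return deps


if __name__ == "__main__":
    for entity, upstream in entity_dependencies(SAMPLE_MODEL).items():
        print(entity, "<-", upstream)
```

A text match like this is only a rough stand-in for real lineage, but it shows why having the M scripts in the metadata matters: the relationships between entities can be recovered from the dataflow definition itself.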
Of course, dataflows are just one part of a complete lineage and impact analysis story in Power BI. Ideally a Power BI workspace administrator would be able to see all data sources, dataflows, datasets, reports, and dashboards in a single view, and to easily navigate and manage the relationships and dependencies between them. Something like this:
At the Microsoft Business Applications Summit in June, Microsoft shared plans for lineage across all artifacts in a workspace. This represents a continuation of the current functionality available in dataflows and of the recently announced shared and certified datasets.
With this upcoming functionality, users working with reports and dashboards can easily see where the data comes from, and how and where it has been transformed. I’m hesitant to call this the “holy grail” of business intelligence, but I’ve heard it described this way enough times that I wouldn’t argue too loudly if someone did. If I had a dollar for every meeting I’ve been in where the agenda was derailed by an argument about whether the data was right… In any event, this is a very common problem, with few good solutions.
How are you using the lineage capabilities in dataflows today? Do you have processes that rely on the dataflows lineage UI or API? I’d love to hear about what you’re doing now and what you’re planning – let me know.
 Such as having a batch job, stored procedure, or SSIS Data Flow that loads into a SQL Server table or HDFS folder.
 If you’re not familiar with this aspect of Power BI dataflows, I’d recommend reading through these posts to catch up: Dataflows in Power BI: Overview Part 6 – Linked and Computed Entities, Lego Bricks and the Spectrum of Data Enrichment and Reuse, Dataflows in Power BI: Resources, and maybe Dataflows in Power BI.
 Then I could afford the upgraded hosting plans that WordPress keeps pushing at me…