Important: This post was written and published in 2019, and the content below may no longer represent the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.
When you build a Power BI dataflow, lineage is built in. It’s not an added feature – it’s a fundamental aspect of how dataflows work. Instead of having an ETL process that performs the data movement and compute in one place and the data storage in another[1], dataflow entities are defined by a Power Query “M” query. The ETL logic in the query and the data storage are defined as a single unit[2]. The result is that not only does a dataflow contain the data, it contains the full lineage about where the data came from and how it was transformed.
Update: The user experience shown in this post is from July 2019, and is no longer current. For a more current view of lineage in Power BI workspaces, please see this post and video from October 2019, and the Power BI documentation.
This lineage information is what makes possible the diagram view that’s available today. As shown above, the diagram view provides a workspace-level lineage view of all dataflows in the workspace, their relationships, and the data sources from which they extract data. – including data sources that are dataflows in other workspaces.
It’s worth emphasizing that this diagram view isn’t an editor per se – it’s not where you define lineage relationships, it’s where you can view, explore, and understand the lineage relationships that are automatically created when you build your dataflows. At the same time, the diagram view does let you edit and manage the dataflows in your workspace in the same way as the list view.
The dataflows API lets developers build similar experiences and automated processes. Specifically, the Get Dataflow API returns the model.json file that contains all the dataflow metadata, including the M scripts for the queries that define the dataflow entities.
Of course, dataflows are just one part of a complete lineage and impact analysis story in Power BI. Ideally a Power BI workspace administrator would be able to see all data sources, dataflows, datasets, reports, and dashboards in a single view, and to easily navigate and manage the relationships and dependencies between them. Something like this:
At the Microsoft Business Applications Summit in June, Microsoft shared plans for lineage across all artifacts in a workspace. This represents a continuation of the current functionality available in dataflows and of the recently-announced shared and certified datasets.
With this upcoming functionality, users working with reports and dashboard can easily see where the data comes from, and how and where it has been transformed. I’m hesitant to call this the “holy grail” of business intelligence, but I’ve heard it described this was enough times that I wouldn’t argue too loudly if someone did. If I a dollar for every meeting I’ve been in where the agenda was derailed by and argument about whether the data was right[3]… In any event, this is a very common problem, with few good solutions.
How are you using the lineage capabilities in dataflows today? Do you have processes that rely on the dataflows lineage UI or API? I’d love to hear about what you’re doing now and what you’re planning – let me know.
[1] Such as having a batch job, stored procedure, or SSIS Data Flow that loads into a SQL Server table or HDFS folder.
[2] If you’re not familiar with this aspect of Power BI dataflows, I’d recommend reading through these posts to catch up: Dataflows in Power BI: Overview Part 6 – Linked and Computed Entities, Lego Bricks and the Spectrum of Data Enrichment and Reuse, Dataflows in Power BI: Resources, and maybe Dataflows in Power BI.
[3] Then I could afford the upgraded hosting plans that WordPress keeps pushing at me…
Pingback: Dataflows in Power BI – BI Polar
The data lineage developments are exciting. A large portion of our datasets come from views in SQL/oracle or SAP queries, both of which will have transformations prior to the dataflow lineage path, I believe. I’m really hoping the development plans will in the future provide:
-visibility to a builder consuming dataflows to see the lineage from the dataflow connector in power bi desktop. that would help the builder know the lineage path at the time needed rather than have them open the workspace to review
-visibility to the models/builds that take advantage of this dataflow. To me, that is currently where we are somewhat lacking in visibility. “If I change this dataflow, which models are about to fail refresh?”
LikeLike
That second hope is one of the capabilities shown in the MBAS video this post references. I can’t wait until there’s an internal environment available for me to start kicking the tires…
LikeLike
Hi Mathew, great post. We need to be able to track lineage of dataflows the other way i.e. which datasets are they being used in, from which workspace. Similar into how we can track the lineage of datasets across workspaces/reports/dashboards. This helps to manage the sharing of data across different teams/departments. The lineage features coming look good.
LikeLike
Thanks Lee – I definitely agree that there’s a need for comprehensive end-to-end lineage. We showed off some demos of workspace-level lineage at MBAS in June, and I’ve been playing with some of the preview capabilities since then. I think you’ll be pretty happy once they’re more broadly available.
Although I’ll have to deprecate this post…;-)
LikeLike
Pingback: Governing Power BI just got a little easier – BI Polar
Pingback: Power BI workspace lineage – BI Polar
It seems that this option is no longer available with the new interface of PowerBi service. The lineage between dataflow and dataset and not shown any more…
Is there a simple way to view the connections of dataset to dataflow?
LikeLike
Hey Quentin – I’ve updated this old post to include a link to the newer post that shows where lineage capabilities have moved.
LikeLike
Thanks Metthew,
Unfortunatly i’m still incapable of viewing a lineage between a dataflow and the dataset created after importation in a report…
We’ll look deeper into it and try to see where i get things wrong…
LikeLike
Thanks for the helpful article. I also recommend reading this article for those who want to deepen their understanding of Power BI Dataflows https://skyvia.com/blog/etl-in-power-bi#skyvia. It explains how easy to do ETL with Power BI Dataflows.
LikeLike