4 thoughts on “Power BI workspace lineage”

  1. How would you decide how large or small to make each artifact in the lineage, in terms of the number of transformations taking place inside it? In my case the artifacts would only be shared with two or three other users.

    For instance, I could go all out and turn every step that would previously have taken place in the Query Editor into a new link in the data lineage chain, but that would probably be overkill.


    1. Thanks for the question!

      I agree that “one step per dataflow” would be overkill, but beyond that the answer is largely “it depends.”

      The approach I generally take is to break the end-to-end data preparation down into blocks that look like this:

      1. Staging – getting the source data into the system (in this case a dataflow, but it could be a data mart, data warehouse, data lake, etc.) with zero or minimal transformations
      2. Cleansing – correcting known data quality and format problems in the staged data
      3. Transformation 1 – getting the cleansed data into the shape required for intended downstream purposes
      4. Enrichment – adding data from other sources, which have ideally already gone through steps 1 through 3
      5. Transformation 2 – getting the cleansed and enriched data into the shape required for analysis, typically as dimensions and facts
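      To make the five blocks above concrete, here is a minimal Python sketch in which each function stands in for one dataflow (or entity) in the lineage chain, and the output of each stage feeds the next. The sample rows, column names, the treatment of unparseable amounts, and the STORE_REGIONS lookup are all hypothetical; in Power BI each stage would be Power Query logic in its own dataflow, not Python.

```python
# Hypothetical sketch: one function per stage of the lineage chain.
# All data, column names, and lookups are invented for illustration.

# 1. Staging: land the source data with zero or minimal transformation.
def stage():
    return [
        {"order_id": "1", "customer": " alice ", "amount": "10.50", "store": "S1"},
        {"order_id": "2", "customer": "BOB", "amount": "n/a", "store": "S2"},
        {"order_id": "3", "customer": "Alice", "amount": "4.50", "store": "S1"},
    ]

# 2. Cleansing: correct known quality and format problems in the staged rows.
def cleanse(rows):
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            amount = 0.0  # assumption: unparseable amounts are treated as zero
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),  # fix spacing and casing
            "amount": amount,
            "store": row["store"],
        })
    return cleaned

# 3. Transformation 1: keep only the columns downstream stages need.
def transform_1(rows):
    keep = ("order_id", "customer", "store", "amount")
    return [{k: r[k] for k in keep} for r in rows]

# Hypothetical reference data from another source, assumed to have
# already gone through its own staging/cleansing/transformation.
STORE_REGIONS = {"S1": "North", "S2": "South"}

# 4. Enrichment: add data from the other source.
def enrich(rows, store_regions):
    return [{**r, "region": store_regions.get(r["store"], "Unknown")} for r in rows]

# 5. Transformation 2: shape the data into a dimension and a fact table.
def transform_2(rows):
    # Customer dimension with a simple surrogate key.
    names = sorted({r["customer"] for r in rows})
    dim_customer = [{"customer_key": i + 1, "customer": n} for i, n in enumerate(names)]
    key_by_name = {d["customer"]: d["customer_key"] for d in dim_customer}
    # Sales fact at order grain, referencing the dimension by key.
    fact_sales = [
        {
            "order_id": r["order_id"],
            "customer_key": key_by_name[r["customer"]],
            "region": r["region"],
            "amount": r["amount"],
        }
        for r in rows
    ]
    return dim_customer, fact_sales

if __name__ == "__main__":
    dim, fact = transform_2(enrich(transform_1(cleanse(stage())), STORE_REGIONS))
    print(dim)
    print(fact)
```

      Keeping each stage as its own small unit mirrors the "moderate number of easily maintainable entities" idea: a change to cleansing rules touches one function (or dataflow) without disturbing the dimensional shaping at the end.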

      These guidelines tend to create a moderate number of easily maintainable entities.

      I feel like I’m dating myself with this link, but I definitely recommend looking at the Kimball Group’s techniques for data warehousing and BI: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/. Ralph Kimball and his amazing team know more about this stuff than I will ever forget (or something like that) and there’s a huge volume of guidance available.

      This feels like it has become a blog post…


  2. What an amazing feature! Data lineage is so hot right now.

    I just tested it and noted a few odd things:

    (a) the view still renders unused data sources, e.g., if a dataflow against data source A is completely switched to data source B, then A will still appear. Also, we’re unable to remove credentials from the “Data source credentials” screen – I assume these two issues are related; and

    (b) the view does not render artifacts outside the current workspace.


  3. Pingback: Quick Tip: Factoring your dataflow entities – BI Polar
