Quick Tip: Creating “data workspaces” for dataflows and shared datasets

Power BI is constantly evolving – there’s a new version of Power BI Desktop every month, and the Power BI service is updated every week. Many of the new capabilities in Power BI represent gradual refinements, but some are significant enough to make you rethink how you or your organization uses Power BI.

Power BI dataflows and the new shared and certified datasets[1] fall into the latter category. Both of these capabilities enable sharing data across workspace boundaries. When building a data model in Power BI Desktop you can connect to entities from dataflows in multiple workspaces, and publish the dataset you create into a different workspace altogether. With shared datasets you can create reports and dashboards in one workspace using a dataset in another[2].
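As a sketch of what this looks like in Power BI Desktop, a dataflow entity in another workspace can be reached through the dataflows connector in Power Query M. The workspace, dataflow, and entity names below are placeholders, and the exact navigation field names can vary by connector version, so treat this as illustrative rather than copy-paste ready:

```
let
    // Navigate the dataflows you have access to across workspaces
    Source = PowerBI.Dataflows(null),
    // Placeholder names – substitute your own workspace/dataflow/entity
    Workspace = Source{[workspaceName = "Sales Data"]}[Data],
    Dataflow = Workspace{[dataflowName = "Customer Dataflow"]}[Data],
    Customers = Dataflow{[entity = "Customers"]}[Data]
in
    Customers
```

The resulting dataset can then be published to a different workspace than the one containing the dataflow.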

The ability to have a single data resource – dataflow or dataset – shared across workspaces is a significant change in how the Power BI service has traditionally worked. Before these new capabilities, each workspace was largely self-contained. Dashboards could only get data from a dataset in the same workspace, and the tables in the dataset each contained the queries that extracted, transformed, and loaded their data. This workspace-centric design encouraged[3] approaches where assets were grouped into workspaces because of the platform, and not because it was the best way to meet the business requirements.

Now that we’re no longer bound by these constraints, it’s time to start thinking about having workspaces in Power BI whose function is to contain data artifacts (dataflows and/or datasets) that are used by visualization artifacts (dashboards and reports) in other workspaces. It’s time to start thinking about approaches that may look something like this:

[Diagram: data-centric workspaces]

Please keep in mind these two things when looking at the diagram:

  1. This is an arbitrary collection of boxes and arrows that illustrate a concept, and not a reference architecture.
  2. I do not have any formal art training.

Partitioning workspaces in this way encourages reuse and can reduce redundancy. It can also help enable greater separation of duties during development and maintenance of Power BI solutions. If you have one team that is responsible for making data available, and another team that is responsible for visualizing and presenting that data to solve business problems[4], this approach can give each team a natural space for its work. Work space. Workspace. Yeah.
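To make the partitioning concrete, here is a toy model of the pattern: data workspaces own the datasets and dataflows, while report workspaces only hold references to them. Every workspace and dataset name here is invented for illustration; this is a sketch of the concept, not anything the Power BI APIs expose in this shape.

```python
# Toy model of the "data workspace" pattern: data workspaces own the
# data artifacts, report workspaces only reference them by name.
# All workspace and dataset names are invented for illustration.

data_workspaces = {
    "Sales Data": {"Sales dataset", "Customer dataflow"},
    "Finance Data": {"GL dataset"},
}

report_workspaces = {
    "Sales Reports": ["Sales Data/Sales dataset"],
    "Exec Dashboards": ["Sales Data/Sales dataset", "Finance Data/GL dataset"],
}

def referenced_datasets(report_ws: str) -> set:
    """Resolve each 'workspace/dataset' reference used by a report workspace."""
    refs = set()
    for ref in report_workspaces[report_ws]:
        ws, dataset = ref.split("/", 1)
        # A reference is only valid if the data workspace actually owns it
        assert dataset in data_workspaces[ws], f"unknown dataset: {ref}"
        refs.add(dataset)
    return refs

# Two report workspaces can reuse the same shared dataset:
print(referenced_datasets("Sales Reports"))   # {'Sales dataset'}
print(referenced_datasets("Exec Dashboards"))
```

The point of the model is the direction of the arrows: report workspaces depend on data workspaces, never the other way around, which is what gives each team its own space to work in.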

Many of the large enterprise customers I work with are already evaluating or adopting this approach. As with any big change, it’s safer to approach this effort incrementally. The customers I’ve spoken to are planning to apply this pattern to new solutions before they think about retrofitting any existing solutions.

Once you’ve had a chance to see how these new capabilities can change how your teams work with Power BI, I’d love to hear what you think.

Edit 2019-06-26: Adam Saxton from Guy In A Cube has published a video on Shared and Certified datasets. If you want another perspective on how this works, you should watch it.


[1] Currently in preview: blog | docs.

[2] If you’re wondering how these capabilities for data reuse relate to each other, you may want to check out this older post, as the post you’re currently reading won’t go into this topic: Lego Bricks and the Spectrum of Data Enrichment and Reuse.

[3] And in some cases, required.

[4] If you don’t, you probably want to think about it. This isn’t the only pattern for successful adoption of Power BI at scale, but it is a very common and tested pattern.

9 thoughts on “Quick Tip: Creating “data workspaces” for dataflows and shared datasets”

  1. Jeff Weir

Maybe using cross-workspace dataflows also gives the ability to better target/simplify who can see what, compared with setting up separate workspaces to handle all the different permutations? Or to make global changes regarding what tier of management can see what data?

    So for example, without shared dataflows, you would possibly need to set up a whole bunch of different workspace permutations in order to handle all the different combinations that a certain type of user might need to have access to, requiring end users to visit workspace 1 for widely shared data, workspace 2 for more restricted, and workspace 3 for highly confidential. And they have to remember where to find what.

    But now, datasets can be ‘clustered’ or ‘triaged’ into different management tier ‘views’, meaning one type of manager can access their key reports from one workspace, and not have to visit multiple. Or in other words, there are now different baskets of data/security available.

For example, both Jeff and Matthew might be allowed to see datasets A & B via accessing workspace 1. Phil is allowed to see A, B and C over at workspace 2. Jeff’s entire management tranche gets demoted on account of catching a case of bad speling, and as a result the decision is made to remove access to dataset B for all Jeffs.

Previously, I take it that IT would have to set up a new workspace (workspace 3) for all the Jeffs, put a duplicate of report A into it, and then unassign all the Jeffs from workspace 1.

But now, each management tranche gets just the one workspace, and instead dataset B is simply moved from, say, the first box on the LHS in your diagram above to the bottom box on the LHS.


  2. Julio Granados

    Hi Matthew,
First of all, thanks for all your posts! I consider myself a great dataflow enthusiast! About the article: I think this is a great approach, but some doubts come to me, especially when I recommend that my clients start working with dataflows; more and more people want to venture in.

I know that many times you have supported the idea that dataflows do not replace an ETL tool such as Data Factory (so far 100% agreed with you), but after seeing all the talks at the PASS Summit it seems the direction they want to take is different, if they remove the restriction of batch-only sources. What is needed to generate that data transformation layer?

I miss in the Power BI service the ability to work with environments, to have versions of the “ETL” – a solution different from working with the same folders with prefixes such as “DEV_” and “PRO_”.

Another thing that worries me is that shared datasets are not yet integrated into the Power BI API, and when we work as service administrators it is difficult to automate the deployment of different solutions.

In summary, this has more and more potential, but I do not see organized growth or simply a clear goal with dataflows: ETL or no ETL? 😦

Forgive such a long comment, and really, thank you for the effort you put into this blog, which is one of my favorites!


    1. Thanks for the comment and for the feedback. As your comment implies, this is a more complicated problem domain than any one approach will effectively cover. I believe that dataflows can be part of a mature approach to managed self-service BI, but the details will depend on the needs and goals of each organization.


  3. Rick

    Hi Matthew. Thank you for the great article!
You said: “It can also help enable greater separation of duties during development and maintenance of Power BI solutions.” Could you please describe how this works in practice? It is not clear to me how a shared dataset solves this…


    1. Thanks for the question!

      The short answer is that you can have one person or team create and maintain the data model as a shared dataset, and then have other people and teams create and maintain the reports and visuals that use the model.

      Once the dataset is deployed to a workspace, authorized users can create reports in Power BI Desktop and deploy them to their own workspaces. Because the model is in one workspace and the reports are in others, each workspace can have different permissions.
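For teams that script this hand-off, the Power BI REST API includes a Rebind Report operation that points an existing report at a dataset, which can live in a different workspace. Here is a minimal sketch that only builds the request rather than sending it (a real call also needs an Azure AD access token, and every GUID below is a placeholder):

```python
# Sketch: build the Power BI REST API "Rebind Report" request that points a
# report at a shared dataset. All GUIDs are placeholders; a real call adds an
# Authorization header with an Azure AD token and POSTs the body as JSON.

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def build_rebind_request(workspace_id: str, report_id: str, dataset_id: str):
    """Return the (url, body) pair for the Rebind call."""
    url = f"{API_ROOT}/groups/{workspace_id}/reports/{report_id}/Rebind"
    body = {"datasetId": dataset_id}  # the dataset may live in another workspace
    return url, body

url, body = build_rebind_request(
    workspace_id="11111111-1111-1111-1111-111111111111",  # report workspace
    report_id="22222222-2222-2222-2222-222222222222",
    dataset_id="33333333-3333-3333-3333-333333333333",    # shared dataset
)
print(url)
print(body)
```

Because the report and the dataset are identified independently, the same scripted rebind works whether or not they share a workspace.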


  4. Jeff Weir

Matthew…is there another question from me in your spam folder again? I was musing about the fact that shared datasets may be able to reduce the number of workspaces that users would otherwise be forced to use, depending on what level of permissions different people across an organisation had. i.e. using workspaces to ‘bundle’ content to specific user groups whose access may span that of other user groups.


  5. Pingback: Dataflows in Power BI: Overview Part 9 – Lineage and impact analysis – BI Polar
