Power BI and ADLSg2 – but not dataflows

Back in July[1] the Power BI team announced the availability of a new connector for Azure Data Lake Storage Gen2.

It's a data lake. Get it?
When Matthew closes his eyes and pictures a data lake, this is what he sees.

In recent weeks I’ve started hearing questions like these:

Question: Is this ADLSg2 connector how you get to the data behind dataflows?

Answer: No. Dataflows are how you get to the data behind dataflows.

Question: Is this how I can access dataflows if I don’t use Power BI Premium?

Answer: No. Dataflows are not a Premium-only feature.

Question: Can I use the ADLSg2 connector to work with CDM folder data?

Answer: Yes, but why would you?

If your data is already in CDM folders, using the ADLSg2 connector simply adds effort to consuming it in Power BI. You’ll be working with raw, untyped text files instead of working with strongly typed entities.
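To make the "raw, untyped text files" point concrete, here is a minimal Python sketch of the work you take on when you bypass the dataflow layer. It is not Power Query M, and it is not how Power BI itself reads CDM folders. It parses the folder's model.json, finds an entity, and applies that entity's attribute names and data types to a CSV partition by hand. Everything in it is hypothetical and trimmed to the few fields used: the `MODEL_JSON` content, the `Orders` entity, the partition text, the `CONVERTERS` mapping and the `read_entity` function. The sketch also assumes the partition file has no header row. A real model.json carries more metadata and supports more data types.

```python
import csv
import io
import json

# Hypothetical, heavily trimmed model.json describing one entity in a CDM folder.
MODEL_JSON = """
{
  "name": "SalesFolder",
  "entities": [
    {
      "name": "Orders",
      "attributes": [
        {"name": "OrderId", "dataType": "int64"},
        {"name": "Customer", "dataType": "string"},
        {"name": "Amount", "dataType": "double"}
      ],
      "partitions": [{"name": "Orders-part1", "location": "Orders/part1.csv"}]
    }
  ]
}
"""

# Hypothetical partition contents: a headerless CSV where every value is just text.
RAW_PARTITION = "1,Contoso,19.99\n2,Fabrikam,5.00\n"

# Assumed mapping from a small subset of model.json data type names to converters.
CONVERTERS = {
    "int64": int,
    "double": float,
    "string": str,
}


def read_entity(model_text, entity_name, partition_text):
    """Apply column names and types from model.json to a headerless CSV partition."""
    model = json.loads(model_text)
    # Locate the entity definition by name; raises StopIteration if it is missing.
    entity = next(e for e in model["entities"] if e["name"] == entity_name)
    attributes = entity["attributes"]
    rows = []
    for raw_row in csv.reader(io.StringIO(partition_text)):
        # Pair each positional CSV value with its attribute, then convert its type.
        rows.append({
            attr["name"]: CONVERTERS[attr["dataType"]](value)
            for attr, value in zip(attributes, raw_row)
        })
    return rows


orders = read_entity(MODEL_JSON, "Orders", RAW_PARTITION)
print(orders[0])  # {'OrderId': 1, 'Customer': 'Contoso', 'Amount': 19.99}
```

Attaching the CDM folder as a dataflow takes this mapping out of every report author's hands, because the entities arrive in Power BI with their names and types already applied.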

If your ADLSg2 data is already in CDM folders, strongly consider attaching the CDM folder as a dataflow. This means less up-front work for you, and less ongoing work for the users who need to get insights from the data.

Question: Why do we need an ADLSg2 connector if we have dataflows?

Answer: Now that is a good question!

Power BI dataflows store their data in CDM folder format, and they can be configured to store those CDM folders in your organization’s ADLSg2 data lake. In addition to this, you can attach a CDM folder in ADLSg2 as an external dataflow, making its data available to Power BI users even though the data ingress is taking place through another tool like Azure Data Factory.

But ADLSg2 is much, much more[2] than a repository for dataflows or CDM folders. ADLSg2 supports all sorts of file and blob data, not just CDM folders. And sometimes you need to work with that data in Power BI.

The ADLSg2 connector exists for these scenarios, when your data is not stored in CDM folders. With this connector, users in Power BI Desktop can connect to ADLSg2 resources and work with the files they contain, similar to the existing HDFS and Folder connectors.


[1] Yes, this is another catch-up post that has been waiting to be finished. No, I do not have any reason to believe that 2020 will be any more forgiving than 2019 has been.

[2] I could have linked to the product documentation or the official product page, but I believe that Melissa’s blog does the best job of summing up ADLSg2 in a single post.

2 thoughts on “Power BI and ADLSg2 – but not dataflows”

  1. Jordan

    Thanks Matthew – “Dataflows are premium only” is an urban legend I encounter weekly in my org.

    Somewhat related, I really like this idea: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/36495421-support-multiple-azure-data-lake-store-gen2-storag

    One-ADLSg2-Fits-All doesn’t work in large organizations where allocation of cloud costs is paramount. Being able to provision workspace-level ADLSg2 storage will ensure the costs of storage are allocated to the right groups.


    1. I’ve been blogging about dataflows not being a Premium-only feature for a year at this point. Any help you can provide in spreading the word will be very much appreciated.

      The “more than one lake” idea is planned, but there’s no official ETA to share at this point. The current ADLSg2 integration is in preview, and the dataflows team definitely understands that there are many companies that can’t do more than test with the current preview features.

