Session resources: Power BI dataflows and Azure Data Lake integration

Last week I delivered two online sessions on the topic of integrating Power BI dataflows with an organizational Azure Data Lake Storage Gen2 storage account. I’ve blogged on this topic before (link | link | link | link) but sometimes a presentation and demo is worth a thousand words.

On April 30th I delivered a “Power BI dataflows 201 – beyond the basics” session for the PASS Business Intelligence Virtual Chapter. The session recording is online here, and you can download the slides here.

On May 4th I delivered a “Integrating Power BI and Azure Data Lake with dataflows and CDM Folders” session for the SQL Saturday community event in Stockholm, Sweden. I was originally planning to deliver the Stockholm session in person, but due to circumstances beyond my control[1] I ended up presenting remotely, which meant that I could more easily record the session. The session recording is online here, and you can download the slides here.

Each of these sessions covers much of the same material. The Stockholm presentation got off to a bit rocky start[2] but it gets smoother after the first few minutes.

Please feel free to use these slides for your own presentations if they’re useful. And please let me know if you have any questions!


[1] I forgot to book flights. Seriously, I thought I had booked flights in February when I committed to speaking, and by the time I realized that I had not booked them, they were way out of my budget. This was not my finest hour.

[2] The presentation was scheduled to start at 6:00 AM, so I got up at 4:00 and came into the office to review and prepare. Instead I spent the 90 minutes before the session start time fighting with PC issues and got everything working less than a minute before 6:00. I can’t remember ever coming in quite this hot…

General availability of Power BI dataflows

Power BI dataflows have been available in public preview since November 2018. For almost five months, customers around the world have been kicking the tires, testing and providing feedback, and building production capabilities using dataflows.

When Microsoft published the latest Business Applications Release Notes, the “new and planned features” list included dataflows general availability with a target date of April 2019, which could typically mean anything before May 1st.

But… April has just arrived, and so has the dataflows GA!

The full details are on the official Power BI blog, so be sure to check it out. Also keep in mind that although dataflows are now generally available, some specific capabilities are still in preview.

 

Free Data Modeling Course

In any industry there is a small set of luminaries – people who everyone knows by name and by reputation. They’ve been around for years and decades. They started off doing amazing work, and they’ve only gotten better over time.

In the world of Microsoft business intelligence, Marco Russo and Alberto Ferrari are two of these luminaries[1]. I’ve been doing this data thing for a while now, and they know more about data and BI than I will ever forget. Or something like that.

In any event, they have just released a free introductory data modeling course, and you should complete it. I know I will.

For context, there are three big reasons why I’m going to complete an online introductory data modeling course even though I’ve been working as a data professional for over 20 years:

  1. Marco and Alberto are experienced trainers and presenters who know how to communicate complex topics in ways that make them easy to understand and apply. Even if I know a given topic already, I will learn details I did not previously know, or will learn new ways to think about the topic.
  2. There’s a lot I need to learn. My background heavily emphasizes ETL and data warehousing, which was a fine place to specialize when I was part of a multi-disciplinary team and someone else was responsible for the analytics model. When working in Power BI as a self-service user, I frequently run into gaps in my knowledge. Although I know where to look to find answers, focusing on proactive learning will make me more efficient every day I work in Power BI.
  3. Data modeling is a topic of vital importance for enterprise Power BI customers. In my day job[2] I see lots of problems that would not have existed if people had started with a well designed data model.
  4. I’m not getting any younger, so if I’m ever going to start a modeling career, it had better be soon.

I hope you’ll take advantage of Marco and Alberto’s generous offer and complete this course. When you do, let me know what you think.


[1] They are not, as the post title and their names might suggest, elite fashion designers.

[2] Although the dataflows-centric subject matter of this blog may lead you to believe otherwise, most of what I do on a daily basis is help large customers succeed with Power BI… I blog about dataflows just because I love them so much.

My slides from SQL Saturday Victoria

Last weekend I had the pleasure of presenting at SQL Saturday in Victoria, BC. I delivered an introductory session on Power BI dataflows, and included an unplanned impromptu musical[1] performance.

This week at the Microsoft MVP Summit in Redmond I have been talking to a lot of MVPs and I realized that I had yet to make any of my presentation resources available. The slide deck below (click on the image or the link below it) is my standard dataflows slide deck, with the SQL Saturday template applied.

Slide deck

SQLSatVictoria – Introduction to Power BI Dataflows

This deck includes more content than is appropriate for a single session, but it has resources that you can use if you want to present on dataflows at a conference or user group meeting. Please feel free to take what’s there and to use the parts that are valuable. Please also feel free to ping me on Twitter if you have any questions or feedback on the content or the content flow.

If you have any critical feedback to share related to my musical performance, please include a link to your Power BI themed sings as reference.


[1] For a given value of “musical”.

Upcoming Dataflows Presentations

I’m not dead!

After having a prolific first few months with this blog, the year-end holidays disrupted my routine, and I’ve been struggling to get everything balanced again. 2019 has been great so far, but its also been stupid crazy busy, and while blogging has been on my backlog, it just hasn’t made the bar for implementation.

Until now, I guess…

Last week I was in Darmstadt, Germany for the SQL Server Konferenz 2019 event, where I presented on Power BI dataflows. My session was well-attended and well-received[1] but I realized that I’d never actually told anyone about it. Now it’s time to correct this oversight for some upcoming events. These are the public events where I’ll be speaking over the next few months:

Event: SQL Saturday #826

Location: Victoria, BC, Canada

Date: March 16, 2019

Session: Introduction to Power BI dataflows

Event: Intelligent Cloud Conference

Location: Copenhagen, Denmark

Date: April 9, 2019

Session: Integrating Power BI and Azure Data Lake with dataflows and CDM Folders

Pro tip: If you’re attending Intelligent Cloud, be sure to attend Wolfgang Strasser‘s “Let your data flow” session earlier in the day. This session will provide a great introduction to Power BI dataflows and will provide the prerequisite knowledge necessary to get the most out of my session.

Event: SQL Saturday #851

Location: Stockholm, Sweden

Date: May 4, 2019

Session: Hopefully two dataflows sessions[2], details still being ironed out.

 


[1] Except for that one guy who rated the session a 2. I’d love to know what I could have done to improve the presentation and demos

[2] Also, swords.

Dataflows, or Data Flows?

I don’t hear this question as often as I used to[1], but I still hear it: What’s the difference between dataflows in Power BI and Data Flows in Azure Data Factory?

I’ve already written extensively on the Power BI side of things, and now the awesome Merrill Aldrich from BlueGranite has published an excellent overview of the ADF feature on the BlueGranite blog. You should check it out here: https://www.blue-granite.com/blog/ssis-inspired-visual-data-transformation-comes-to-azure-data-factory

I’m not going to summarize the similarities and differences in this post, but after you’ve read Merrill’s article, I’d love to hear your specific questions.


[1] This may or may not be because I’ve been on vacation for the past three weeks.

Power BI Dataflows and Additional Data Sources

A few recent posts have touched on different aspects of how dataflows have capabilities that aren’t exposed in the current preview user experience in Power Query Online. This is true of Power Query “M” functions, and of using custom functions, and it is also true of data sources.

Not every data source that is supported in Power BI Desktop is supported today in dataflows, but these data sources should work[1]:

  • SAP Business Warehouse
  • SAP HANA
  • Analysis Services
  • Azure Analysis Services
  • Google Analytics
  • Adobe Analytics
  • ODBC
  • OLE DB
  • Folder
  • SharePoint Online folder
  • SharePoint folder
  • Hadoop HDFS
  • Azure HDInsight (HDFS)
  • Hadoop file HDFS
  • Informix (beta)

 

For any of these data sources, you should be able to build a query in Power BI Desktop, copy it and paste it into Power Query Online and have it work, even though the data sources are not yet listed in the UX.

Let’s look at an example using the Folder source, both because the Folder data source is not yet supported in the Power Query Online user experience, and because of how to relates to yesterday’s post on using custom functions[2]. We’ll begin by choosing a folder that contains our data files.

 

Once we’ve connected to the folder, we’ll edit the query to load the data from all CSV files that the folder contains, and then combine the contents of the files.

 

Once we’re done, Power Query in Power BI Desktop will give us this:

2018-12-09_17-53-17

There are three points worth mentioning in this image:

  1. The data itself – this is probably what grabs your attention, because it’s big and data
  2. The set of queries built by Power BI Desktop includes multiple queries and a custom function
  3. The main query loaded into the PBIX data model references the other query and the custom function

Now let’s take it all from Power BI Desktop and use it in a dataflow. As covered in the post on authoring Power BI dataflows in Power BI Desktop, right-clicking on the query and choosing “Copy” will copy all of the script we need. Like this:

// Demo File Source
let
Source = Folder.Files("C:\Demo File Source"),
#"Filtered Rows" = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
#"Filtered Hidden Files1" = Table.SelectRows(#"Filtered Rows", each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Demo File Source", each #"Transform File from Demo File Source"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from Demo File Source"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Demo File Source", Table.ColumnNames(#"Transform File from Demo File Source"(#"Sample File"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Date received", type date}, {"Product", type text}, {"Sub-product", type text}, {"Issue", type text}, {"Sub-issue", type text}, {"Consumer complaint narrative", type text}, {"Company public response", type text}, {"Company", type text}, {"State", type text}, {"ZIP code", type text}, {"Tags", type text}, {"Consumer consent provided?", type text}, {"Submitted via", type text}, {"Date sent to company", type date}, {"Company response to consumer", type text}, {"Timely response?", type text}, {"Consumer disputed?", type text}, {"Complaint ID", Int64.Type}})
in
#"Changed Type"

// Transform File from Demo File Source
let
Source = (#"Sample File Parameter1") => let
Source = Csv.Document(#"Sample File Parameter1",[Delimiter=",", Columns=18, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
in
#"Promoted Headers"
in
Source

// Sample File
let
Source = Folder.Files("C:\Demo File Source"),
#"Filtered Rows" = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
Navigation1 = #"Filtered Rows"{0}[Content]
in
Navigation1

Isn’t that sweet?

We can then paste each of these queries into the “Blank query” template in Power Query Online. Power Query Online is smart enough to automatically prompt for a gateway and credentials. The only edit we’ll need to make is to remove the one line that’s highlighted in red[3], and then… it just works.

2018-12-09_18-04-57

There’s certainly a little irony involved in using the phrase “it just works” after going through all of these steps… but it does work.

While Power BI dataflows and Power Query Online are in preview, there are likely to be some rough edges. But because dataflows are built on the same Power Query function language and connectivity platform, which means that in many scenarios you can use dataflows for tasks that are not yet fully supported in the interface.

During preview there are still functions and data sources that aren’t yet supported in dataflows, but this technique can unlock data sources and capabilities that aren’t yet exposed through the UI. Check it out.


[1] As the rest of the blog makes clear, I have not yet tested all of these. If you try any of them and they don’t work, remember this: Getting Help for Power BI Dataflows.

[2] But mainly because I don’t have any of the other listed data sources handy to use.

[3] Because this is not yet supported in dataflows.