Upcoming Dataflows Presentations

I’m not dead!

After a prolific first few months with this blog, the year-end holidays disrupted my routine, and I’ve been struggling to get everything balanced again. 2019 has been great so far, but it’s also been stupid crazy busy, and while blogging has been on my backlog, it just hasn’t made the bar for implementation.

Until now, I guess…

Last week I was in Darmstadt, Germany for the SQL Server Konferenz 2019 event, where I presented on Power BI dataflows. My session was well-attended and well-received[1], but I realized that I’d never actually told anyone about it beforehand. Now it’s time to correct that oversight for some upcoming events. These are the public events where I’ll be speaking over the next few months:

Event: SQL Saturday #826

Location: Victoria, BC, Canada

Date: March 16, 2019

Session: Introduction to Power BI dataflows

Event: Intelligent Cloud Conference

Location: Copenhagen, Denmark

Date: April 9, 2019

Session: Integrating Power BI and Azure Data Lake with dataflows and CDM Folders

Pro tip: If you’re attending Intelligent Cloud, be sure to attend Wolfgang Strasser’s “Let your data flow” session earlier in the day. It’s a great introduction to Power BI dataflows and will give you the prerequisite knowledge to get the most out of my session.

Event: SQL Saturday #851

Location: Stockholm, Sweden

Date: May 4, 2019

Session: Hopefully two dataflows sessions[2], details still being ironed out.

 


[1] Except for that one guy who rated the session a 2. I’d love to know what I could have done to improve the presentation and demos.

[2] Also, swords.

Dataflows, or Data Flows?

I don’t hear this question as often as I used to[1], but I still hear it: What’s the difference between dataflows in Power BI and Data Flows in Azure Data Factory?

I’ve already written extensively on the Power BI side of things, and now the awesome Merrill Aldrich from BlueGranite has published an excellent overview of the ADF feature on the BlueGranite blog. You should check it out here: https://www.blue-granite.com/blog/ssis-inspired-visual-data-transformation-comes-to-azure-data-factory

I’m not going to summarize the similarities and differences in this post, but after you’ve read Merrill’s article, I’d love to hear your specific questions.


[1] This may or may not be because I’ve been on vacation for the past three weeks.

Recipe: Spicy Cheese Sauce

I’m on vacation for the holidays, and plan to spend much of the next few weeks in the kitchen. This will result in more posts about food and fewer posts about Power BI. You’ve been warned.

This is my favorite cheese sauce. The recipe is adapted from one originally published by Modernist Cuisine, and made much better by the application of chilies.

Please note that the recipe uses proportions by weight, rather than specific amounts. You can easily scale the recipe up or down so long as the proportion is the same, but you will need an accurate kitchen scale for this one.

Ingredients

  • 100% extra sharp orange cheddar cheese
  • 100% jalapeno juice
  • 4% sodium citrate
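For example, if you end up with 250 grams of jalapeno juice, you’d use 250 grams of grated cheddar (100%) and 10 grams of sodium citrate (4% of 250 grams).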

Procedure

  • Juice 10-50 jalapeno chili peppers and weigh the juice.
  • Grate an equal weight of cheddar cheese.
  • In a saucepan over medium-high heat, combine the juice and sodium citrate and bring to a boil.
  • Gradually add grated cheddar to the liquid, blending with an immersion blender to incorporate each batch of cheese before adding more.
  • That’s it. It doesn’t get much simpler than this.

Storage

Store in an airtight container and refrigerate.

Applications

  • Dip tortilla chips into the cheese sauce and eat them. Eat all of them!!
  • Use on nachos.
  • Use on chili cheese fries, with homemade french fries and green chili sauce from El Patio De Albuquerque.

Notes

  • If you really love spicy food, you may be tempted to use habaneros or another spicier chili to start with. Don’t do that. Start with jalapenos and scale the heat up as appropriate by adding a few spicier chilies to the mix. Trust me on this one.
  • Alternatively, you can scale down the heat by replacing some of the liquid with water. Do what is right for you.
  • So long as you keep the same proportions, you can use any liquid and any cheese you want. The possibilities are limitless.
  • You can also use jalapeno juice in other recipes where you want to add flavor and heat. My other favorite application is in yeast breads, where you can replace the water with chili juice. I’ve made this recipe multiple times with great results using this technique.

Recipe: Sea Salt Caramels

This is my favorite caramel. The recipe is adapted from one originally published in Dessert Professional Magazine. As with any caramel, you want to be careful with this one – don’t take your eyes off hot caramel!


Ingredients

  • 123 grams glucose syrup
  • 821 grams granulated sugar
  • 10 grams Fleur de Sel sea salt
  • 657 grams unsalted butter
  • 388 grams heavy cream

Procedure

  • Line a 9×13″ baking/casserole dish with parchment paper. Have the prepared dish ready by the stove.
  • Combine the glucose, sugar and salt in a large pot over medium-high heat and cook until it reaches 145 C / 293 F, stirring frequently.
  • Meanwhile, combine the butter and cream in a medium pot and bring to a boil.
  • Once the sugar mixture reaches 145 C / 293 F, slowly and carefully pour the cream and butter into the pot with the sugar, whisking constantly.
  • Bring the mixture back up to 122 C / 252 F, stirring constantly.
  • Pour carefully into the prepared dish.
  • Cool the caramel to room temperature, 3-4 hours or overnight.
  • Cut the cooled caramels into squares.

Storage

Store in an airtight container at room temperature.

Vacuum seal and freeze indefinitely.

Applications

  • Eat them as is
  • Enrobe with tempered chocolate, and top with a few flakes of sea salt
  • Bake peanut butter cookies with a small thumbprint depression in their tops, and fill the depression with a generous lump of caramel

 

What is Your Gift?

I have a theory that everyone has a gift. Not a gift that they have been given, but a gift that they can give to others.


Your gift is something you’re good at – something that lets you expend minimal effort and achieve a disproportionately large benefit in return.

If you’re an IT professional, you’ve probably encountered situations where you spent a few minutes configuring a loved one’s hardware or software. For you, it was a trivial task, done in minutes or seconds without any real effort, but for your father or aunt it would have been a stressful ordeal, with no guarantee of success.

Regardless of the context, there’s something that you’re better at than most people. Cooking, baking, drawing, painting, photography, singing, rhyming, writing, carpentry, plumbing… something. As you read this list, hopefully something leaped to mind. For me, this list includes a few things where I have serious skills, and a bunch of things where I feel hopelessly challenged[1].

Once you recognize and acknowledge your gift, you just need to keep your eyes open for opportunities to give it. Each day, look for the places where you could invest a few minutes to spare someone a few hours. Look for the chance to invest an hour to save someone days or weeks. And once you identify the opportunities, choose which of them to act upon. It’s not your responsibility to solve every problem you can, but your gift is only valuable when you share it.

What is your gift?

Who will you share it with?


[1] One day I’ll encounter a situation where someone desperately needs a skilled swordsman. But it is not likely to be this day.

Power BI Dataflows and Additional Data Sources

A few recent posts have touched on how dataflows have capabilities that aren’t exposed in the current preview user experience in Power Query Online. This is true of Power Query “M” functions and of custom functions, and it is also true of data sources.

Not every data source that is supported in Power BI Desktop is supported today in dataflows, but these data sources should work[1]:

  • SAP Business Warehouse
  • SAP HANA
  • Analysis Services
  • Azure Analysis Services
  • Google Analytics
  • Adobe Analytics
  • ODBC
  • OLE DB
  • Folder
  • SharePoint Online folder
  • SharePoint folder
  • Hadoop HDFS
  • Azure HDInsight (HDFS)
  • Hadoop file HDFS
  • Informix (beta)

 

For any of these data sources, you should be able to build a query in Power BI Desktop, copy it and paste it into Power Query Online and have it work, even though the data sources are not yet listed in the UX.
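To give a sense of what that might look like, here’s a rough sketch of an ODBC query of the kind you could author and test in Power BI Desktop, then paste into a “Blank query” in Power Query Online. The DSN name, table, and column names here are made-up placeholders, not a real source:

// Hypothetical ODBC query – the DSN, table, and columns are placeholders
let
    Source = Odbc.Query("dsn=MyWarehouseDsn", "SELECT OrderID, OrderDate, TotalDue FROM dbo.Orders"),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"OrderDate", type date}, {"TotalDue", type number}})
in
    #"Changed Type"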

Let’s look at an example using the Folder source, both because the Folder data source is not yet supported in the Power Query Online user experience, and because of how it relates to yesterday’s post on using custom functions[2]. We’ll begin by choosing a folder that contains our data files.

 

Once we’ve connected to the folder, we’ll edit the query to load the data from all CSV files that the folder contains, and then combine the contents of the files.

 

Once we’re done, Power Query in Power BI Desktop will give us this:

[Screenshot: the combined query results in Power Query in Power BI Desktop]

There are three points worth mentioning in this image:

  1. The data itself – this is probably what grabs your attention, because it’s big and data
  2. The set of queries built by Power BI Desktop includes multiple queries and a custom function
  3. The main query loaded into the PBIX data model references the other query and the custom function

Now let’s take it all from Power BI Desktop and use it in a dataflow. As covered in the post on authoring Power BI dataflows in Power BI Desktop, right-clicking on the query and choosing “Copy” will copy all of the script we need. Like this:

// Demo File Source
let
    Source = Folder.Files("C:\Demo File Source"),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Filtered Rows", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Demo File Source", each #"Transform File from Demo File Source"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from Demo File Source"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Demo File Source", Table.ColumnNames(#"Transform File from Demo File Source"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Date received", type date}, {"Product", type text}, {"Sub-product", type text}, {"Issue", type text}, {"Sub-issue", type text}, {"Consumer complaint narrative", type text}, {"Company public response", type text}, {"Company", type text}, {"State", type text}, {"ZIP code", type text}, {"Tags", type text}, {"Consumer consent provided?", type text}, {"Submitted via", type text}, {"Date sent to company", type date}, {"Company response to consumer", type text}, {"Timely response?", type text}, {"Consumer disputed?", type text}, {"Complaint ID", Int64.Type}})
in
    #"Changed Type"

// Transform File from Demo File Source
let
    Source = (#"Sample File Parameter1") => let
        Source = Csv.Document(#"Sample File Parameter1",[Delimiter=",", Columns=18, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
    in
        #"Promoted Headers"
in
    Source

// Sample File
let
    Source = Folder.Files("C:\Demo File Source"),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
    Navigation1 = #"Filtered Rows"{0}[Content]
in
    Navigation1

Isn’t that sweet?

We can then paste each of these queries into the “Blank query” template in Power Query Online. Power Query Online is smart enough to automatically prompt for a gateway and credentials. The only edit we’ll need to make is to remove the one line that’s highlighted in red[3], and then… it just works.

[Screenshot: the pasted queries working in Power Query Online]

There’s certainly a little irony involved in using the phrase “it just works” after going through all of these steps… but it does work.

While Power BI dataflows and Power Query Online are in preview, there are likely to be some rough edges. But because dataflows are built on the same Power Query function language and connectivity platform, in many scenarios you can use dataflows for tasks that are not yet fully supported in the interface.

During preview there are still functions and data sources that aren’t yet supported in dataflows, but this technique can unlock data sources and capabilities that aren’t yet exposed through the UI. Check it out.


[1] As the rest of the blog makes clear, I have not yet tested all of these. If you try any of them and they don’t work, remember this: Getting Help for Power BI Dataflows.

[2] But mainly because I don’t have any of the other listed data sources handy to use.

[3] Because this is not yet supported in dataflows.

Using Custom Functions in Power BI Dataflows

You can use custom Power Query “M” functions in Power BI dataflows, even though they’re not exposed and supported in the preview Power Query Online editor to the same extent they are in Power BI Desktop.[1]

As mentioned in a recent post on Authoring Power BI Dataflows in Power BI Desktop, the Power Query “M” queries that define your dataflow entities can contain a lot more than what can be created in Power Query Online. One example of this is support for custom functions in a dataflow. Functions work the same way in dataflows as they work in Power Query Desktop – there’s just not the same UX support.

Let’s see how this works. Specifically, let’s build a dataflow that contains a custom function and invokes it in one of the dataflow entities. Here’s what we’ll do:

  1. We’ll define a custom function that accepts start and end dates, and returns a table with one row for each day between them – the date dimension approach that Matt Masson published five years ago[2], when Power Query functions were new.
  2. We’ll pull in sales order data from the SalesOrderHeader table in the AdventureWorks sample database to be an “Order” entity in the dataflow.
  3. We’ll use the min and max of the various date columns in the SalesOrderHeader table to get the parameter values to pass into the custom function. We’ll then call the custom function to build a Date entity in the dataflow.
  4. We’ll close our eyes and imagine doing the rest of the work to load the other entities we’d need for a full star schema in a Power BI dataset, but we won’t actually do that work.

Let’s go. Since we’re just copying the code from Matt’s blog, we’ll skip the code here, but the result in Power Query Online is worth looking at.

[Screenshot: the date dimension function recognized as a function in Power Query Online]

Even though Power Query Online doesn’t have a dedicated “create function” option, it does recognize when a query is a function, and does include a familiar UX for working with a function. You will, however, need to clear the “Enable load” option for the query, since a function can’t be loaded directly.
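Since we’re skipping Matt’s full code, here’s a minimal sketch of the shape of such a function – a query that is itself a function, accepting start and end dates (plus an optional third parameter, so the signature matches the three-argument call we’ll make later) and returning a table with one row per day. Matt’s original adds many more columns, including the ones referenced later in this post:

// fn_DateTable – minimal sketch only; Matt's original builds the full date dimension
let
    fn_DateTable = (StartDate as date, EndDate as date, optional FYStartMonth as number) as table =>
        let
            // One row per day from StartDate through EndDate, inclusive
            DayCount = Duration.Days(EndDate - StartDate) + 1,
            DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
            AsTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
            TypedDates = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
            AddYear = Table.AddColumn(TypedDates, "Year", each Date.Year([Date]), Int64.Type),
            AddMonth = Table.AddColumn(AddYear, "MonthOfYear", each Date.Month([Date]), Int64.Type),
            AddDay = Table.AddColumn(AddMonth, "DayOfMonth", each Date.Day([Date]), Int64.Type)
        in
            AddDay
in
    fn_DateTable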

The Order entity is super simple – we’re just pulling in a table and removing the complex columns that Power Query adds to represent related tables in the database. Here’s the script:

let
    Source = Sql.Database("myserver.database.windows.net", "adventureworks"),
    SalesLT_SalesOrderHeader = Source{[Schema = "SalesLT", Item = "SalesOrderHeader"]}[Data],
    #"Removed columns" = Table.RemoveColumns(SalesLT_SalesOrderHeader, {"SalesLT.Address(BillToAddressID)", "SalesLT.Address(ShipToAddressID)", "SalesLT.Customer", "SalesLT.SalesOrderDetail"})
in
    #"Removed columns"

Now we need to put the two of them together. Let’s begin by duplicating the Order entity. If we referenced the Order entity instead of duplicating it, we would end up with a computed entity, which would require Power BI Premium capacity to refresh.
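To make the difference concrete, here’s a minimal sketch of what a reference-based query would look like – the new query’s source is simply the existing Order entity, and that reference is what would make it a computed entity:

// Referencing the existing Order entity – in a dataflow this becomes a computed
// entity, and refreshing it requires Power BI Premium capacity
let
    Source = Order
in
    Source

Duplicating the Order entity instead copies its full script, so the new query reads from the original data source rather than from the Order entity’s output.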

This is what the query looks like before we invoke the custom function. With all of the awesome options on the “Add Column” tab in Power BI Desktop, implementing this logic was surprisingly easy.

let
    Source = Sql.Database("myserver.database.windows.net", "adventureworks"),
    SalesLT_SalesOrderHeader = Source{[Schema="SalesLT",Item="SalesOrderHeader"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(SalesLT_SalesOrderHeader,{"OrderDate", "DueDate", "ShipDate"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each "Group by all"),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Custom"}, {{"Min Order Date", each List.Min([OrderDate]), type datetime}, {"Max Order Date", each List.Max([OrderDate]), type datetime}, {"Min Due Date", each List.Min([DueDate]), type datetime}, {"Max Due Date", each List.Max([DueDate]), type datetime}, {"Min Ship Date", each List.Min([ShipDate]), type datetime}, {"Max Ship Date", each List.Max([ShipDate]), type datetime}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Grouped Rows",{{"Min Order Date", type date}, {"Max Order Date", type date}, {"Min Due Date", type date}, {"Max Due Date", type date}, {"Min Ship Date", type date}, {"Max Ship Date", type date}}),
    #"Inserted Earliest" = Table.AddColumn(#"Changed Type", "Min Date", each List.Min({[Min Order Date], [Min Due Date], [Min Ship Date]}), type date),
    #"Inserted Latest" = Table.AddColumn(#"Inserted Earliest", "Max Date", each List.Max({[Max Order Date], [Max Due Date], [Max Ship Date]}), type date),
    #"Removed Other Columns1" = Table.SelectColumns(#"Inserted Latest",{"Min Date", "Max Date"})
in
    #"Removed Other Columns1"

At the end of this interim query, we have two columns to pass in to the custom function. And once we do, it looks like this:

[Screenshot: the query after invoking the custom function, showing the generated date table]

And here’s the final script used to define the Date entity.

 
let
    Source = Sql.Database("myserver.database.windows.net", "adventureworks"),
    SalesLT_SalesOrderHeader = Source{[Schema="SalesLT",Item="SalesOrderHeader"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(SalesLT_SalesOrderHeader,{"OrderDate", "DueDate", "ShipDate"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each "Group by all"),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Custom"}, {{"Min Order Date", each List.Min([OrderDate]), type datetime}, {"Max Order Date", each List.Max([OrderDate]), type datetime}, {"Min Due Date", each List.Min([DueDate]), type datetime}, {"Max Due Date", each List.Max([DueDate]), type datetime}, {"Min Ship Date", each List.Min([ShipDate]), type datetime}, {"Max Ship Date", each List.Max([ShipDate]), type datetime}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Grouped Rows",{{"Min Order Date", type date}, {"Max Order Date", type date}, {"Min Due Date", type date}, {"Max Due Date", type date}, {"Min Ship Date", type date}, {"Max Ship Date", type date}}),
    #"Inserted Earliest" = Table.AddColumn(#"Changed Type", "Min Date", each List.Min({[Min Order Date], [Min Due Date], [Min Ship Date]}), type date),
    #"Inserted Latest" = Table.AddColumn(#"Inserted Earliest", "Max Date", each List.Max({[Max Order Date], [Max Due Date], [Max Ship Date]}), type date),
    #"Removed Other Columns1" = Table.SelectColumns(#"Inserted Latest",{"Min Date", "Max Date"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns1", "fn_DateTable", each fn_DateTable([Min Date], [Max Date], null)),
    fn_DateTable1 = #"Invoked Custom Function"{0}[fn_DateTable],
    #"Changed Type1" = Table.TransformColumnTypes(fn_DateTable1,{{"Year", Int64.Type}, {"QuarterOfYear", Int64.Type}, {"MonthOfYear", Int64.Type}, {"DayOfMonth", Int64.Type}, {"DateInt", Int64.Type}, {"DayInWeek", Int64.Type}, {"MonthName", type text}, {"MonthInCalendar", type text}, {"QuarterInCalendar", type text}, {"DayOfWeekName", type text}})
in
    #"Changed Type1"

 

Most of the complexity in this approach is in the work required to get min and max values from three columns in a single table. The topic of the post – calling a custom function inside a dataflow entity definition – is trivial.
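As an aside, if the “group by all” gymnastics feel heavy, you could get the same two values by treating the date columns as lists. This is just an alternative sketch, not what the Date entity above uses – column references like SalesLT_SalesOrderHeader[OrderDate] return lists, and List.RemoveNulls guards against any null date values:

// Alternative sketch: overall min and max across the three date columns, via list functions
let
    Source = Sql.Database("myserver.database.windows.net", "adventureworks"),
    SalesLT_SalesOrderHeader = Source{[Schema = "SalesLT", Item = "SalesOrderHeader"]}[Data],
    // Concatenate the three columns into one list and drop any nulls
    AllDates = List.RemoveNulls(
        SalesLT_SalesOrderHeader[OrderDate]
        & SalesLT_SalesOrderHeader[DueDate]
        & SalesLT_SalesOrderHeader[ShipDate]),
    MinDate = Date.From(List.Min(AllDates)),
    MaxDate = Date.From(List.Max(AllDates)),
    Dates = fn_DateTable(MinDate, MaxDate, null)
in
    Dates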

When we’re done, the list of entities only shows Order and Date, because these are the only two queries being loaded into the dataflow’s CDM folder storage. But the definition of the Date query includes a custom function, which lets us include rich and potentially complex functionality in the dataflow code and reference it from one or more entities as necessary.


[1] I was inspired to write this post when I saw this idea on ideas.powerbi.com. If this capability is obscure enough to get its own feature request and over a dozen votes, it probably justifies a blog post.

[2] And the same code, copied and pasted, like real developers do.