Using Custom Functions in Power BI Dataflows

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

You can use custom Power Query “M” functions in Power BI dataflows, even though they’re not exposed and supported in the preview Power Query Online editor to the same extent they are supported in Power BI Desktop.[1]

As mentioned in a recent post on Authoring Power BI Dataflows in Power BI Desktop, the Power Query “M” queries that define your dataflow entities can contain a lot more than what can be created in Power Query Online. One example of this is support for custom functions in a dataflow. Functions work the same way in dataflows as they work in Power Query Desktop – there’s just not the same UX support.

Let’s see how this works. Specifically, let’s build a dataflow that contains a custom function and which invokes it in one of the dataflow entities. Here’s what we’ll do:

  1. We’ll define a custom function that accepts start and end dates, and returns a table with one row for each day between these dates. Specifically, we’ll use the date dimension approach that Matt Masson published five years ago[2], when Power Query functions were new.
  2. We’ll pull in sales order data from the SalesOrderHeader table in the AdventureWorks sample database to be an “Order” entity in the dataflow.
  3. We’ll use the min and max of the various date columns in the SalesOrderHeader table to get the parameter values to pass into the custom function. We’ll then call the custom function to build a Date entity in the dataflow.
  4. We’ll close our eyes and imagine doing the rest of the work to load other entities in the dataflow to make what we’d need to build a full star schema in a Power BI dataset, but we won’t actually do the work.

Let’s go. Since we’re just copying the code from Matt‘s blog, we’ll skip the code here, but the result in Power Query Online is worth looking at.

2018-12-08_12-58-57

Even though Power Query Online doesn’t have a dedicated “create function” option, it does recognize when a query is a function, and does include a familiar UX for working with a function. You will, however, need to clear the “Enable load” option for the query, since a function can’t be loaded directly.

The Order entity is super simple – we’re just pulling in a table and removing the complex columns that Power Query adds to represent related tables in the database. Here’s the script:

let
Source = Sql.Database("myserver.database.windows.net", "adventureworks"),
SalesLT_SalesOrderHeader = Source{[Schema = "SalesLT", Item = "SalesOrderHeader"]}[Data],
#"Removed columns" = Table.RemoveColumns(SalesLT_SalesOrderHeader, {"SalesLT.Address(BillToAddressID)", "SalesLT.Address(ShipToAddressID)", "SalesLT.Customer", "SalesLT.SalesOrderDetail"})
in
#"Removed columns"

Now we need to put the two of them together. Let’s begin by duplicating the Order entity. If we referenced the Order entity instead of duplicating it, we would end up with a computed entity, which would require Power BI Premium capacity to refresh.

This is what the query looks like before we invoke the custom function. With all of the awesome options on the “Add Column” tab in Power BI Desktop, implementing this logic was surprisingly easy.

let
Source = Sql.Database("myserver.database.windows.net", "adventureworks"),
SalesLT_SalesOrderHeader = Source{[Schema="SalesLT",Item="SalesOrderHeader"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(SalesLT_SalesOrderHeader,{"OrderDate", "DueDate", "ShipDate"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each "Group by all"),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Custom"}, {{"Min Order Date", each List.Min([OrderDate]), type datetime}, {"Max Order Date", each List.Max([OrderDate]), type datetime}, {"Min Due Date", each List.Min([DueDate]), type datetime}, {"Max Due Date", each List.Max([DueDate]), type datetime}, {"Min Ship Date", each List.Min([ShipDate]), type datetime}, {"Max Ship Date", each List.Max([ShipDate]), type datetime}}),
#"Changed Type" = Table.TransformColumnTypes(#"Grouped Rows",{{"Min Order Date", type date}, {"Max Order Date", type date}, {"Min Due Date", type date}, {"Max Due Date", type date}, {"Min Ship Date", type date}, {"Max Ship Date", type date}}),
#"Inserted Earliest" = Table.AddColumn(#"Changed Type", "Min Date", each List.Min({[Min Order Date], [Min Due Date], [Min Ship Date]}), type date),
#"Inserted Latest" = Table.AddColumn(#"Inserted Earliest", "Max Date", each List.Max({[Max Order Date], [Max Due Date], [Max Ship Date]}), type date),
#"Removed Other Columns1" = Table.SelectColumns(#"Inserted Latest",{"Min Date", "Max Date"})
in
#"Removed Other Columns1"

At the end of this interim query, we have two columns to pass in to the custom function. And once we do, it looks like this:

2018-12-08_13-48-40

And here’s the final script used to define the Date entity.

 
let
Source = Sql.Database("myserver.database.windows.net", "adventureworks"),
SalesLT_SalesOrderHeader = Source{[Schema="SalesLT",Item="SalesOrderHeader"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(SalesLT_SalesOrderHeader,{"OrderDate", "DueDate", "ShipDate"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each "Group by all"),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Custom"}, {{"Min Order Date", each List.Min([OrderDate]), type datetime}, {"Max Order Date", each List.Max([OrderDate]), type datetime}, {"Min Due Date", each List.Min([DueDate]), type datetime}, {"Max Due Date", each List.Max([DueDate]), type datetime}, {"Min Ship Date", each List.Min([ShipDate]), type datetime}, {"Max Ship Date", each List.Max([ShipDate]), type datetime}}),
#"Changed Type" = Table.TransformColumnTypes(#"Grouped Rows",{{"Min Order Date", type date}, {"Max Order Date", type date}, {"Min Due Date", type date}, {"Max Due Date", type date}, {"Min Ship Date", type date}, {"Max Ship Date", type date}}),
#"Inserted Earliest" = Table.AddColumn(#"Changed Type", "Min Date", each List.Min({[Min Order Date], [Min Due Date], [Min Ship Date]}), type date),
#"Inserted Latest" = Table.AddColumn(#"Inserted Earliest", "Max Date", each List.Max({[Max Order Date], [Max Due Date], [Max Ship Date]}), type date),
#"Removed Other Columns1" = Table.SelectColumns(#"Inserted Latest",{"Min Date", "Max Date"}),
#"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns1", "fn_DateTable", each fn_DateTable([Min Date], [Max Date], null)),
fn_DateTable1 = #"Invoked Custom Function"{0}[fn_DateTable],
#"Changed Type1" = Table.TransformColumnTypes(fn_DateTable1,{{"Year", Int64.Type}, {"QuarterOfYear", Int64.Type}, {"MonthOfYear", Int64.Type}, {"DayOfMonth", Int64.Type}, {"DateInt", Int64.Type}, {"DayInWeek", Int64.Type}, {"MonthName", type text}, {"MonthInCalendar", type text}, {"QuarterInCalendar", type text}, {"DayOfWeekName", type text}})
in
#"Changed Type1"

 

Most of the complexity in this approach is in the work required to get min and max values from three columns in a single table. The topic of the post – calling a custom function inside a dataflow entity definition – is trivial.

When we’re done, the list of entities only shows Order and Date, because these are the only two queries that are being loaded into the dataflow’s CDM folder storage. But the definition of the Date query includes the use of a custom function, which allows us to have rich and possibly complex functionality included in the dataflow code, and referenced by one or more entities as necessary.


[1] I was inspired to write this post when I saw this idea on ideas.powerbi.com. If this capability is obscure enough to get its own feature request and over a dozen votes, it probably justifies a blog post.

[2] And the same code, copied and pasted, like real developers do.

13 thoughts on “Using Custom Functions in Power BI Dataflows

  1. Pingback: Dataflows in Power BI – BI Polar

  2. Frank Tonsen

    For the time being I can’t share your enthusiasm.

    Isn’t it correct to say that if I use dataflows to build my data model there will be no query folding?

    Furthermore, don’t I need twice as much storage space?

    Like

  3. Pingback: Power BI Dataflows and Additional Data Sources – BI Polar

  4. Is custom function premium only feature? I created a function which does transformation for the files read form azure blob but getting message

    One or more entities references a dynamic data source

    Like

      1. Parv Chana

        Thanks!! I have to change my function to make it work but in other function I was getting the error message. Have to check what was the cause that I received error message.

        Like

  5. Ed

    Hi Matthew, thanks for the post, very helpful and interesting (as is your whole blog really!). Can we share functions between dataflows in the same workspace? My scenario – in PBI Desktop, I have 10 queries and 1 function. The queries get data from different sources using the function. Refresh frequency is different for each data source so I want to create 10 dataflows. Do I have to include the function in all 10 dataflows? Thanks very much.

    Like

  6. Ahmed Hafidh

    Thank you so much for the great information!!
    I did a custom M query using dataflow but it fails, stating :
    “One or more tables references a dynamic data source”.

    Here is part of my custom m query:
    (number as number, Genre as text) as table =>
    let
    Source = Web.Contents(“https://www.imdb.com/search/title/?genres=”&Genre&”&start=”&Number.ToText(number)&”&explore=title_type,genres&ref_=adv_nxt”),…..

    Please help.

    Thank you

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s