Prepare for the BI Power Hour at PASS Summit

If you follow Power BI on Twitter, you may have noticed the #BIPowerHour tag over the last few weeks. For those who have been part of the Microsoft data community for a while, you probably know what this is all about, but I’ve seen quite a few people asking “What is the BI Power Hour?” This post is for them.

The BI Power Hour is a conference session that shows Microsoft data and business intelligence tools, products, and services being used for purposes other than that for which they were built. I’ve heard people describe it as “stupid BI tricks”[1], and as smart as some of the tricks have been, I can’t argue with that description. The session has been part of multiple conferences for over a decade, with some of the best technical presenters around[2] showing off humorous and thought-provoking[3] demos.

Some of my personal favorite BI Power Hour demos over the years include:

  • Building an online dating app using SQL Server Integration Services
  • Using SQL Server Integration Services, Master Data Services, and Data Quality Services to respond to all of the social opportunities that came from the previous year’s dating app
  • Data mining the Titanic passenger list using the data mining add-in for Excel
  • Data mining the Oracle price list using the data mining add-in for Excel[4]
  • Building a multiplayer version of Battleship using SQL Server Reporting Services
  • Predicting the end of the world with Power BI[5]
  • Using Power BI and a database of heavy metal bands to demonstrate how Power BI got its name

The list goes on and on – there are too many highlights to call out here. Each year has been different, but each year has also been delightful.

This year we’re back, and we have some “pro tips” for those of you who are attending the PASS Summit next week in Seattle.

  1. Show up early. The last time we had a BI Power Hour at PASS in Seattle, the room was 100% full, with scores of people turned away.
  2. Sit near the front, even if you don’t sit in the front row. We’ll be throwing swag, and much of it is not aerodynamic, so sitting all the way in the back reduces the odds of your getting a t-shirt or other goodies.
  3. Sit next to someone who is a different size than you. This one may sound odd, but if the person next to you catches a t-shirt that doesn’t fit, your odds just increased. Am I right?
  4. Don’t expect to learn anything useful. We’ll mention this one again during the session intro, but it’s important to know in advance that we have explicitly defined real-world skills as Out Of Scope for this session. The BI Power Hour is all about having fun at the end of a long week – if you want to learn something, pick a different session.
  5. Use the #BIPowerHour hashtag everywhere you can!! Because no real-world skills are taught, and no useful topics are covered, some people[6] seem to think that the session isn’t a priority. If you disagree, please share and tweet before, during, and after the session, to let the powers that be understand your opinion.

Do you have any tips you would add to this list?

If you do, please let me know!

I’ll wrap up by casually reminding you that the BI Power Hour put “Power” and “BI” together many years before they ever appeared in an official, marketing-blessed name. I’m not saying that this is the session where the future gets determined, but…

 


[1] For those of you who don’t get the reference, I think this was the inspiration for this label: https://www.youtube.com/watch?v=xu2A8OK4vJg

[2] I have also been involved on occasion

[3] Typically the thoughts it provokes are something along the lines of “who let these people on stage”?

[4] I still can’t believe they did this one

[5] 2012 was such a simpler time, when we joked about the end of the world…

[6] Typically people who are responsible for choosing conference sessions

Power Query: Connecting to Excel Workbooks in SharePoint Online

Update September 2020: This post gets so many hits every day I decided to make a video. You should watch it. It is short.

Original post:

I try to avoid using Excel as a data source when I can. Although Excel has many benefits[1], its limitations[2] make it less than ideal for enterprise BI solutions. Despite this, it feels like I end up needing to analyze data in Excel at least once a month. And every time I’m connecting from Power Query to an Excel workbook located in SharePoint Online or OneDrive for Business, I find myself exclaiming “it shouldn’t be this hard!”[3]

What’s the problem? It’s not working with the data – Power Query makes this as painless as you’d expect. The problem is referencing the XLSX file. What path do I use?

In Excel and in SharePoint, there is a “copy link” option, which will copy a link to the workbook so you can share it with others.

doesn't work

It always feels like this should work, that I should be able to paste this URL into the Get Data experience in Power Query, and if it’s been more than a few months since the last time I tried, I expect it to work. But it never does.

This works:

this works

In the Excel ribbon, click on the File tab, and on the Info screen that appears click on the label under the file name. There’s no visual indicator that this label is a menu button until you hover over it, but when you click on it, a menu appears with a “Copy path to clipboard” option.

Update May 2020: Since this post was originally published, the Excel team has made this option a little easier to find. Your Excel options may look something like this:

copy path options

Clicking “Copy path” will give you a path in this format:

https://organization.sharepoint.com/path/path/path/NicelyEscapedFileName.xlsx?web=1

With this starting point, all you need to do is remove the ?web=1 query string parameter at the end of the URL, and you have the path to the workbook that Power Query needs.
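If you find yourself doing this often, the same cleanup can be expressed in M itself. Here’s a minimal sketch – `rawUrl` is just an illustrative name, and the URL is the placeholder from above:

```m
let
    // Paste the "Copy path" value here, ?web=1 and all
    rawUrl = "https://organization.sharepoint.com/path/path/path/NicelyEscapedFileName.xlsx?web=1",

    // Keep everything before the "?" – if there is no query string,
    // Text.BeforeDelimiter returns the input unchanged
    workbookUrl = Text.BeforeDelimiter(rawUrl, "?")
in
    workbookUrl
```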

Choose “Web” as the data source in the Get Data menu, and paste this URL into the Dialog:

From web

Power Query is smart enough to understand that this URL is pointing to an Excel workbook, so from this point on, you’ll get an experience where you can select tables and worksheets, and can start manipulating and analyzing your Excel data.
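If you peek at the query this generates, it’s essentially `Excel.Workbook` wrapped around `Web.Contents` – something along these lines (a sketch rather than the exact generated code; the “Sheet1” item name is an assumption about the workbook):

```m
let
    // Web.Contents fetches the workbook as binary;
    // Excel.Workbook exposes its sheets and tables as navigation rows
    Source = Excel.Workbook(
        Web.Contents("https://organization.sharepoint.com/path/path/path/NicelyEscapedFileName.xlsx"),
        null,
        true),
    // Navigate to a specific sheet by its Item and Kind
    Sheet1 = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data]
in
    Sheet1
```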

navigator

Life is good.

Full disclosure: Were it not for Power Query program manager Miguel Llopis, I never would have found this. During one of our dataflows sessions at Microsoft Ignite, Miguel spent a few minutes walking through these steps at the beginning of a demo. My initial reaction was along the lines of “Why are you showing this? This isn’t a dataflows topic!” but when I realized what he was showing it quickly changed to “Holy crap – I didn’t know you could do that!”

In any event, I’m working with data in Excel again today, and spent a few minutes trying to find this technique documented anywhere. When I couldn’t find anything, this post was born… but all credit should go to Miguel.

 


[1] None of which will I list here. It’s not that kind of blog.

[2] Neither will the limitations be enumerated. Feel free to take my word for it, or go build a data warehouse or two that extract data from Excel, and then come back and let me know what you think.

[3] To be fair, this may be another of those “get off my lawn” moments. It’s hard to tell sometimes.

Create Something Awesome and Share It

One of the things I love the most about software development is that it lets me imagine something, create the thing I imagined, and share the thing I created.

For me, baking hits the same creative “sweet spot” that software development hits. And of course, baking can be literally sweet as well. But beyond the joy of eating something delicious and homemade, making something with your own two hands can be deeply satisfying and therapeutic, in a way that making something at the keyboard is not.

Kind of like how fencing is sometimes described as “physical chess,” baking is like physical software development.[1] There are common components and frameworks. There are tools that are required, and tools which are optional, but which will save you time and frustration. There are patterns that can be learned in one application, and applied to other, related applications. There are some rules you must follow, and some rules you can bend, and it takes study and experience to know the difference between the two. The parallels go on and on.

It doesn’t matter where you channel your creativity – what matters is having a creative outlet where you can relax, learn, and express yourself.

In related news, if you’re going to be at the Power BI World Tour event today at the Microsoft Advanta offices in Bellevue, I baked cookies. Specifically, I baked caraway shortbread cookies inspired by the ones they serve at Dinner by Heston in London. Look for me at the evening reception event.



[1] Fencing is also sometimes known as “chess with pain” and “pain” is French for bread. Coincidence? I think not.

Talking About Mental Health is Important

(If you’re only here for the technical content, please feel free to skip this post – but I hope you’ll read anyway.)

Imagine you’re building a BI solution, and you encounter problem behavior. You’re not sure if it’s an actual bug, but it is keeping you from achieving your goals. You search online, but can’t find any knowledge base articles, blog posts, or documentation that help. When you try to talk to other developers, none of them wants to talk about it, none will admit to having the same problem, and it seems like they treat you differently after you ask, like they think less of you because you couldn’t solve this problem on your own. Over time you develop workarounds to build a solution that meets your requirements. Some of the workarounds are quite elaborate, which impacts the performance and maintainability of your solution, but because no one will talk about the underlying problems, you don’t have any way to communicate the impact of the workarounds to other stakeholders. It feels like everything is harder than it needs to be.

That sounds pretty awful, doesn’t it? Now imagine if this was life in general, not just a BI project.

I’ve suffered from clinical anxiety and depression since I was in my early teens. Both the anxiety and the depression have come and gone over the years, but each has also grown worse as I’ve aged. And to make things worse, until I was well into my 40s I didn’t realize what was going on.

I thought it was just me. I thought that I was just having a harder time doing things that were easier for others. I felt lessened, and inadequate. I felt broken.

But that was only when the depression and anxiety were present. When they receded, it was hard to remember how hard everything had been.

If this sounds strange, think back to the last time you were seriously ill, or injured[1]. The injury or illness was an ever-present part of your life, making it difficult to do some things and impossible to do others. If your recovery took more than a few weeks, it may have been difficult to even remember a time when you hadn’t needed to structure your life around your disability. But once you’ve recovered, even though you can remember the time you were sick or hurt, how well can you actually remember what it was like, how it actually felt at the time? It’s hard, isn’t it?

I didn’t start figuring out what was going on until mental health started being part of public discourse – until people started talking about it. Sadly, this typically meant some celebrity had taken his own life, and for a few weeks the news would talk about how the veneer of happiness and success had masked his sadness all along. Sometimes there would be meaningful discussion, but usually the news coverage was about as informed and meaningful as news coverage of complex technical issues, with all that implies.

In addition to this increasing public dialog, there were four specific resources that were incredibly helpful to me, and which empowered me to look differently at my own mental health:

  1. Actor Wil Wheaton’s epic blog post about depression: This raw and personal story had me in tears, because so much of it was my own story. The details are different, and it’s obvious that Wil’s struggles are more difficult than my own, but the themes and patterns were incredibly familiar.
  2. “The Anxiety Chronicles” on The Lily: This is a series of short articles by different women, each with the tagline “this is how I experience anxiety.” Again, while the details are different, many of these stories have strong and stark parallels with my own anxiety.
  3. The growing discussion about mental health in the tech field: I started noticing this trend this year, and I’m seeing more and more of it. When I spoke at the Craft Conference in Budapest there were a bunch of sessions including this one about depression and burnout from the amazing Gitte Klitgaard. And there is a wonderful community on Twitter including April Wensel and her “compassionate coding” efforts, and Microsoft’s Ann Johnson and Shona Chee.  Although I call out these three leaders by name, it’s not just them that help. What helps the most is that there is a discussion, an open and public discussion, that lets people know that it is ok to struggle, that it’s ok to admit that you’re struggling, and that there are people who want to help.
  4. Friends, especially younger friends: Most of the people in my martial arts community are much younger than I am, and it appears that the younger generation hasn’t picked up all the negative baggage that my generation did. Whether it is authors Joseph Brassey or Ed McDonald, coach and instructor Kaja Sadowski, or any of the awesome young people I train with, having people I know and respect who are openly discussing their trials and tribulations is incredibly valuable – because it shows me that I’m not alone in the challenges I face.

This is why I decided to write this post, and why I’m glad you took the time to actually read it. If even one person reading this post has felt as alone and lost as I have felt, and now realizes that it’s not just them, if this post can help even half as much as Wil Wheaton’s post helped me, then it was worth writing.

Please understand that this isn’t just about talking – I’ve taken concrete steps to improve my mental health. I exercise every day. I prioritize getting enough quality sleep every night. I eat well, and cook most of the food I eat. I quit drinking, and use a HappyLight during the winter months. I’m working to find a therapist to help me build better coping mechanisms for those times when life is overwhelming. I practice mindful self-care every day, building habits and routines when life is good so that I can keep them going when it is not.

But those steps only go so far in isolation. Knowing that there are other people out there going through what you’re going through is invaluable and empowering. Having a community, however loosely-knit, can make the bad times feel less lonely.

Don’t worry – I’ll keep posting about technical topics like Power BI dataflows, and metadata. But please realize that when things are dark and times are hard, those posts will be made possible in large part by the “elaborate workarounds” mentioned in the analogy at the beginning of this post. Like most[2] successful people who struggle with mental illness, I have over the years developed a set of coping mechanisms that let me keep going, keep creating and delivering, even when it is difficult to do so – and to make it appear like nothing is wrong. Please realize that there are people around you who are doing the same thing, and who may not be able or willing to talk about it – they’re struggling silently and trying to get by, fighting a battle that no one else can see.

So be kind. Practice being kind by default. It makes a bigger difference than you might think.

 


[1] I’m thinking back to the time in 2016 when I broke my right hand. I needed surgery to repair the break, and after surgery I was in a cast for two months, followed by months of physical therapy. Now, even though I still do physical therapy exercises every day, I can’t remember what it was like to have my mobility and function so limited for so long. It’s almost like it happened to someone else, because of the passage of time.

[2] I assume, but this assumption is based on personal anecdotes and observations, rather than on real data.

Web Scraping with Html.Table in Power Query

The older I get, the more consistently I encounter things that make me feel old. And it’s not just the usual things, like those damned kids on my lawn. It’s things like technology. And sometimes it’s things like new features in Power BI.

Let’s take, for example, the new “from web” experience that was introduced in Power BI Desktop over the summer, and which is generally available (out of preview) this month. This is a cool feature that at first glance looks pretty straightforward, but it caused me no end of problems.

I started off by reading the available information:

When it became clear that I wasn’t going to be able to do what I needed with only the UI, I started looking into the Power Query functions being used in the query created by the UI. The code in my PBIX file didn’t make any sense to me, so I looked in more places, like:

And still, I couldn’t quite figure out how to solve my problem. I was frustrated and annoyed, not just because I couldn’t do what I needed, but because I was convinced that there was some simple, easy thing that I was missing.

And I was right.

This brings me back to feeling old.

When I was your age[1], we only had one HTML tag, and it was <marquee> and we were glad to have it. We didn’t need any of those fancy stylesheets that kids these days have cascading all over the place, because we had H1 and H2 and that was more than enough for any page we could think of.

All pretense aside, I didn’t know that CSS selectors existed before I started trying to use the Html.Table function in Power Query, and that was a real problem. The inline help mentioned CSS selectors in the function description, but without more prior knowledge and context, this wasn’t helpful to me. At all. So I asked for help.

One of the developers on the Power BI team took the time to listen to what I was trying to accomplish, and gave me the code that I’d been so consistently failing to write on my own[2]. Once I had something simple that worked, I was able – with a little more research and some reformatting – to build the queries that I needed, but I don’t think I could have done it without that help.

So, working under the assumption that I’m not the only old guy out there who knows data pretty well, but hasn’t kept up to date on web technologies like CSS, I wanted to share the story and the code.

Here’s the pattern I was trying to implement. This is a relatively common pattern in my experience, although the details are different each time:

  1. Extract a set of links and related data from a parent web page
  2. For each of the links extracted from the parent, extract a set of data from the child web page, where each of the child pages has a consistent format
  3. Use the data from the parent and the child pages to build a data set for reporting and analysis

The specific solution I needed to implement was this:

  1. Extract album name, release date, song names, and links to the individual song lyric pages, from this page: https://www.azlyrics.com/m/manowar.html
  2. Extract the lyrics from each individual song page, like this one: https://www.azlyrics.com/lyrics/manowar/fightingtheworld.html
  3. Perform mission-critical analysis on the lyrics.[3]

Let’s take a look at my learning process step by step.

I started off by using the UI, stepping through the new “From web” experience in Power BI Desktop. I knew that this wouldn’t give me everything I needed – you can’t get the URLs behind a link, you can only get the text – but I hoped it would give me a starting point.

from web

It gave me this:

Editor

I’m sharing this query as an image and not as text because I want to highlight one of the challenges I faced: the formatting. Because the Html.Table function accepts a complex parameter, I needed to “pretty format” it before I could visually parse it and compare the specific syntax to the available help, and to the source of the web page I was trying to use as a data source.

Even then, I was still stuck. There were not enough points of reference between the HTML and the query for me to relate them and to start modifying the query script. After fighting with this for longer than I care to admit, I asked for help. In return, I got this query:

let
    Source = Web.BrowserContents("https://www.azlyrics.com/m/manowar.html"),
    Custom1 = Html.Table(Source, {{"Song", "a"}, {"Song Link", "a", each [Attributes][href]}}, [RowSelector="div#listAlbum > a[href^='../lyrics/']"])
in
    Custom1

Before I could use it, I had to reformat it like this:

let
    Source = Web.BrowserContents("https://www.azlyrics.com/m/manowar.html"),
    Custom1 = Html.Table
        (
        Source,
            {
                {"Song", "a"},
                {"Song Link", "a", each [Attributes][href]}
            },
            [RowSelector="div#listAlbum > a[href^='../lyrics/']"]
        )
in
    Custom1

That was much better! With this code, formatted so it was clear how the various parts of the columnNameSelectorPairs parameter related to each other, I could finally make some progress on my own. In short order I had this query:

let
    Source = Web.BrowserContents("https://www.azlyrics.com/m/manowar.html"),
    Custom1 = Html.Table
        (
        Source,
            {
                {"Song", "a"}
               ,{"Song Link", "a", each [Attributes][href]}
               ,{"Album", "div.album"}
            },
            [RowSelector="div#listAlbum > a[href^='../lyrics/'], .album , b"]
        ),
    #"Filled Down" = Table.FillDown(Custom1,{"Album"}),
    #"Inserted Text Between Delimiters" = Table.AddColumn(#"Filled Down", "Release Year", each Text.BetweenDelimiters([Album], "(", ")"), type text),
    #"Inserted Text Between Delimiters1" = Table.AddColumn(#"Inserted Text Between Delimiters", "Album Name", each Text.BetweenDelimiters([Album], """", """"), type text),
    #"Filtered Rows" = Table.SelectRows(#"Inserted Text Between Delimiters1", each [Song] <> null),
    #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows","../","https://www.azlyrics.com/",Replacer.ReplaceText,{"Song Link"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Album"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Album Name", "Release Year", "Song", "Song Link"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Reordered Columns", "Lyrics", each fn_GetLyrics([Song Link])),
    #"Expanded Lyrics" = Table.ExpandTableColumn(#"Invoked Custom Function", "Lyrics", {"Lyrics"}, {"Lyrics.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Lyrics",{{"Lyrics.1", "Lyrics"}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"Song Link"})
in
    #"Removed Columns1"

The function fn_GetLyrics uses the same technique to pull the actual lyrics from the child pages:

(lyricsPageURL) =>

let
    Source = Web.BrowserContents(lyricsPageURL),
    Custom1 = Html.Table
        (
        Source,
            {
                {"Lyrics", "div"}
            },
            [RowSelector="br+ div"]
        ),
    #"Kept First Rows" = Table.FirstN(Custom1, 1)
in
    #"Kept First Rows"

And now I have the data I need – I just need to analyze it.

There were two additional tools that were invaluable to my success.

The first tool is the CSS selectors reference at W3Schools.com, which provides the syntax for each CSS selector, an example of the syntax, and a description of each example. Just what I needed!

The second tool is the SelectorGadget Chrome extension, which lets you click on elements on the page, and see the CSS selector that would select them. This may not have been an absolute requirement, but it definitely saved me time and swearing.

Hopefully this has been useful. I may be the only old dog out there trying to learn new tricks, but then again, I may not be. Either way, get off my lawn!

 


[1] Please go back and read that in your best “crotchety old man” voice. Thank you.

[2] Thanks again, Ehren!

[3] Or maybe build a demo for the BI Power Hour session at the PASS Summit next month. It’s hard to tell the difference sometimes.

Power BI Release Notes: Help Us Help You

By now you probably know that the Microsoft Business Applications Group (BAG) publishes its upcoming plans in its release notes online: https://docs.microsoft.com/en-us/business-applications-release-notes/October18/index. You probably also know that the Power BI release notes are part of this site: https://docs.microsoft.com/en-us/business-applications-release-notes/October18/intelligence-platform/. You may even already know that the release notes are updated on a regular basis, that they’re not just a stale snapshot.

But did you know that you can help the Power BI team make the release notes better?

Take a look at the release notes.

release notes

Look more closely. Look down… all the way down. Look here:

yes no

At the bottom of every page in the Power BI release notes[1] you can find a widget to provide feedback on whether or not the page is helpful. If you click “no” you’ll be prompted to share your ideas on how the page could be improved:

please share ideas

Even if you’ve noticed this feedback mechanism before, you probably haven’t used it. I say this because the number of suggestions received by the Power BI team is very small when compared to the number of people using Power BI, or even if compared to the number of people reading the documentation.

This could mean one of two things:

  1. The documentation is pretty much perfect, and it includes all of the information you were looking for.[2]
  2. You decided that this widget was generic and not specific to Power BI, or that it was a black hole, where no one really looked at the feedback anyway.

Based on the number of questions I hear each week about the Power BI roadmap, I’m tempted to suggest that option #2 is more likely. Fortunately, I’m here to let you know that this is not the case.

Every time you provide feedback on a page in the Power BI documentation – whether it is simply clicking “yes” or “no,” or taking the time to provide more in-depth feedback – someone from the Power BI team reads it, and uses that feedback to improve the documentation.

If you’re looking for more information about what’s coming in Power BI, this is the best way to let the Power BI team know. For example, if a page in the release notes says something general about an upcoming feature, and you’re trying to understand if this includes something specific[3], ask for this to be included in the page.

Please understand that if you’re asking general questions about upcoming capabilities that are not covered in the release notes, it generally means that the Power BI team isn’t ready to talk about it yet[4]. But if there is a topic already included in the release notes or other documentation and you would like it expanded or clarified, please do share your feedback. People are listening[5] – let them listen to you too.

 


[1] And every page in Microsoft’s online documentation, for that matter. Not just Power BI, and not just the release notes.

[2] Good job, Power BI team! Take the rest of the day off!

[3] Most of the questions I hear each week fall into this category.

[4] This is the reality of building software. The further out a feature is, the less certainty there can be about its details and timing. As a general rule, it’s better to say “we don’t know” or to not say anything than it is to say something with low confidence and then to not deliver as expected. We all know this, right?

[5] This blog post was inspired by a conversation I had last week. I was trying to figure out what I could do to help make the release notes better, because I was seeing a trend in questions that fall into the general scope of the release notes, but which could not be directly answered by the existing release notes content. Hearing that there was already a process in place to collect and act on feedback was a surprise to me, and the conversations I’ve had since then suggest that it may be a surprise to you as well. It could be that everyone but me already knew this, but I wanted to share this, just in case…

Dataflows in Power BI

If you’ve come to this blog to learn more about Power BI dataflows, here are all of the posts I’ve published to date, in the order they’re intended to be read.

First of all, here is a set of resources available elsewhere online. I strongly recommend watching the conference sessions, if for no other reason than they include hands-on demos that show the dataflows experience more dynamically than the screen shots on this blog can.

Start here:

Dataflows in Power BI: Resources

More Resources: Power BI Dataflows and Azure Data

Power BI Dataflows FAQ

Power BI dataflows and CDM Sessions from MBAS 2019

Next I have a series of posts designed to provide a meaningful overview of the concepts and capabilities delivered in Power BI dataflows. This series will likely grow over time, but as of today it has the content I intended to represent a viable “v1” release. If you read these posts, you’ll have a pretty good starting point for working with dataflows and understanding how they work.

Dataflows in Power BI: Overview Part 1 – Concepts

Dataflows in Power BI: Overview Part 2 – Hands-on

Dataflows in Power BI: Overview Part 3 – Premium

Dataflows in Power BI: Overview Part 4 – CDM Folders

Dataflows in Power BI: Overview Part 5 – Data Refresh

Dataflows in Power BI: Overview Part 6 – Linked and Computed Entities

Dataflows in Power BI: Overview Part 7 – External CDM Folders

Dataflows in Power BI: Overview Part 8 – Using an Organizational Azure Data Lake Resource

Dataflows in Power BI: Overview Part 9 – Lineage and Impact Analysis

Next, a few articles on how Power BI dataflows relate to other technologies, which is an emerging source of confusion during the public preview of dataflows.

Positioning Power BI Dataflows (Part 1)

Positioning Power BI Dataflows (Part 2)

Are Power BI Dataflows a Master Data Management Tool?

Understanding Power BI dataflows and datasets

Power BI dataflows and Azure Data Factory Data Flows

Next, patterns for applying Power BI dataflows as a tool to solve specific problems.

Power BI Dataflows and World Bank Climate Change Data

Power BI Dataflows and Data Profiling

Power BI Dataflows and Reuse without Premium

Power BI Dataflows and Data Profiling Without Premium

Power BI Dataflows and Slow Data Sources

Authoring Power BI Dataflows in Power BI Desktop

Using Custom Functions in Power BI Dataflows

Power BI Dataflows and Additional Data Sources

Lego Bricks and the Spectrum of Data Enrichment and Reuse

Unlimited dataflow refresh on Power BI Premium

Power BI dataflows enhanced compute engine

There’s also an occasional “quick tips” series that presents simple solutions to simple (but frequently encountered) challenges with Power BI dataflows and other aspects of Power BI development.

Quick Tip: Restricting access to linked entities in Power BI dataflows

Quick Tip: Power Query column by example

Quick Tip: Implementing “IsNumeric” in Power Query

Quick Tip: Creating “data workspaces” for dataflows and shared datasets

Quick Tip: Making self-documenting Power BI apps

Quick Tip: Factoring your dataflow entities

Quick Tip: Developing with large dataflows

Finally, resources from dataflows presentations that I have delivered for conferences and user groups.

Slides from SQL Saturday Victoria: Introduction to Power BI dataflows

Slides and recordings: Power BI dataflows and Azure Data Lake integration

Slides and recordings: Guidance for implementing managed self-service BI at enterprise scale

If you find this content valuable, or if you’re looking for something more, please let me know in the comments here, or on Twitter.

If you’re looking for help on using dataflows, or troubleshooting an error, please don’t post here or on Twitter. Instead, please post on the Power BI community forums, where your question and any answers will be seen by many more people, and can be more easily discovered.

And if dataflows are lacking a feature you need, please let the dataflows team know by submitting ideas or voting on ideas already submitted.

Dataflows in Power BI: Overview Part 6 – Linked and Computed Entities

If you’ve ever built an enterprise data warehouse or large-scale business intelligence solution, you understand the effort that goes into building the ETL / data preparation subsystem. Tools like SQL Server Integration Services, Azure Data Factory, and 3rd party tools like Informatica are powerful, but using them effectively to build a data pipeline requires specialized technical knowledge. They’re not tools that you would hand to a business analyst with expectations for success.

This is where Power Query really shines. Power Query provides a powerful, intuitive, and interactive data preparation experience that can be employed by business users as well as data professionals. Dataflows in Power BI expand the reach of Power Query, as the other posts in this series demonstrate. But Dataflows do more than that. Dataflows can help dramatically simplify the real complexity of data preparation.

Building a single ETL package can seem complex – regardless of the tools and technologies you use – but the true complexity comes from orchestrating the execution of tens or hundreds of packages. ETL orchestration logic must take into account many considerations, including:

  • Availability windows of source systems – each data source is already under load, and most enterprise sources have specific windows of availability where bulk processing is permitted
  • Dependencies between packages or groups of packages – it is common to have groups of packages related to a data source, a data domain, or s stage in the data preparation processing, and to have dependencies between them, where a dependent downstream set of packages should not start executing until the upstream packages have completed successfully.
  • Error handling – If one package fails to execute, the system as a whole should respond in a consistent and predictable manner to ensure that supports the needs of the system and of the business processes in which the data is used.
  • Much, much more – It’s been 10 years since I built an enterprise ETL system myself, and my memory is exactly what it used to be[1], so I’m sure I’ve forgotten more considerations than I remember.

Most ETL tools will include capabilities to define the orchestration logic within the scope of the tool[2], and some will provide APIs so users can define that logic outside of the tool. But no matter where you define the orchestration logic, getting it right is difficult, and complex.

But what if it wasn’t?

Consider if you will Microsoft Excel. Excel is not a data preparation tool in the context in which we’re discussing data preparation, but it is a data tool, and it has many characteristics that would be valuable in a data preparation tool.

Consider a formula in a cell in an Excel worksheet. The formula defines where the data in the cell comes from. The formula references other cells and ranges in the worksheet, or in other worksheets, and applies logic to the values in those cells and ranges to calculate the cell’s value. The cell’s data lineage is self-describing, because the only way that the data in the cell is updated is through the execution of the formula. And because all of the formulas are managed by Excel, Excel can understand the dependencies between them and ensure that when any value is updated, all formulas that reference it – directly or indirectly – are evaluated and their values automatically updated.[3]

Wouldn’t it be awesome[4] if data preparation tools were like Excel?

Power BI dataflows are like Excel.

Let’s break down that bold statement a little.

  • An entity in a dataflow is like a cell in an worksheet. The value of the entity is the data set that is produced by the query that defines the entity. The query is the entity’s formula.
  • Every dataflow entity is self-describing in its lineage. The query that defines the entity’s structure also authoritatively defines where the data in the entity comes from, and how it is transformed.
  • The only way that the data in an entity is updated is through the execution of the entity’s query.
  • All dataflows and entities are managed by the Power BI service, so the Power BI service understands all of the relationships and dependencies between them.

And because of this, with linked entities, when upstream dataflow is refreshed, downstream dependent dataflows can automatically refresh without the need for any explicit orchestration logic to be defined.

If you’ve ever seen one of my presentations on Power BI dataflows, this is the part where I get excited[5].  Let’s get you excited too. Let’s look at linked entities.

Linked entities are entities in one Power BI dataflow that reference entities in another dataflow. Before I can create linked entities, I need to have entities in a dataflow that I can reference. For this example, I’m pulling in two data sets from the UK NHS related to healthcare sites and food quality.

01 - Source Entities.png

02 - New Dataflow

And once I’ve created my dataflow, I’ll manually refresh it so that it contains data.

03 - Refreshing the dataflow

I should point out two things in these screen shots that may not seem important at first glance:

  • The workspace is backed by Power BI Premium dedicated capacity
  • The workspace is a new “v2” workspace, not a “v1” workspace tied to an Office 365 group

Both of these criteria must be met for linked entities to be used.

Now that I have the source dataflow created, I can create the dataflow for the linked entities.

04 - Create new dataflow

To get started with the dataflow, instead of selecting “Add new entities” like I’ve done before, I will select “Add linked entities” when prompted.

05 - Add linked entities

Linked entities are their own data source, so I’m prompted to authenticate, using my AAD organizational account to sign in to the Power BI service.

06 - Authenticate

Once I’ve signed in, I can choose from the available dataflows entities. Although this may not be obvious from the screen shot below[6],  the I can only select from “v2” workspaces on Premium capacity. There are additional workspaces in my Power BI tenant that I have permission to access, and which contain dataflows, which do not appear in this list.

07 - Selecting entities.png

Once I’ve selected the entities to which I want to link, I’m taken to Power Query editor.

You may be asking why the “Applied steps pane” shows multiple query steps, even though no transformations are applied to linked queries. In case you aren’t asking this, I’ve thoughtfully drawn a red arrow. See?

08 - Linked Entities Created.png

If you look at the Advanced editor for a linked entity, you’ll see that the query uses PowerBI.Dataflows as its source, and then navigates to the workspace, the dataflow, and the entity in order to create the linked entity. The linked entity will be read-only in this workspace.

09 - Advanced Editor

Now that I have two linked entities in my dataflow, I can use them as the starting point for additional data preparation and mashup. Keep in mind that this is a very simple example, with two linked entities coming from one dataflow. A more realistic scenario might involve linking to entities in multiple dataflows from multiple workspaces. It all depends on your scenario and requirements – Power BI will make the data easy to discover and navigate, so you can get started quickly and easily.

For this example, I’m going to merge these two linked entities to create a new computed entity.

10 - Merge as new.png

11 - Merge

Once the merge is complete and the new entity is created, I can edit it further. But first, there are a few things I should point out.

12 - Merged query with warning.png

First, notice the icons in the query list.

The first two have a “link” icon superimposed on the query icon. This shows that these are linked entities, and that they are read-only “pointers” to entities defined and stored elsewhere. The data for linked entities remains in the CDM Folder backing the source dataflow; only the pointer exists in this dataflow, and no data is duplicated.

The third entity has a “lightning bolt” icon that shows that this is “the entity who lived” after surviving the dark lord’s killing curse. Wait, no, that’s not right. Wrong lightning bolt. This icon shows that this is a computed entity. Computed entities are entities that reference linked entities, and which rely on in-storage calculations for performance and scale. Unlike “normal” entities[7] which use the Power Query M engine to extract data from external data sources, and to load it into CDM Folders in Azure Data Lake Storage gen2. For a computed entity, the source and destination are both CDM Folders managed by the Power BI service, which allows the compute to take place close to the storage.

Second, notice the warning at the bottom of the editor. Because I’ve just merged two queries, one of the columns in the resulting new query represents the right table for the merge. This complex type isn’t supported by CDM Folders, I will need to expand the column before I proceed. This is great, because I wanted to do that anyway.

13 - Expand food quality.png

See? Now the warning is gone!

14 - Finished query, no warning.png

Now I can click “Done” to save my changes and return to the entity list for the dataflow.

15 - Linked and Computed Entities

Notice how the “link” and “lightning” icons are retained here as well, and how the two linked entities are explicitly labeled and include a link to the source dataflow.

The last thing we need to do is save the dataflow that includes our linked and computed entities.

16 - Saving

Now we just need to refresh, to populate the computed entity with data.

But we do not need to refresh the dataflow that contains the linked and computed entities. That would be silly, because dataflows are like Excel, remember?

In an Excel workbook, what do you change if you want the value of a cell with a formula to change? That’s right – you change a cell referenced by the formula, and formula will automatically update value of its cell. And that’s what we’re going to do here.

We’re going to refresh the dataflow  that contains the “normal” entities that is referenced by the linked entities. If the queries behind linked and computed entities are like an Excel formula, “normal” entities are like the data entry cells in the Excel worksheet. Refreshing them is like entering new data into those cells – it is how new data enters the system.

So let’s refresh that source dataflow.

17 - Refreshing source dataflow

And as soon as its processing is complete, the Power BI service will automatically refresh the dependent dataflow.

18 - Magic

How does it know? The Power BI service includes a calculation engine that understands the relationships between all entities in all dataflows in all workspaces in the tenant. Just like Excel understands the relationships between cells, Power BI understands the relationships between entities. And just as Excel will automatically run all formulas that use a value that has changed, Power BI will automatically refresh the dataflows that contain linked and computed entities when the dataflows that they reference have been updated.

Wow, this post ended up being longer than I’d expected. Remember how I mentioned that this is the part of Power BI dataflows that I’m most excited about? Do you believe me now?[9]

 


[1] I  think. I’m pretty sure. Maybe.

[2] For example, control flow in SSIS.

[3] Were you rolling your eyes when you read this, wondering why I would go into so much detail on a topic that everyone has taken for granted for decades? Of course Excel does these things – that’s just how it works… right?

[4] Awesome, game-changing, magical, etc.

[5] Even more excited. I’m excited by default when I’m talking about dataflows.

[6] Unless you’ve been paying really close attention to the screen shots in other posts.

[7] We really need a name for entities that are neither linked nor computed, but lacking one, I’ll call them “normal” entities. Suggestions are welcome.

[8] Not that these suggestions will make their way into the product, of course, but they’re still welcome. If nothing else, receiving suggestions will let me know that people read the footnotes.

[9] Maybe you shouldn’t believe me just yet. Dataflows are big, really big, and there’s a lot coming that we’re not talking about yet. You’ll need to stay tuned.