Viral adoption: Self-service BI and COVID-19

I live 2.6 miles (4.2 km) from the epicenter of the coronavirus outbreak in Washington state. You know, the nursing home that’s been in the news, where over 10 people have died, and dozens more are infected.[1]

As you can imagine, this has started me thinking about self-service BI.

Where can I find information I can trust?[2]
When the news covering the US outbreak started to come out, there was something I immediately noticed: authoritative information was very difficult to find. Here’s a quote from that last link.

This escalation “raises our level of concern about the immediate threat of COVID-19 for certain communities,” Dr. Nancy Messonnier, director of the CDC’s National Center for Immunization and Respiratory Diseases, said in the briefing. Still, the risk to the general public not in these areas is considered to be low, she said.

That’s great, but what about the general public in these areas?

What about me and my family?

When most of what I saw on Twitter was people making jokes about Jira tickets[3], I was trying to figure out what was going on, and what I needed to do. What actions should I take to stay safe? What actions were unnecessary or unhelpful?

Before I could answer these questions, I needed to find sources of information. This was surprisingly difficult.

Specifically, I needed to find sources of information that I could trust. There was already a surge in misinformation, some of it presumably well-intentioned, and some from deliberately malicious actors. I needed to explore, validate, confirm, cross-check, act, and repeat. And I was doing this while everyone around me seemed to be treating the emerging pandemic as a joke or a curiosity.

I did this work and made my decisions because I was a highly-motivated stakeholder, while others in otherwise similar positions were farther away from the problem, and were naturally less motivated at the time.[4]

And this is what got me thinking about self-service BI.

In many organizations, self-service BI tools like Power BI will spread virally. A highly-motivated business user will find a tool, find some data, explore, iterate, refine, and repeat. They will work with untrusted – and sometimes untrustworthy – data sources to find the information they need to use, and to make the decisions they need to make. And they do it before people in similar positions are motivated enough to act.

But before long, scraping together whatever data is available isn’t enough anymore. As the number of users relying on the insights being produced increases – even if the insights are being produced by a self-service BI solution – the need for trusted data increases as well.

Where an individual might successfully use disparate unmanaged sources, a population needs a trusted source of truth.

At some point a central authority needs to step up, to make available the data that can serve as that single source of truth. This is easier said than done[5], but it must be done. And this isn’t even the hard part.

The hard part is getting everyone to stop using the unofficial and untrusted sources that they’ve been using to make decisions, and to use the trusted source instead. This is difficult because these users are invested in their current sources, and believe that they are good enough. They may not be ideal, but they work, right? They got me this far, so why should I have to stop using them just because someone says so?

This brings me back to those malicious actors mentioned earlier. Why would someone deliberately share false information about public health issues when lies could potentially cost people their lives? They would do it when the lies would help forward an agenda they value more than they value other people’s lives.

In most business situations, lives aren’t at stake, but people still have their own agendas. I’ve often seen situations where the lack of a single source of truth allows stakeholders to present their own numbers, skewed to make their efforts look more successful than they actually are. Some people don’t want to have to rebuild their reports – but some people want to use falsified numbers so they can get a promotion, or a bonus, or a raise.

Regardless of the reason for using untrusted sources, their use is damaging and should be reduced and eliminated. This is true of business data and analytics, and it is true of the current global health crisis. In both arenas, let’s all be part of the solution, not part of the problem.

Let us be a part of the cure, never part of the plague – we’ll only be remembered for what we create.[6]


[1] Before you ask, yes, my family and I are healthy and well. I’ve been working from home for over a week now, which is a nice silver lining; I have a small but comfortable home office, and can avoid the obnoxious Seattle-area commute.

[2] This article is the best single source I know of. It’s not an authoritative source for the subject, but it is aggregating and citing authoritative sources and presenting their information in a form closer to the solution domain than to the problem domain.

[3] This is why I’ve been practicing social media distancing.

[4] This is where the “personal pandemic parable” part of the blog post ends. From here on it’s all about SSBI. If you’re actually curious, I erred on the side of caution and started working from home and avoiding crowds before it was recommended or mandated. I still don’t know if all of the actions I’ve taken were necessary, but I’m glad I took them and I hope you all stay safe as well.

[5] As anyone who has ever implemented a single source of truth for any non-trivial data domain can attest.

[6] You can enjoy the lyrics even if Kreator’s awesome music isn’t to your taste.

Grandfather, tell me a story!

Are you delivering a technical presentation? Don’t tell me about capabilities.

Don’t tell me about products, or features, or tools.

Tell me a story that I can relate to[1].

I promise this photo is very relevant to every technical presentation

Tell me a story about someone who struggled, and tell the story in a way that I can relate to the struggling character. When you tell me a story, you immediately have my attention, because my human brain is optimized for learning from stories[2].

Tell me a story about a problem, and the pain that problem has caused. The pain makes it real.

Now tell me about the solution. You already have my attention, and my emotional engagement. It’s time to take advantage of this fact.

Tell me about how your products, features, tools, and capabilities were used to solve the problem, and to eliminate the pain.

Why did the person choose a specific tool or feature? How did she use that capability, and what impact did it make? How did the person’s life change and improve because of the actions she took?

When you tell me a story instead of telling me about features, you probably won’t get to mention every feature you want. That’s OK.

If you tell me a story you might only be able to mention 5 or 10 features instead of the 50 or 100 features you wanted to mention. That’s OK, because I’ll remember the impact those few features made, and I’ll know how they can help me.

If you’d talked about all of the features, I probably would not have remembered any of them – unless I already had my own story and could connect the dots myself.

It’s the story that makes the message stick. I wish every technical presenter remembered this. Maybe I should tell them a story…


[1] No, no, not one of those – a real story!

[2] A quick search will yield many results that highlight the value and importance of stories for learning. This blog post presents an excellent summary.

 

Web Scraping with Html.Table in Power Query

The older I get, the more consistently I encounter things that make me feel old. And it’s not just the usual things, like those damned kids on my lawn. It’s things like technology. And sometimes it’s things like new features in Power BI.

Let’s take, for example, the new “from web” experience that was introduced in Power BI Desktop over the summer, and which is generally available (out of preview) this month. This is a cool feature that at first glance looks pretty straightforward, but it caused me no end of problems.

I started off by reading the available information.

When it became clear that I wasn’t going to be able to do what I needed with only the UI, I started looking into the Power Query functions being used in the query created by the UI. The code in my PBIX file didn’t make any sense to me, so I looked in more places.

And still, I couldn’t quite figure out how to solve my problem. I was frustrated and annoyed, not just because I couldn’t do what I needed, but because I was convinced that there was some simple, easy thing that I was missing.

And I was right.

This brings me back to feeling old.

When I was your age[1], we only had one HTML tag, and it was <marquee> and we were glad to have it. We didn’t need any of those fancy stylesheets that kids these days have cascading all over the place, because we had H1 and H2 and that was more than enough for any page we could think of.

All pretense aside, I didn’t know that CSS selectors existed before I started trying to use the Html.Table function in Power Query, and that was a real problem. The inline help mentioned CSS selectors in the function description, but without more prior knowledge and context, this wasn’t helpful to me. At all. So I asked for help.

One of the developers on the Power BI team took the time to listen to what I was trying to accomplish, and gave me the code that I’d been so consistently failing to write on my own[2]. Once I had something simple that worked, I was able – with a little more research and some reformatting – to build the queries that I needed, but I don’t think I could have done it without that help.

So, working under the assumption that I’m not the only old guy out there who knows data pretty well, but hasn’t kept up to date on web technologies like CSS, I wanted to share the story and the code.

Here’s the pattern I was trying to implement. This is a relatively common pattern in my experience, although the details are different each time:

  1. Extract a set of links and related data from a parent web page
  2. For each of the links extracted from the parent, extract a set of data from the child web page, where each of the child pages has a consistent format
  3. Use the data from the parent and the child pages to build a data set for reporting and analysis
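
In Power Query terms, that pattern tends to boil down to a skeleton like the one below. Everything in this sketch is a placeholder (the URL, the selectors, the column names, and the hypothetical fn_GetChildData function), so treat it as an outline of the shape rather than a working solution:

let
    // 1. Pull the parent page and extract one row per link of interest
    ParentSource = Web.BrowserContents("https://example.com/parent.html"),
    ParentLinks = Html.Table
        (
        ParentSource,
            {
                {"Link Text", "a"},
                {"Link", "a", each [Attributes][href]}
            },
            [RowSelector="a"]
        ),
    // 2. For each parent row, call a function that scrapes the corresponding child page into a nested table
    WithChildData = Table.AddColumn(ParentLinks, "Child Data", each fn_GetChildData([Link])),
    // 3. Expand the nested tables into a flat data set for reporting and analysis
    //    ("Value" is a placeholder for whatever columns the child function returns)
    Expanded = Table.ExpandTableColumn(WithChildData, "Child Data", {"Value"})
in
    Expanded

The real queries later in this post follow exactly this shape, just with real selectors and a real child-page function.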

The specific solution I needed to implement was this:

  1. Extract album name, release date, song names, and links to the individual song lyric pages, from this page: https://www.azlyrics.com/m/manowar.html
  2. Extract the lyrics from each individual song page, like this one: https://www.azlyrics.com/lyrics/manowar/fightingtheworld.html
  3. Perform mission-critical analysis on the lyrics.[3]

Let’s take a look at my learning process step by step.

I started off by using the UI, stepping through the new “From web” experience in Power BI Desktop. I knew that this wouldn’t give me everything I needed – you can’t get the URLs behind a link, you can only get the text – but I hoped it would give me a starting point.


It gave me this:

[Screenshot: the query generated by the From Web experience, as it appeared in the editor]

I’m sharing this query as an image and not as text because I want to highlight one of the challenges I faced: the formatting. Because the Html.Table function accepts a complex parameter, I needed to “pretty format” it before I could visually parse it and compare the specific syntax to the available help, and to the source of the web page I was trying to use as a data source.

Even then, I was still stuck. There were not enough points of reference between the HTML and the query for me to relate them and to start modifying the query script. After fighting with this for longer than I care to admit, I asked for help. In return, I got this query:

let
    Source = Web.BrowserContents("https://www.azlyrics.com/m/manowar.html"),
    Custom1 = Html.Table(Source, {{"Song", "a"}, {"Song Link", "a", each [Attributes][href]}}, [RowSelector="div#listAlbum > a[href^='../lyrics/']"])
in
    Custom1

Before I could use it, I had to reformat it like this:

let
    Source = Web.BrowserContents("https://www.azlyrics.com/m/manowar.html"),
    Custom1 = Html.Table
        (
        Source,
            {
                {"Song", "a"},
                {"Song Link", "a", each [Attributes][href]}
            },
            [RowSelector="div#listAlbum > a[href^='../lyrics/']"]
        )
in
    Custom1

That was much better! With this code, formatted so it was clear how the various parts of the columnNameSelectorPairs parameter related to each other, I could finally make some progress on my own. In short order I had this query:

let
    Source = Web.BrowserContents("https://www.azlyrics.com/m/manowar.html"),
    Custom1 = Html.Table
        (
        Source,
            {
                {"Song", "a"}
               ,{"Song Link", "a", each [Attributes][href]}
               ,{"Album", "div.album"}
            },
            [RowSelector="div#listAlbum > a[href^='../lyrics/'], .album , b"]
        ),
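    // Rows matched by the .album selector carry the album text but no song name; fill the album value down onto the song rows beneath it (the album-only rows are filtered out below),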
    #"Filled Down" = Table.FillDown(Custom1,{"Album"}),
    #"Inserted Text Between Delimiters" = Table.AddColumn(#"Filled Down", "Release Year", each Text.BetweenDelimiters([Album], "(", ")"), type text),
    #"Inserted Text Between Delimiters1" = Table.AddColumn(#"Inserted Text Between Delimiters", "Album Name", each Text.BetweenDelimiters([Album], """", """"), type text),
    #"Filtered Rows" = Table.SelectRows(#"Inserted Text Between Delimiters1", each [Song] <> null),
    #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows","../","https://www.azlyrics.com/",Replacer.ReplaceText,{"Song Link"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Album"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Album Name", "Release Year", "Song", "Song Link"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Reordered Columns", "Lyrics", each fn_GetLyrics([Song Link])),
    #"Expanded Lyrics" = Table.ExpandTableColumn(#"Invoked Custom Function", "Lyrics", {"Lyrics"}, {"Lyrics.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Lyrics",{{"Lyrics.1", "Lyrics"}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns",{"Song Link"})
in
    #"Removed Columns1"

The function fn_GetLyrics uses the same technique to pull the actual lyrics from the child pages:

(lyricsPageURL) =>

let
   Source = Web.BrowserContents(lyricsPageURL),
   Custom1 = Html.Table
   (
      Source,
      {
         {"Lyrics", "div"}
      },
      [RowSelector="br+ div"]
   ),
   #"Kept First Rows" = Table.FirstN(Custom1,1)
in
   #"Kept First Rows"

And now I have the data I need – I just need to analyze it.

There were two additional tools that were invaluable to my success.

The first tool is the CSS selectors reference at W3Schools.com, which provides the syntax for each CSS selector, an example of the syntax, and a description of each example. Just what I needed!
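
To make that concrete, here’s how I ended up reading the RowSelector from the parent query above, piece by piece. This is my own annotation, so double-check it against the reference if anything looks off:

// My reading of the RowSelector used in the parent query:
//   div#listAlbum           the <div> element whose id is "listAlbum"
//   >                       only direct children of that div
//   a[href^='../lyrics/']   anchor tags whose href attribute starts with "../lyrics/"
//   , .album , b            plus any element with class "album", and any <b> element
[RowSelector="div#listAlbum > a[href^='../lyrics/'], .album , b"]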

The second tool is the SelectorGadget Chrome extension, which lets you click on elements on the page, and see the CSS selector that would select them. This may not have been an absolute requirement, but it definitely saved me time and swearing.

Hopefully this has been useful. I may be the only old dog out there trying to learn new tricks, but then again, I may not be. Either way, get off my lawn!

 


[1] Please go back and read that in your best “crotchety old man” voice. Thank you.

[2] Thanks again, Ehren!

[3] Or maybe build a demo for the BI Power Session at the PASS Summit next month. It’s hard to tell the difference sometimes.