Analyzing Plex Media Server Data using Power BI

Around a year ago I left Spotify and started using my locally hosted Plex Media Server to feed my daily music consumption. Since I spent much of the last 30+ years building an extensive CD and MP3 collection, the transition was surprisingly smooth, and the excellent Plexamp client app feels like an upgrade over Spotify in many ways.

Now it’s time to start digging into the data that my year of music listening has produced.

I blogged last year about how I report on my Spotify listening history. It’s a bit of a hack because Spotify doesn’t give you a database or an API for your full listening history, so you need to make a GDPR request and wait a few weeks to get a JSON archive… but I made it work. Plex uses a SQLite database to keep track of its data and metadata, and it’s relatively easy to start using it as a data source for your own Power BI reports. Since this is a largely undocumented transactional database[1] I’m still figuring out some of the details and locating key data points, but I wanted to share what I’ve learned.

Here’s how to get started.

Installing Plex, setting up libraries, listening

For this post to be useful to you, you need to run a Plex media server and use it to listen to your music collection[2]. Head over to plex.tv, download the software, install it, set up your libraries, and start listening to the music you love.

This is all well-documented by the fine folks at Plex, so I won’t share any additional details here.

Download a database backup

As mentioned above, Plex uses SQLite as its database, and the Plex web UI provides an option for downloading a copy of the database file. Although it may be possible to report directly on the “live” database, I’m erring on the side of caution and using Power BI to connect to backup files.

Here’s how to download the database:

  1. Open the Plex web UI at https://app.plex.tv/ and sign in to your server
  2. In the upper right, click on the “wrench” icon to open the settings
  3. In the lower right, select “Troubleshooting” in the “Manage” category of the settings options
  4. On the troubleshooting settings page, click the “Download Database” button

After a few seconds your browser will download a zip file containing a SQLite database file with a long system-generated file name.

Copy this file to a folder and give it a more useful name. For my project, the database file is \Plex DB\plex.db.

SQLite driver and ODBC DSN

Power BI doesn’t have a native SQLite connector, but SQLite does have ODBC drivers. It looks like there are multiple options – I used this one mainly because it was the first one I found.

Once you’ve downloaded and installed the ODBC driver, create a DSN that points to the database file you downloaded earlier. The connection string is just the file path.
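
If you want to sanity-check the DSN before opening Power BI, a minimal pyodbc sketch like the one below will confirm that the driver can open the file. The DSN name "PlexDB" is just an example; substitute whatever name you gave your DSN.

```python
# Optional sanity check: confirm the SQLite ODBC DSN resolves and the file opens.
# "PlexDB" is a placeholder; use the DSN name you created.
import pyodbc

conn = pyodbc.connect("DSN=PlexDB")
cursor = conn.cursor()
cursor.execute("SELECT sqlite_version()")
print("Connected to SQLite version:", cursor.fetchone()[0])
conn.close()
```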

With the DSN created, you can start working in Power BI Desktop. Create a new PBIX, select ODBC from the Get Data dialog, and select the DSN.

Click OK and you’re ready to go.

Key Plex database objects

As mentioned earlier, the Plex database is largely undocumented. It has a somewhat normalized schema, which is slowing my exploration and discovery. When a table is mostly integer IDs and obscurely-named values, exploration can feel more like archaeology.
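
One thing that helps with the archaeology is a quick survey of the schema before building anything in Power BI. Here’s a rough sketch using Python’s built-in sqlite3 module against the backup file; the path is the example from earlier in this post, so point it at your own copy.

```python
# Survey an undocumented SQLite database: list every table and its row count.
import sqlite3

DB_PATH = r"\Plex DB\plex.db"  # point this at your own database backup

with sqlite3.connect(DB_PATH) as conn:
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        try:
            (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            print(f"{table}: {count} rows")
        except sqlite3.OperationalError as error:
            # Some tables (full-text search tables, for example) may not be
            # readable outside of Plex; skip them and keep going.
            print(f"{table}: skipped ({error})")
```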

With that said, there are two tables that are probably going to be most useful: metadata_items and metadata_item_views.

The metadata_items table contains one row for each item (track, album, artist, episode, season, show, etc.) in your Plex media library. This table implements an implied three-tier hierarchy by including an id column and a parent_id column that define a relationship[3] between metadata items and the items that contain them.

The metadata_items table also includes vital data such as the title of each item (track name, album name, artist name, etc.) and its index (track number on the album, episode number in the season, etc.). Not all fields are used for all metadata item types, so you’ll see a lot more nulls than you might like.
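
To make that hierarchy concrete, here’s a sketch of a query that walks from track to album to artist by self-joining metadata_items on parent_id. The metadata_type code in the WHERE clause (10 appears to mean “track” in music libraries) comes from my own poking around rather than any documentation, so treat it as an assumption and verify it against your own database.

```python
# Walk the track -> album -> artist hierarchy by self-joining metadata_items.
# The metadata_type code (10 = track) is inferred from exploration, not documented.
import sqlite3

DB_PATH = r"\Plex DB\plex.db"

QUERY = """
SELECT
    artist.title  AS artist_name,
    album.title   AS album_name,
    track."index" AS track_number,
    track.title   AS track_name
FROM metadata_items AS track
JOIN metadata_items AS album  ON track.parent_id = album.id
JOIN metadata_items AS artist ON album.parent_id = artist.id
WHERE track.metadata_type = 10
ORDER BY artist_name, album_name, track_number
LIMIT 20
"""

with sqlite3.connect(DB_PATH) as conn:
    for artist, album, number, track in conn.execute(QUERY):
        print(f"{artist} / {album} / {number}. {track}")
```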

The metadata_item_views table contains one row for each time a user has watched a video, listened to a track, and so on. This table includes the titles for the viewed item, as well as the parent and grandparent items, so you can easily get the track, album, and artist without needing to join other tables. You can join with the metadata_items table if you need additional details like the album release year or the record label.

The metadata_item_views table includes a viewed_at column to track when a given item was viewed. This column is stored as a Unix timestamp, so you’ll need to convert it to datetime before you can start working with it. Plex tracks times using UTC, so if you want to work with data in a different time zone you’ll need to handle that in your PBIX.
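
Outside of Power BI, the same conversion is easy to spot-check with a few lines of Python. This is just a sketch; the column names below are my best guess at how those titles are stored, so verify them against your own database.

```python
# Pull the most recent listens and convert viewed_at from a Unix timestamp to UTC.
import sqlite3
from datetime import datetime, timezone

DB_PATH = r"\Plex DB\plex.db"

QUERY = """
SELECT grandparent_title, parent_title, title, viewed_at
FROM metadata_item_views
ORDER BY viewed_at DESC
LIMIT 10
"""

with sqlite3.connect(DB_PATH) as conn:
    for artist, album, track, viewed_at in conn.execute(QUERY):
        viewed_utc = datetime.fromtimestamp(viewed_at, tz=timezone.utc)
        print(f"{viewed_utc:%Y-%m-%d %H:%M} UTC  {artist} - {track} ({album})")
```

In Power Query, the equivalent approach is to add viewed_at (as a number of seconds) to the Unix epoch in a custom column, for example #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, [viewed_at]), and then handle any time zone conversion from there.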

Refreshing the data

Since your PBIX is connected to a database backup, it won’t automatically show your current library contents and listening history. If you want more current data, just download a new database backup and replace your plex.db (or whatever you named it) file with the new one. The next time you refresh your report, you’ll have new data.

I don’t have a use case for “real time” Plex reporting, so this approach works for me. It’s certainly easier and more timely than the hoops I had to jump through to “refresh” my Spotify data.

Next steps

I’m still just starting to get my hands dirty with this fun data source, so my next steps are mainly to keep exploring. Using my Spotify listening report as a target, I’m planning to duplicate that report and its insights with Plex data. Ideally I’ll have one report where I can see my legacy listening history in Spotify and my ongoing listening history in Plex… but that might be a few months away.

If you’re using Plex and have been thinking about exploring your data – or if you’re already doing this – I’d love to hear from you in the comments about your experiences.


 

[1] To the best of my knowledge after a year or more of occasionally googling to answer my questions, there is no official documentation for the database. The Plex community support forums have a few bits and pieces here and there, but most of what I’ve included in this post is the result of my own exploration. Honestly, it’s a real joy to have a side project with this type of work involved.

[2] Plex also has great features for managing and streaming TV shows, movies, and photos – it’s not just about music. This blog post is going to focus on the music side of things because that’s what’s most interesting to me today, but there’s a lot more to love in Plex.

[3] It’s worth noting that there are no “real” FK > PK relationships in the database, or if there are, they’re not exposed through the client tools I’ve used. To figure out where the data relationships exist you need to explore the data.

Roche’s Maxim of Community

Roche’s Maxim of Community states:

A community is defined by the behaviors it tolerates.

I played the “maxim” card already a few months ago, but I think it’s time to play it one more time, because this is another succinct formulation of a fundamental principle, general truth, or rule of conduct that I need to share[1].

An AI-generated image of a diverse group of people attending a conference

As you read this you might be thinking about the recent change in leadership at Twitter, and how the new leadership is inviting in and promoting prominent neo-Nazis, insurrectionist leaders, and other extremist, anti-democracy, and authoritarian figures, and how more and more members of the technical community are turning their backs on this now-sullied social network. I’m thinking about this too, but I’m also thinking more broadly because any maxim needs to be broadly and generally applicable.

So let’s start with a few examples. If you’ve made it this far I hope you stick with me for the rest.[3]

I’m part of the global Historical European Martial Arts (HEMA) community. I love training and fighting with swords, and I love the camaraderie that can be found through martial arts and combat sports. When your community activities bring with them a direct risk of serious injury, it’s important to be able to trust your training partner, your sparring partner, and your tournament opponent to keep your safety in mind.

Back in 2017 I competed in a tournament organized and run by a HEMA club with an established “Cobra Kai” reputation. I attended with a clubmate who is a fantastic swordsman and all-around martial artist. I was competing in open tournaments. He was competing in the advanced invitational tournament that included the head coach of the organizing club. The invitational tournament was also refereed and directed by members of that club. Yeah.

During the match between my clubmate and the coach of the club running the tournament, that coach exhibited the unsafe and unsportsmanlike behavior that earned him and his club their reputation.

I was in my clubmate’s corner both coaching and taking video. At the end of the day I shared the video[4], and it started making the rounds on HEMA groups on Facebook. Even though “sharing tournament footage so fighters and clubmates can watch and learn” is pretty standard behavior, this one caused some waves and started some difficult conversations. I don’t believe that the video showed anything surprising – this was the behavior everyone knew about already, and which they talked about behind closed doors. The video simply shone light on it, and made it harder to ignore, or to pretend you didn’t know.

I started avoiding events where this club was involved. I didn’t make a big deal about it, but when people asked me why I wasn’t going back to a tournament where I’d competed the previous year, I was honest in sharing my reasons. Fast forward to the HEMA event circuit starting back up in 2021: this club is no longer involved in multiple tournaments that they previously ran. Several of their most senior and prominent members have left the club and started their own.

I’m a pretty small part of the HEMA community, so I doubt my personal actions had that big of an impact – but I know they were part of this positive change. I saw behavior I would not tolerate and I voted with my feet. Others did the same, and the community evolved and grew.

Progressive and inclusive HEMA clubs like Valkyrie Western Martial Arts in Vancouver, BC and London Longsword Academy in London, UK demonstrate this maxim in a positive and proactive way. I’ve already written previously about the aggressive inclusivity that pervades Valkyrie’s culture, so here I’ll focus on a few aspects of London Longsword Academy[5] and their Fighters Against Racism (FAR) initiative.

As the FAR page says, Fighters Against Racism is a reaction to the “unpleasant element” in the HEMA community. If a club displays a FAR poster or banner, people who are offended by anti-racist statements will feel unwelcome – which they are. If a fighter wears a FAR t-shirt or patch to a tournament, they’re sending a clear message to everyone there what they believe, and that they are confident that this belief will be accepted and welcome at the event. My FAR t-shirt[6] is pretty successful at attracting YouTube comments, but so far I’ve never been confronted in person.

A community is defined by the behaviors it tolerates.

If you’re reading this blog (and I have every reason to believe that you are) you’re probably more interested in data and technology than in swords. My next example is more recent: in August of last year[7] there was an online discussion about the need for codes of conduct at conferences and community events. My tweets have since been deleted, so I’ll reproduce the thread here, starting with the same warning: foul language later on.

The phrase “doesn’t need saying” is a red flag for me when it comes to conversations about communities, events, and codes of conduct.

If it “doesn’t need saying” that certain behaviors are not acceptable, then there’s no harm in saying it anyway, because no one will be offended or feel targeted, right?

For example (here comes that foul language) it probably doesn’t feel necessary to have a “no shitting on the carpet” rule, because who would ever do such a thing? This just “doesn’t need saying,” does it?

But if there was ever one time when someone did shit on the carpet and no one did anything about it, you might start to reconsider. Maybe we do need to be explicit in saying “no shitting on the carpet” for the next event.

What if it wasn’t just one time? What if after your event people were talking about how someone shit on the carpet… they just weren’t talking about it in front of you because your inaction signaled that you were ok with that sort of thing? Might a code of conduct be in order?

Or would it be better to say “I didn’t see the shit on my carpet” or “I couldn’t smell it from where I was” or maybe even “it was just that one time, and the person who did shit on the carpet is an important member of the community”?

What if the idea of a code of conduct feels like a slippery slope? If we tell people they can’t shit on the carpet, what is next? Telling them they can’t shit on the hardwood floors? On the tiles? Telling them they can’t piss on the carpet? Where does it all end???

A community is defined by the behaviors it tolerates. If a community organizer says “this problem isn’t important” that sends a very strong signal – and not a virtuous one. It says “this behavior is ok and I will permit it to continue.”

A community is defined by the behaviors it tolerates. If a community organizer says “this problem doesn’t affect me, so it isn’t a real problem” that sends a strong signal too.

A community is defined by the behaviors it tolerates. If a community organizer says no, this is not ok, this is our code of conduct, and these are things that we simply will not accept, that sends a signal. A virtuous one.

This “virtue signaling” will be heard by the people who have a history of (literally or figuratively) shitting on the carpet, and they’re not going to like it. They might stop participating, and they might make trouble and noise on the way out.

Because the reality is that far too many events in far too many communities have allowed shit on the carpet for far too many years. We’ve just pretended that it’s not a problem, because the people doing the shitting are well-known for other things, and because the people who stepped in the shit generally left in disgust without saying anything. Does it really need saying that you’re not allowed to shit on the carpet? Yes. It does. Your code of conduct is an explicit statement about the behaviors you will tolerate.

You don’t need to be perfect, but being willing to change is a prerequisite to positive change. Admitting that there is a problem is the first step to solving the problem. FFS. /thread

When I wrote that thread in 2021 I was thinking about that HEMA tournament from 2017, where the community event organizers were complicit. Thankfully, the thread was in response to an announcement from the organizers of Data Grillen, DataMinutes, and New Stars of Data about their code of conduct. These folks (in this case it was the always incredible William Durkin) know what type of community they want to build, and they were unwilling to back down when a few dudes[8] took umbrage at the news.

Earlier this year, renowned C++ developer, consultant, and speaker Patricia Aas stood up and spoke out when her community continued to welcome an organizer and speaker who was convicted of serious sexual crimes. She made it clear that her community’s milquetoast response was intolerable to her, and she made her voice heard, even though it was likely that this would impact her ability to participate in a community she had helped grow.

Warning: There’s some additional profanity ahead.

I mention this example not because I was personally involved, but because it feels like a concrete canonical example of so many similar stories. There’s some guy. He’s been around forever, and has made serious positive contributions to the community. He’s used these contributions to build a clique, a power base, a center of social gravity and influence that’s about him, not about the community.

When it comes to light that he’s using his power and influence for ends that run contrary to the community’s stated goals and culture, his clique stands up for him, often supported by the actions or inactions of those not directly harmed by the guy’s actions.

“Think about all that he’s done for the community,” comes the familiar refrain. “Think about how much good he’s done, and how much we’d lose if not for his contributions.”

Fuck that. Fuck that, fuck him, and fuck his supporters.[9]

Instead, think about how much the community has already lost and will continue to lose because of him. Everyone he has harmed, everyone he has driven away from the community, everyone who has considered joining the community but was repulsed by his behavior and the community’s acceptance of his actions. Think about what each of these people could have contributed.

A community is defined by the behaviors it tolerates. This is why codes of conduct are important.

A community is defined by the behaviors it tolerates. This is why it is important to speak up when you see someone behaving in ways that violate the community’s written or unwritten rules.

A community is defined by the behaviors it tolerates. This is why it’s important to hold the powerful to account, and to hold them to a standard that is at least as high as the one by which the powerless are judged.

A community is defined by the behaviors it tolerates. Your participation in a community is a statement of your acceptance of the behaviors taking place in that community. It doesn’t matter if you weren’t the abuser, if you weren’t directly causing harm, if you weren’t actively shitting on the carpet yourself.

A community is defined by the behaviors it tolerates. Your participation in a community is a statement of your acceptance of the behaviors taking place in that community. If you’re at a party and someone walks in wearing blackface, wearing a white hood, wearing a swastika armband, you get to choose if you leave or if you stay. But if someone wearing a swastika armband joins the party[10] and you do decide to stay because they’re not personally bothering you – congratulations, you chose to attend a Nazi party.

A community is defined by the behaviors it tolerates. What behaviors do you choose to tolerate in your community? Does your community reflect your values or is it time to leave and join/start something better before you get shit on your shoes?

Wow. This ended up much longer than expected. Thank you for staying with me until the end. As is the case with Roche’s Maxim of Data Transformation there’s not really anything new or unique here. I’m saying something you probably already knew, and which was completely obvious once written down – and I took 100 pages to say it. Despite this, I believe it’s important to put in writing, and important to think and talk about, especially these days.

A community is defined by the behaviors it tolerates. Why not find me on Mastodon?


[1] I also haven’t published a blog post since July. The more technical topics I’ve been fermenting aren’t quite ready to be served, and I don’t want the whole month to pass[2] without finishing something.

[2] LOL @ Past Matthew who wrote footnote 1 in August 2021 (and who wrote footnote 2 in December 2021).

[3] Each example is based largely on my personal real-world experience, but deliberately avoids any names or details that would be obvious to people who weren’t already involved. I believe there is a time and a place to “name and shame” but I don’t believe this post is the right place, and I don’t want to encourage any sort of pile-on.

[4] I’m not going to link to the video here, but if you’re motivated you can find it on my personal YouTube channel. It’s the one with the very high view count and with comments disabled.

[5] London Longsword Academy is run by my dear friend David Rawlings, who often reminds us that he is “not straight, but still a great ruler.”

[6] Yes, you can buy one too.

[7] Yes, way back when I started writing this post.

[8] It’s always dudes, isn’t it? Do better, my dudes.

[9] That was the profanity I warned you about. I’m done now. Ok, almost done.

[10] Particularly if they’re personally invited and welcomed by the host of that party.

The Unplanned Career

If you’ve worked in tech for 25 years, you’ve seen some stuff, and you’ve learned a lesson or two. On October 1st, I presented a new session at Data Saturday Atlanta, sharing the story of my unplanned career, and some of the lessons I’ve learned along the way. This was the first time I presented this very personal session, and I’m incredibly grateful for the full crowd that attended and for their feedback after the session.

My primary goal for the session is to show that you don’t need a computer science degree to build a successful career in tech – and that our industry needs more people from non-traditional backgrounds.

My secondary goal for the session is to share some of the sharp edges that are typically hidden when people talk about their careers. Everyone wants to share their highlights, but sharing your own pain and failures is harder. This is important, because too often we’re each comparing our own blooper reel against everyone else’s highlights… and nothing good comes from that.

The Data Saturday event was in-person only[1], but a bunch of people mentioned that they wanted to attend and couldn’t make it to Atlanta. So… I packed a camera and microphone and recorded the video on my own.

If you weren’t able to attend, please check out the recording, and please let me know what you think!


[1] Please understand that there is no criticism implied here. Organizing a hybrid event is significantly more difficult than organizing an in-person or online-only event, and focusing on in-person community produced wonderful results.

Power BI guidance from the CAT

If you’re reading this blog, odds are you’re already familiar with the Power BI documentation. If there’s a feature in Power BI, there’s a set of articles that describes its capabilities.

But what if you need more? What if you need guidance on which features to use, and on how to use them properly to achieve your goals? This is where the Power BI guidance documentation comes in.

A watercolor painting of a cat reading Power BI guidance documentation, generated by DALL-E 2

I’ve written previously about some of what the Power BI CAT team[1] does, but the Power BI guidance documentation only gets a passing mention… and it’s worth going into more deeply.

A lot of what the Power BI CAT team does involves working with large enterprise customers. These customers are often trying to achieve difficult goals involving complex data architectures, and Power BI is often a significant part of their end-to-end information supply chain. We get involved[2] when these enterprise customers need help achieving their strategic goals, and that help often includes showing them how to use the existing capabilities of Power BI effectively.

In these customer engagements we help one customer at a time. This is valuable and important… but it doesn’t scale. Documentation scales. So when we identify the need for guidance, we use the Power BI guidance documentation as a channel to share common patterns, best practices, and key concepts that will help everyone be more successful with Power BI.

A lot of the guidance documentation can be summarized as “if everyone knew this thing, the Power BI CAT team wouldn’t need to keep helping customers solve this particular problem.” Included in this bucket are things like the importance of star schemas, using variables in DAX, or the value of separating reports from data models in the Power BI service.

There are also a few sections in the Power BI guidance documentation that are more ambitious in scope. These sections are designed to address common strategic patterns, not just tactical or technical challenges.

  • Migrating to Power BI – It’s very common for large organizations to retire legacy BI tools and to standardize on Power BI.[3] This guidance presents best practices for approaching this strategic change to minimize risk and maximize success.
  • Power BI adoption roadmap – Power BI includes an incredible set of powerful[4] capabilities for delivering information and insights… but technology only goes so far without the right people and processes in place. The Power BI adoption roadmap is a set of guidance that focuses on the big-picture side of succeeding with Power BI, including governance, establishing a center of excellence, and empowering a community of practice.
  • Power BI implementation planning[5] – If the core Power BI product documentation focuses on specific features and capabilities, and the Power BI adoption roadmap focuses on big-picture strategic topics, the Power BI implementation planning guidance falls somewhere in-between. This guidance presents a set of common usage scenarios, and how to implement these scenarios by using the right Power BI capabilities in the right way, supplemented by subject areas that look more closely at important topics like tenant setup and workspaces.

Fun fact: All three of these sections are written by MVP Melissa Coates of Coates Data Strategies. The Power BI CAT team is involved end to end, but we’re delighted to work with Melissa for this guidance.

If you’re reading this blog[6], you probably want to get the most from your personal and organizational investments in Power BI. If you do, you owe it to yourself to head on over to the Power BI guidance documentation right now.


[1] Yes, I know that the T in CAT stands for Team, so saying CAT team is redundant. No, I don’t care.

[2] If you’re reading this and asking yourself “how do I get the Power BI CAT team to help my organization?” the short answer is that you should work with your Microsoft account team. They’re the first line of defense, and are more than equipped to help with most Power BI challenges. If they need help, they have channels of escalation that include the CAT team.

[3] This may come as a surprise, but we want to make this as simple as possible.

[4] For once, pun not intended.

[5] As of July 2022, the Power BI implementation planning guidance is still a work in progress. There’s a lot of great content already published, but we’re less than halfway done. There are more usage scenarios and subject areas coming in the months ahead, so be sure to check back regularly.

[6] And I have every reason to believe that you are.

Dataflows with benefits

Power BI datamarts are like dataflows with benefits.

In case you missed the announcements this week from Microsoft’s Build conference, datamarts are a major new capability coming to Power BI, now available in public preview. There are also preview docs available, but the best datamarts content I’ve seen so far is this fantastic video from Adam and Patrick at Guy in a Cube[1].

I’m going to assume that since you’re reading this blog, you didn’t miss the announcement. I’m also going to assume that some small part of you is asking “what the heck are datamarts, anyway?”

For me, datamarts are like dataflows with benefits[2].

It should come as no surprise to any regular reader of this blog that I’m a big fan of dataflows in Power BI. Dataflows let Power BI users build reusable data tables in a workspace using Power Query Online, and share them with other users for reuse in other workspaces. What’s not to love?[3]

If you’ve spent a lot of time working with dataflows, you can probably think of a few things you wished dataflows did differently, or better. These are some of the most common requests I’ve heard in customer conversations over the years:

  • “I wish I could define row-level security (RLS) on my dataflows so I could share them securely with more users.”
  • “I wish my users could connect to dataflows using SQL, because analysts in my org all know SQL.”
  • “I wish <operation that would benefit from a compute engine> performed better in dataflows.”[4]

You probably see where I’m going here. Datamarts in Power BI deliver solutions to these problems. Datamarts in Power BI build on the strength of dataflows, while enabling common scenarios where dataflows did not offer an obvious or simple solution.

Almost like datamarts were dataflows with benefits.

Datamarts, like dataflows, provide a Power Query Online experience to perform data preparation. Datamarts, like dataflows, allow users to create reusable data tables in a Power BI workspace.

But datamarts, unlike dataflows, store their data in a managed SQL database. Dataflows use CDM folders for their storage, which means CSV files with some extra metadata. Although this file-based approach provides some benefits for reuse in integration scenarios, it can also be a challenge for simply connecting to the data in tools other than Power BI.

With datamarts, the data produced by executing your Power Query queries[5] is loaded into tables in an Azure SQL database that’s managed by the Power BI service. Having data in a full-featured database, as opposed to folders full of text files, makes a lot of difference.

  • Datamarts support row-level security. Using a simple in-browser user experience, a datamart author can define RLS rules that control which users can see which data when connecting to the datamart.
  • Anyone with the right permissions can query[6] the datamart’s underlying SQL database using any SQL query tool. This means that authorized users can perform exploratory data analysis in Excel, SQL Server Management Studio, Azure Data Studio, or whatever tool they’re most comfortable using. It’s a database.
  • Merges and joins and other operations common to building a star schema perform much better, because these are things that SQL has been awesome at for decades.[7]

Is it just me, or is this sounding a lot like dataflows with benefits?

But wait, you might say, what about measures and that automatically created dataset thingie and all the other stuff they showed at Build, and which I don’t really understand yet? What about deciding when I should use a datamart over a dataflow? What about the expanded web authoring experience, and querying the datamart directly from within the browser??

Yeah, I’m not going to cover any of that in this post. The post is already too long, and I didn’t really have time to write this one as it is.[8] But I think it’s those things that make the product team scowl when I describe datamarts as “dataflows with benefits” because they’re really a lot more. But if you think about dataflows with benefits, you’re probably off to a good start, and heading in the right direction.

I’m going to end on this note: all of my personal Power BI projects going forward will be built using datamarts. Datamarts do everything I need dataflows to do, and for me they do it better. I’ll still always love dataflows, and there will likely still be places where dataflows make sense, but for me…


 

[1] And I’m not only saying that because Roche’s Maxim makes an awkward surprise appearance.

[2] In case the “with benefits” descriptor isn’t meaningful to you, I’m attempting to make a play on the phrase “friends with benefits.” You can check out the mildly NSFW Urban Dictionary definition if you really feel like you need to. Honestly, I wouldn’t recommend it, but you do you.

[3] I literally got goosebumps typing that “reusable data tables” sentence. Even after all these years, having the promise of self-service data preparation and reuse realized in Power BI still feels a little bit like magic.

[4] Yes, this typically leads into a conversation about the dataflows “enhanced compute engine” but since using that engine requires following specific design patterns, this isn’t always as straightforward a conversation as you might want it to be.

[5] Power Queries? I always struggle with what to use for the plural noun for the queries that you build in Power Query. Maybe I should ask Alex.

[6] I use the term “query” here to mean SELECT statements. Only read operations are permitted, so if you try to UPDATE or whatever, you’ll get an error. Use Power Query to transform the data as you load it, like you would with a traditional data mart or data warehouse.

[7] I don’t have enough hands-on with the datamarts preview at this point to say much more than “faster” but in my non-scientific testing queries that would take “I guess I’ll make another pot of coffee” in dataflows take “oh it’s done already” in datamarts.

[8] If you had any idea how mean my calendar is this year, you’d weep. I’m weeping right now.

Coming to the PASS Data Community Summit in November: The Hitchhiker’s Guide to Adopting Power BI in Your Organization

At the 2022 PASS Data Community Summit this November, I’m thrilled to be co-presenting a full-day pre-conference session with the one and only Melissa Coates [blog | Twitter | LinkedIn]. We’ll be presenting our all-day session live and in-person in Seattle on Tuesday, November 15, 2022.

What’s the Session?

The Hitchhiker’s Guide to Adopting Power BI in Your Organization

What’s the Session About?

The Power BI Adoption Roadmap is a collection of best practices and suggestions for getting more value from your data and your investment in Power BI. The Power BI Adoption Roadmap is freely available to everyone — but not everyone is really ready to start their journey without a guide. Melissa and I will be your guides…while you’re hitchhiking…on the road…to reach the right destination…using the roadmap. (You get it now, right?!?)

We’ll do an end-to-end tour of the Power BI Adoption Roadmap. During the session we’ll certainly talk about all of the key areas (like data culture, executive sponsorship, content ownership and management, content delivery scope, center of excellence, mentoring and user enablement, community of practice, user support, data governance, and system oversight).

Smart Power BI architecture decisions are important – but there’s so much more to a successful Power BI implementation than just the tools and technology. It’s the non-technical barriers, related to people and processes, that are often the most challenging. Self-service BI also presents constant challenges related to balancing control and oversight with freedom and flexibility. Implementing Power BI is a journey, and it takes time. Our goal is to give you plenty of ideas for how you can get more value from your data by using Power BI in the best ways.

We promise this won’t be a boring day merely regurgitating what you can read online. We’ll share lessons learned from customers, what works, what to watch out for, and why. There will be ample opportunity for Q&A, so you can get your questions answered and hear what challenges other organizations are facing. This will be a highly informative and enjoyable day for you to attend either in-person or virtually.

Who is the Target Audience?

To get the most from this pre-conference session, you should be familiar with the Power BI Adoption Roadmap and the Power BI Implementation Planning guidance. You should have professional experience working with Power BI (or other modern self-service BI tools), preferably at a scope larger than a specific team. Deep technical knowledge about Power BI itself isn’t required, but the more you know about Power BI and its use, the more you’ll walk away with from this session.

We hope to see you there! For more details and to register, visit the PASS Data Community web site.

Who wrote this blog post?

It was Melissa.

She wrote it and emailed it to me and I shamelessly[1] stole it, which may be why there haven’t been any footnotes[2]. I even stole the banner image[3].


[1] With her permission, of course.
[2] Until these ones.
[3] Yes, Jeff. Stealing from Melissa is a Principal-level behavior.

On building expertise

How do you learn a new skill? How do you go from beginner to intermediate to expert… or at least from knowing very little to knowing enough that other people start to think of you as an expert?

This post describes the learning approach that has worked for me when gaining multiple skills, from cooking to sword fighting to communication. This approach may work for you or it may not… but I suspect that you’ll find something useful even if your learning style is different.

Purely decorative stock photo

When I’m building expertise[1] in a new area, there are typically five key activities that together help me make the learning progress I need. They don’t come in any particular order, and they all tend to be ongoing and overlapping activities, but in my experience they’re all present.

Practicing: Building expertise in an area or skill requires actively performing that skill. You can’t become great at something without doing that thing. Practice whenever you can, and look for opportunities to practice things that are outside your comfort zone. If your practice is always successful, this may be a sign that you’re not pushing yourself enough, and your progress may be slower than it needs to be.

Studying: It’s rare to learn something completely new. Even if you’re trailblazing a brand new area of expertise, you probably get there by learning about related better-known areas, from existing experts. Read whatever you can, watch whatever you can, listen to whatever you can. Find content that matches the way you prefer to learn, and spend as much time as you can consuming that content. Make it part of your daily routine.

Observing: Find existing experts and watch them work. Where reading books or watching videos exposes you to the static aspect of expertise, being “expert adjacent” exposes you to the dynamic aspect. Mindfully observing how an expert approaches a problem, how they structure their work area, how they structure their day, will give you insights into how you can improve in these aspects.

Networking: Find ways to engage with a community of like-minded people who share your interest in your chosen area of expertise. Not only will these activities provide ongoing opportunities to learn from peers, the questions and problems that other community members share can serve as motivation to explore topics you may not have otherwise thought of independently.

Teaching: Teaching a skill to others forces you to think about that skill in ways that would probably not be needed if you were learning on your own. Teaching forces you to look at problems and concepts in ways that expose your biases and blind spots, and to ask (and answer) questions that would never have occurred to you on your own. Teaching a skill is in many ways the best way to deeply learn that skill.

Please note that these activities aren’t sequential, and no one activity is dependent on the others. In my experience, all five activities are ongoing and overlapping, and each one complements and enables the others.

What does this look like in practice?

I grew up in a family where cooking wasn’t really a thing[2], so I started learning to cook as an adult. Despite this, my cooking and baking have become something of a gold standard for friends and acquaintances. My learning experience looked something like this:

Studying: I bought and read dozens of cookbooks. If a topic or book looked interesting, I bought it and read it. I subscribed to many cooking magazines and read them when they arrived each month and watched pretty much every cooking show I could fit into my schedule.[3]

Practicing: I cooked and baked almost every day. I tried new ingredients and recipes and techniques that I discovered through my study, and figured out what worked, and what I needed to do to make it work for me. I organized fancy dinners for friends as a forcing function, and to keep myself from getting too lazy.

Observing: When I dined in restaurants[4], I would try to sit at the chef’s counter or somewhere I could get a good view of the kitchen, and would mindfully watch how the chef prepared and served each dish.

Networking: I made foodie friends. I talked about food and cooking with them, and occasionally cooked with them as well. Sometimes we’d go to cooking classes together. Sometimes we’d borrow equipment or cookbooks from each other. Eventually we’d invite each other over for dinner. Those were the days…

Teaching: I found opportunities to share what I’d learned with others. When someone would exclaim “Oh my goodness – how did you make this?!?”[5] I would do my best to tell them and show them. Sometimes this was a conversation, sometimes it was a 1:1 tutoring session, sometimes it was a small group class. Each time I learned something about the topic I was teaching because of the questions people asked.

Cooking is just one example, but I have similar experiences for every topic where people have asked me questions like “how do you know all this?” or “how did you get so good at this?” For every area where I have developed a reasonable degree of expertise, it’s because I have done some combination of these things, often over many years. I have every reason to believe that this approach will work for you as well.

Ok… that’s the blog post, but I’m not quite done. Back in January when I started writing this one, I started with the three “content seeds” you see below.

Data Chef

FFS

That’s right. Past Matthew left me the phrase “Data Chef,” the link to the PowerBI.tips video he’d just finished watching, and the phrase “FFS.” Three months later I have no idea how these relate to the topic of building expertise. If you can figure it out, please let me know.


[1] You may have noticed that I’m deliberately using the phrase “building expertise” instead of a different phrase like “learning” or “acquiring a new” skill. I’ve chosen this phrase because the approach described here isn’t what I do when I’m learning something casually – it’s what I’ve done when building a wide and deep level of knowledge and capability around a specific problem or solution domain.

[2] I love my mother dearly, but when I was a child she would cook broccoli by boiling it until it was grey, and you could eat it with a spoon. I never realized that food was more than fuel until I was in my 20s and met my future wife.

[3] Yes, I’m old, in case talking about magazines didn’t give it away. I assume the same pattern today would involve subscribing to blogs or other content sources. Also, can you remember back when you needed to watch TV shows when they were on, not when you wanted to watch them?

[4] Back in the day I used to travel for work a lot, so I ate in restaurants a lot. Sometimes I was on the road two or three weeks per month, which translated into a lot of restaurant dinners.

[5] If you’ve never met me in person and have never eaten the food I make and share, you might be surprised by how often this happens. If you know me personally, you probably won’t be surprised at all.

Recipe: Chicken Liver Mousse

It’s been a few years since I shared a recipe, but this one has kept coming up in conversation lately and it feels like the right time to share. This recipe is from Laurie Riedman of Elemental@Gasworks[1], and words can’t express how awesome it is.

Ingredients

  • 2 pounds chicken liver, soaked in milk
  • ½ pound unsalted butter
  • 4 shallots, sliced
  • 2 cloves garlic, sliced
  • 2 Granny Smith apples, peeled and diced
  • 8 sheets gelatin, soaked in water
  • 1/3 cup Grand Marnier
  • Salt and pepper to taste

Technique

  1. Melt butter in sauté pan
  2. Add shallots, garlic and apple – cook until soft, but do not brown
  3. Turn heat to medium high and add drained chicken livers
  4. Sauté until just cooked – livers should still be pink inside
  5. Add Grand Marnier and reduce by half
  6. Stir in gelatin until dissolved
  7. Cool slightly and process until very smooth, adding salt and pepper to taste
  8. Put in terrine mold, cover and weigh
  9. Chill overnight

Serving suggestion

Serve with toasted baguette and pickles

The mousse will keep for months in the freezer. I made a big batch in 2020, vacuum sealed 5 or 6 generous portions, and have been thawing one every few months when I feel the need for something rich, savory, and delicious.


[1] Elemental@Gasworks was my favorite restaurant for years. Before I moved to the Seattle area I would dine at Elemental at least once per week when I was visiting. I learned a lot from eating Laurie’s food, and from watching her cook in the tiny, tiny kitchen. Elemental closed in 2012, but it comes up in conversation almost every day among those of us fortunate enough to have experienced it.

Thank you for sticking around – 200th post!

I write this blog mainly for myself.

I write about topics that are of interest to me, mainly because they’re interesting to me but also because there doesn’t seem to be anyone in the Power BI community covering them. There are dozens of blogs and YouTube channels[1] covering topics like data visualization, data modeling, DAX, and Power Query – but I haven’t found any other source that covers the less-tangible side of being a data professional that is so compelling to me.

I write on my own chaotic schedule. Some months I might post every other day, or I might go weeks or months without posting anything[2]. These days I try to post once per week, but that’s definitely a stretch goal rather than something to which I will commit. Sometimes my creativity flows freely, but sometimes writing comes from the same budget of emotion and energy that I need for work and life… and blogging typically ends up near the bottom of my priority list.

And yet, here we are, 40 months and 200 blog posts later. In the past few weeks I’ve seen dozens of people reference Roche’s Maxim of Data Transformation, which started out as a tongue-in-cheek self-deprecating joke and then took on a life of its own. Earlier this week I spent time with another team at Microsoft that has been collectively reading and discussing my recent series on problems and solutions, and looking for ways to change how their team works to deliver greater impact. More and more often I talk with customers who mention how they’re using some information or advice from this blog… and it’s still weird every single time.

In these dark days it’s easy to feel isolated and alone. It’s easy to dismiss and downplay online interactions and social media as superficial or unimportant. It’s easy to feel like no one notices, like no one cares, and like nothing I’m doing really makes a difference[3].

So for this 200th post, I wanted to take the time to say thank you to everyone who has taken the time to let me know that I’m not alone, and that someone is listening. It makes a big difference to me, even if I don’t always know how to show it.

Let’s keep doing this. Together.


[1] I recently discovered that two of my co-workers have their own little YouTube channel. Who knew?

[2] Please don’t even get me started on how long it’s been since I posted a new video.

[3] This is a reminder that Talking About Mental Health is Important. I have good days and bad days. Although I make a real effort to downplay the bad and to amplify the good, there’s always a voice inside my head questioning and criticizing everything I do. It’s important to talk about mental health because this is a voice that many people have, and not everyone knows that it’s not just them. Not everyone knows that the voice is lying.

Risk Management Basics

When I began my software career back in the 90s, one of the software-adjacent skills I discovered I needed was risk management. When you’re building anything non-trivial, it’s likely that something will go wrong. How do you find the right balance between analysis paralysis and blindly charging ahead? How do you know what deserves your attention today, and what can be safely put on the back burner and monitored as needed?

This is where risk management comes in.[1]

In this context, a risk is something that could possibly go wrong that would impact your work if it did go wrong. Risk management is the process of identifying risks, and deciding what to do about them.

One simple and lightweight approach for risk management[2] involves looking at two factors: risk likelihood, and risk impact.

Risk likelihood is just what it sounds like: how likely the risk is to occur. Once you’re aware that a risk exists, you can measure or estimate how likely that risk is to be realized. In many situations an educated guess is good enough. You don’t need to have a perfectly accurate number – you just need a number that no key stakeholders disagree with too much.[3] Rather than assigning a percentage value I prefer to use a simple 1-10 scale. This helps make it clear that it’s just an approximation, and can help prevent unproductive discussions about whether a given risk is 25% likely or 26% likely.

Risk impact is also what it sounds like: how bad would it be if the risk did occur? I also like to use a simple 1-10 scale for measuring risk impact, which is more obviously subjective than the risk likelihood. So long as everyone who needs to agree agrees that the impact of a given risk is 3 or 4 or whatever, that’s what matters.

Once you have identified risks and assigned impact and likelihood values to each one, multiply them together to get a risk score from 1 to 100. Sort your list by this score and you have a prioritized starting point for risk mitigation.
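
Here’s a minimal sketch of that arithmetic, using a few of the example risks from later in this post with made-up likelihood and impact numbers:

```python
# Risk score = likelihood (1-10) x impact (1-10), giving a value from 1 to 100.
# The likelihood and impact numbers below are illustrative, not recommendations.
risks = [
    {"risk": "Database server hardware failure", "likelihood": 3, "impact": 8},
    {"risk": "Data source access not granted in time", "likelihood": 6, "impact": 5},
    {"risk": "Key team member leaves", "likelihood": 4, "impact": 7},
    {"risk": "Data center destroyed by giant meteor", "likelihood": 1, "impact": 10},
]

for risk in risks:
    risk["score"] = risk["likelihood"] * risk["impact"]

# Sort highest score first to get a prioritized starting point for mitigation.
for risk in sorted(risks, key=lambda r: r["score"], reverse=True):
    print(f"{risk['score']:>3}  {risk['risk']}")
```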

Risk mitigation generally falls into one or more of these buckets:[4]

  1. Risk prevention – you take proactive steps to reduce the likelihood of the risk occurring.
  2. Risk preparation – you take proactive steps to plan for how you’ll respond to reduce the impact if the risk does occur.

For risks with high risk scores, you’ll probably want to do both – you’ll take steps to make the risk less likely to occur, and you’ll take steps to be ready in case it still does.

Here are a few examples of risks that might be identified when performing risk management for a BI project, along with examples of how each might be mitigated:

  • Risk: A database server might be unavailable due to hardware failure, thus interrupting business operations
    • Possible prevention: Purchase and configure server hardware with redundant storage and other subsystems
    • Possible preparation: Define and test a business continuity and disaster recovery[5] plan for recovering the database server
  • Risk: You might not get permissions to access a data source in a timely manner
    • Possible prevention: Prioritize getting access to all required data sources before committing to the project
    • Possible preparation: Identify an executive sponsor and escalation path for situations where additional data source access is required
  • Risk: A key team member might leave the team or the company
    • Possible prevention: Work to keep the team member happy and engaged
    • Possible preparation: Cross-train other members of your team to minimize the impact if that key member moves on
  • Risk: Your data center might lose power for a week
    • Possible prevention: Locate the data center in a facility with redundant power and a reliable power grid
    • Possible preparation: Purchase and install generators and fuel reserves
  • Risk: Your data center location might be destroyed by a giant meteor
    • Possible prevention: Um… nothing leaps to mind for this one
    • Possible preparation: Again um, but maybe using a geo-distributed database like Azure Cosmos DB to ensure that the destruction of one data center doesn’t result in downtime or data loss?[6]

You get the idea. I’m not going to assign likelihood or impact values to these hypothetical risks, but you can see how some are more likely than others, and some have a higher potential impact.

Now let’s get back to a question posed at the top of the post: how do you find the right balance between analysis paralysis and blindly charging ahead?

Even in simple contexts, it’s not possible to eliminate risk. Insisting that a mitigation strategy needs to eliminate a risk and not only reduce it is ineffective and counterproductive. It’s not useful or rational to refuse to get in a car because of the statistical risk of getting injured or killed in a collision – instead we wear seat belts and drive safely to find a balance.

And this is kind of what inspired this post:

The “perfect or nothing” mindset isn’t effective or appropriate for the real world. Choosing to do nothing because there isn’t an available perfect solution that eliminates a risk is simply willful ignorance.

Most real-world problems don’t have perfect solutions because the real world is complex. Instead of looking for perfect solutions we look for solutions that provide the right tradeoff between cost[7] and benefit. We implement those pragmatic solutions and we keep our eyes open, both for changes to the risks we face and to the possibility of new mitigations we might consider.

Whether or not risk management is a formal part of your development processes, thinking about risks and how you will mitigate them will help you ensure you’re not taken by surprise as often when things go wrong… as they inevitably do…


[1] Yes, I’m linking to a Wikipedia article for a technical topic. It’s surprisingly useful for an introduction, and any search engine you choose can help you find lots of articles that are likely to be more targeted and useful if you have a specific scenario in mind.

[2] This is the only approach to risk management that will be shared in this article. If you want something more involved or specialized, you’ll need to look elsewhere… perhaps starting with the Wikipedia article shared earlier, and following the links that sound interesting.

[3] If you are in a situation where “good enough” isn’t good enough, you’ll probably want to read more than just this introductory blog post. Are you starting to see a trend in these footnotes?

[4] That Wikipedia article takes a slightly different approach (direct link to section) but there’s a lot of overlap as well. What I describe above as “risk prevention” aligns most with their “risk reduction” and my “risk preparation” aligns most with their “risk retention” even though they’re not exact matches.

[5] The other BCDR.

[6] I had originally included the “giant meteor strike” risk as an example of things you couldn’t effectively mitigate, but then I remembered how easy Cosmos DB makes it to implement global data distribution. This made me realize how the other technical risks are also largely mitigated by using a managed cloud service… and this in turn made me realize how long ago I learned about mitigating risks for data projects. Anyway, at that point I wasn’t going back to pick different examples…

[7] However you want to measure that cost – money, time, effort, or some other metric.