Diversity of Background

Your background determines who you are.

Who you are determines your values and your priorities.

Your values and priorities determine what you do, and what you accept being done.

Now apply that to your work as a technical professional.

Photo by rawpixel on Unsplash

Think for a moment about the team you work with. What is the team breakdown by gender? By nationality? By education? By skin color? By language? By ethnic background? By religion? By cognitive ability?

Now think about the decisions your team makes when deciding what to build, how to build it, and how to determine when it’s ready to ship.

Consider this headline from earlier this year: “Microsoft’s facial recognition just got better at identifying people with dark skin.” That’s a good thing, right?

Now consider this excerpt from the article:

Microsoft stated that it was able to “significantly reduce accuracy differences across the demographics” by expanding facial recognition training data sets, initiating new data collection around the variables of skin tone, gender and age and improving its gender classification system by “focusing specifically on getting better results for all skin tones.”

“The higher error rates on females with darker skin highlights an industrywide challenge: Artificial intelligence technologies are only as good as the data used to train them. If a facial recognition system is to perform well across all people, the training dataset needs to represent a diversity of skin tones as well as factors such as hairstyle, jewelry and eyewear.”

I’d like to emphasize that I’m not on the Face API team and don’t have any insight into the team beyond this story, but I think it’s probably safe to say that if the team had more darker-skinned men and women as team members, the decision to ship an API with high failure rates for darker-skinned men and women may not have been made.[1] Imagine a developer saying “but it works on my face” in the same tone you’ve heard one say “but it works on my machine” in the past. If it doesn’t work on your machine, that’s when even the most obstinate developer will admit that the code has a problem.

This example of the impact diversity of background can make is significant, but it’s also pretty common. If you follow tech news sites, you’ve heard this story and others like it before. So let’s look at another one.

In systems that you’ve built, how easy is it to change an email address or username? This might be a transactional system, where the email address or username is the business key. Or it may be an analytics system, where these fields may not be handled as slowly changing dimension attributes. Think about your Microsoft Account[2] as an example – just how easy is it to change the email address you use for your cloud identity across dozens of Microsoft services?

As it turns out, it’s pretty darned easy today, and I have to wonder if there are transgender team members who are responsible for this fact.

For most cisgender people, the only time you’d think about changing your name is when you get married, and then it’s only your last/family name. Changing your first/given name may feel like a weird corner case, but it definitely won’t feel this way if you or someone you love is transgender. In that case, you understand and appreciate the impact of deadnaming, and you may well have experienced the struggle of making name and email changes in system after system after system.

Rather than going on with more examples, I’ll get to the point: If you have a more diverse team, you have a better chance of building a product that is better for more customers more quickly, and of shipping the right thing sooner rather than later.

To me this is an obvious truth because I have seen it play out again and again, for good and for ill. Not everyone agrees. There are still people who use the term “diversity hire” as a pejorative. This summer, the amazing intern working with my team was told by one of her fellow interns that the only reason she got the position was that she was female and there was a quota[3]. Although some people[4] may be threatened by the recognition of the value of diversity, that doesn’t reduce the value in any way.

Join a diverse team. Form a diverse team. Support a diverse team. And build something that’s what the world needs, even if the world doesn’t look just like you.

 


[1] And to be fair, the blog post to which this article refers goes into excellent depth on how Microsoft is addressing both tactical and strategic aspects of this challenge. Please also note that a lack of diversity in product development will produce a substantively different set of problems than will a lack of morality in product development.

[2] The cloud identity that used to be called Live ID, and before that, Passport. It’s the email address you use to sign into everything from Windows to Outlook to OneDrive.

[3] Hint: It wasn’t, and there wasn’t. She was awesome, and that’s why she got the internship. I sure hope she sticks with it, because if I had been half as good at 21 as she is, I would be ruling the world today.

[4] Typically the people who directly benefit from a lack of diversity. Yes, typically white heterosexual cisgender males. Typically people who look like me.

Power BI Dataflows – Data Profiling Without Premium

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

If you’ve been reading this blog, you already know a few things:

  1. Power BI includes a capability for self-service data preparation called “dataflows”
  2. Dataflows include computed entities, which enable some powerful reuse scenarios, but which are available only in Power BI Premium
  3. You can use computed entities to make current data profiles available for your dataflow entities, so that this valuable metadata can be used wherever the data is used
  4. Dataflows enable scenarios for reuse that don’t require computed entities and Premium

This post presents a variation on the data profiling pattern that doesn’t require Premium capacity. Let’s jump right in.

This is the approach that I took last time: I created a single dataflow to contain the data profiles for entities in other dataflows. As you can see, my workspace is no longer backed by Premium capacity, so this approach isn’t going to work.

[Screenshot: 2018-11-24_12-09-47]

Instead of having a dedicated “Data Profiles” dataflow, we’re going to have data profile entities in the same dataflows that contain the entities being profiled. Dataflows like this one.

[Screenshot: 2018-11-24_12-16-57]

As you can see, this dataflow contains two entities. We want to profile each of them. The most intuitive approach would be to create new queries that reference the queries for these entities, and to put the profile in the dependent query…

[Screenshot: 2018-11-24_12-19-26]

…but if you do this, Power BI thinks you’re trying to create a computed entity, which requires Premium.

[Screenshot: 2018-11-24_12-19-47]

Please allow me to rephrase that last sentence. If you reference a query that is loaded into a dataflow entity, you are creating a computed entity, which requires Premium.

So let’s not do that.

Specifically, let’s use the same pattern we used in the “reuse without premium” post to address this specific scenario.

Let’s begin by disabling the data load for the two “starter” entities that reference the external data source.

[Screenshot: 2018-11-24_12-27-50]

Once this is done, the Premium warning goes away, because we’re no longer trying to create computed entities.

[Screenshot: 2018-11-24_12-29-20]

Let’s rename the queries, and look at the M code behind the new queries we’ve created.

[Screenshot: 2018-11-24_12-32-04]

As you can see, the new queries don’t contain any real logic – all of the data acquisition and transformation takes place in the “source” queries. The new ones just reference them, and get loaded into the CDM Folder that’s backing the dataflow.
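
For readers who would rather see the M itself than a screenshot, here’s a rough sketch of what this pattern looks like, shown as two separate queries. The query names, source URL, and columns are all made up for illustration – the point is only the shape: a “source” query with data load disabled, and a thin entity query that references it.

    // "Source" query – does the real data acquisition and transformation.
    // Data load is disabled for this query, so it does not become an entity.
    let
        Source = Csv.Document(Web.Contents("https://example.com/sales.csv")),
        Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
        Typed = Table.TransformColumnTypes(Promoted, {{"Amount", Currency.Type}})
    in
        Typed

    // Entity query – no real logic of its own; it just references the source
    // query above and is what actually gets loaded into the CDM Folder.
    let
        Source = #"Sales Source"
    in
        Source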

At this point we’re functionally right back where we started – we just have a more complex set of queries to achieve the same results. But we’re also now positioned to add in queries to profile these entities, without needing Premium.

To do this, we’ll simply add new queries that reference the “source” queries, and add a step that calls Table.Profile().[1]

[Screenshot: 2018-11-24_12-38-37]
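
Continuing the same made-up example from above, the profiling query is just one more lightweight query that references the “source” query and adds a single Table.Profile step:

    // Profile entity – references the shared "source" query and profiles it.
    // Table.Profile returns one row per column, with min, max, counts, nulls, etc.
    let
        Source = #"Sales Source",
        Profile = Table.Profile(Source)
    in
        Profile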

And that’s that.

When I save my dataflow and refresh it, the entities for the data and the entities for the data profiles will load and be saved for reuse. When I connect to this dataflow from Power BI Desktop, all four entities are available.

[Screenshot: 2018-11-24_12-46-22]

At this point you may be wondering what the difference is between this approach and the approach that uses computed entities. To help answer this question, let’s look at the refresh details in the CSV file that can be downloaded from the refresh history.[2]

[Screenshot: 2018-11-24_12-48-06]

If you look at the start and end time for each of the four entities, you’ll see that each of them took roughly the same time to complete. This is because for each entity, the query extracted data from the data source and transformed it before loading into the CDM Folder. Even though the extract logic was defined in the shared “source” queries, when the dataflow is refreshed each entity is loaded by executing its query against the data source.

By comparison, in the data profiling pattern that relies on computed entities, the data source is not used to generate the profile. The computed entity uses the CDM Folder managed by Power BI as its source, and generates the profile from there. This means that the data source is placed under lighter load[3], and the profile generation itself should take less time.

For meaningfully large data sources, this difference may be significant. For the trivial data sources used in this example, the difference is measured in seconds, not minutes or hours. You’ll probably want to explore these patterns and others – I’m eager to hear what you discover, and what you think…


[1] Yes, I literally copied the code from my other blog post.

[2] For more details on refresh, see Dataflows in Power BI: Overview Part 5 – Data Refresh.

[3] 50% lighter, if my math skills haven’t failed me.

 

Power BI Dataflows FAQ

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

Photo by Matthew Brodeur on Unsplash

Q: What are Power BI dataflows?

A: Dataflows are a capability in Power BI for self-service ETL and data preparation that enable analysts and business users to define and share reusable data entities. Each dataflow is created in a Power BI workspace and can contain one or more entities. Each entity is defined by a Power Query “M” query. When the dataflow is refreshed, the queries are executed, and the entities are populated with data.

Q: Where is the data stored?

A: Data is stored in Azure storage in the CDM folder format. Each dataflow is saved in a folder in the data lake. The folder contains one or more files per entity. If an entity does not use incremental refresh, there will be one file for the entity’s data. For entities that do use incremental refresh, there will be multiple files based on the refresh settings. The folder also contains a model.json file that has all of the metadata for the dataflow and the entities.

Q: Do I need to pay for Power BI dataflows?

A: Yes, but you don’t need to pay extra for them. Dataflows are available to Power BI Pro and Premium users.

Q: Do I need Power BI Premium to use dataflows?

A: No. Although some specific features (incremental refresh of dataflow entities, linked/computed entities, the enhanced compute engine) do require Premium, dataflows are not a Premium-only capability.

Q: Do dataflows support incremental refresh?

A: Yes. Incremental refresh can be configured on a per-entity basis. Incremental refresh is supported only in Power BI Premium.

Q: Can I use on-premises data sources with dataflows?

A: Yes. Dataflows use the same gateways used by Power BI datasets to access on-premises data sources.

Q: How do I do X with dataflows? I can do it in a query in Power BI Desktop, but I don’t see it in the dataflows query editor UI!

A: Most Power Query functionality is available in dataflows, even if it isn’t exposed through the query editor in the browser. If you have a query that works in Power BI Desktop, copy the query and paste it into Power Query Online. In most cases it will work.
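
As a purely illustrative example (the URL and column names here are invented), a Desktop query like the following can be pasted wholesale into a blank query in the dataflow editor via the advanced editor, even if you originally built it step by step through Desktop’s ribbon rather than the browser UI:

    // A typical Power BI Desktop query – paste the whole thing into a
    // blank dataflow query using the advanced editor and it should just work.
    let
        Source = Json.Document(Web.Contents("https://example.com/api/orders")),
        AsTable = Table.FromRecords(Source),
        Kept = Table.SelectColumns(AsTable, {"OrderId", "OrderDate", "Amount"})
    in
        Kept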

Q: Do I still need a data warehouse if I use dataflows?

A: If you needed a data warehouse before Power BI dataflows, you probably still need a data warehouse. Although dataflows serve a similar logical function as a data warehouse or data mart, modern data warehouse platforms provide capabilities that dataflows do not.

Q: Do I need dataflows if I already have a data warehouse?

A: Dataflows fill a gap in data warehousing and BI tools by allowing business users and analysts to prepare and share data without needing help from IT. With dataflows, users can build a “self service data mart” in Power BI that can be used in their solutions. Because each dataflow entity is defined by a Power Query “M” query, handing off the definitions to an IT team for operationalization/industrialization is more straightforward.

Q: Do dataflows replace Azure Data Factory?

A: No. Azure Data Factory (ADF) is a hybrid data integration platform designed to support enterprise-scale ETL and data integration needs. ADF is designed for use by professional data engineers. Power BI dataflows are designed for use by analysts and business users – people familiar with the Power Query experience from Power BI Desktop and Excel – to load data into ADLSg2.

Q: With Wrangling Data Flows in Azure Data Factory, do we still need dataflows in Power BI?

A: Probably. Power BI dataflows are a self-service data preparation tool that enables analysts and other business users who may not be comfortable using SSIS or ADF to solve data prep problems without IT involvement. This remains true now that ADF includes Power Query via Wrangling Data Flows.

Q: Can I use dataflows for realtime / streaming data?

A: No. Dataflows are for batch data, not streaming data.

Q: Do dataflows replace Power BI datasets?

A: No. Power BI datasets are tabular analytic models that contain data from various sources. Power BI dataflows can be some or all of the sources used by a dataset. You cannot build a Power BI report directly against a dataflow – you need to build reports against datasets.

Q: How can I use the data in a dataflow?

A: No. Oh, wait, that doesn’t make sense – this wasn’t even a yes or no question, but I was on a roll… Anyway, you use the data in a dataflow by connecting to it with the Power BI dataflows connector in Power Query. This will give you a list of all workspaces, dataflows, and entities that you have permission to access, and you can use them like any other data source.

Q: Can I connect to dataflows via Direct Query?

A: Yes. When using the dataflows enhanced compute engine, you can connect to dataflows using import or DirectQuery. Otherwise, dataflows are an import-only data source.

Q: Can I use the data in dataflows in one workspace from other workspaces?

A: Yes! You can import entities from any combination of workspaces and dataflows in your PBIX file and publish it to any workspace where you have the necessary permissions.

Q: Can I use the data in a dataflow from tools other than Power BI?

A: Yes. You can configure your Power BI workspace to store dataflow data in an Azure Data Lake Storage gen2 resource that is part of your Azure subscription. Once this is done, refreshing the dataflow will create files in the CDM Folder in the location you specify. The files can then be consumed by other Azure services and applications.

Q: Why do dataflows use CSV files? Why not a cooler file format like Parquet or Avro?

A: The dataflows whitepaper answers this one, but it’s still a frequently asked question. From the whitepaper: “CSV format is the most ubiquitously supported format in Azure Data Lake and data lake tools in general, and CSV is generally the fastest and simplest to write for data producers.” You should probably read the whole thing, and not just this excerpt, because later on it says that Avro and Parquet will also be supported.

Q: Is this an official blog or official FAQ?

A: No, no. Absolutely not. Oh my goodness no. This is my personal blog, and I always suspect the dataflows team cringes when they read it. Also I don’t update it quite as often as I should.

Diversity of Ability

I’m better than you are.

No, that’s not how I want to start…

What makes a high-performing team?

No, no, that’s still not right. Let me try again…

When you think about the people you work with, your co-workers, teammates, and colleagues, what is it that you find yourself admiring?

Photo by Mark Raugas – http://innercapture.com/galleries/squatch18/index.htm

For me, it often comes down to things that I’m not good at – useful skills and knowledge that I lack, or where my team members are far more advanced. When you consider the people I work with, this can be pretty intimidating[1]. It’s great to know that I can reach out at any time to some of the best experts in the world, but it sometimes makes me wonder what I have to offer to the team when I see them kicking ass and optionally taking names.

As it turns out, I have a lot to offer. Specifically, I’ve been able to do some pretty significant things[2] that have changed the way my team works, and changed the impact that the team has on Power BI as a product. And if I believe what people have told me[3],  I’ve implemented changes that probably could not have been made without me.

This brings me back to this question: What makes a high-performing team?

Since I’m no expert[4], I’ll refer to an expert source. This article from Psychology Today lists 10 characteristics of high-performing teams, and #1 on the list is this:

Define and Create Interdependencies. There is a need to define and structure team members’ roles. Think of sports teams, everyone has their position to play, and success happens when all of the players are playing their roles effectively. In baseball, a double-play is a beautiful example of team interdependency.

Based on my experiences, I’ll paraphrase this a little. A high-performing team requires team members with diverse and complementary abilities. Everyone should be good at something – but not the same thing – and everyone should be allowed and encouraged to contribute according to his or her personal strengths and interests[5]. Every team member should have significant abilities related to the priorities of the team – but not the same abilities. And equally importantly, each team member’s contributions need to be valued, appreciated, recognized, and rewarded by the team culture.

I’ve worked on more than a few teams where these criteria were not met. At the extreme, there were teams with a “bro” culture, where only a specific set of brash technical abilities were valued[6], so everyone on the team had the same skills and the same attitudes, and the same blinders about what was and what was not important. And of course the products they built suffered because of this myopic monoculture. Although this is an obvious extreme, I’ve seen plenty of other teams that needed improvement.

One example that stands out in my memory was the first major product team I worked on at Microsoft. There was one senior developer on the team who loved sustained engineering work. He loved fixing bugs in, and making updates to, old and complex code bases. He was good at it – really good – and his efforts were key to the product’s success. But the team culture didn’t value his work. The team leaders only recognized the developers who were building the flashy and exciting new features that customers would see in marketing presentations. The boring but necessary work that went into making the product stable and viable for enterprise customers simply wasn’t recognized. Eventually that team member found another team and another company.

I’m very fortunate today to work on a team with incredible diversity. Although most team members[7] are highly skilled at Power BI, everyone has his own personal areas of expertise, and an eagerness to use that expertise to assist his teammates. And just as importantly, we have a team leader who recognizes and rewards each team member’s strengths, and finds ways to structure team efforts to get the best work out of each contributor. Of course there are challenges and difficulties, but all in all it’s a thing of beauty.

Let’s wrap this up. If you’ve been reading the footnotes[8], you’ve noticed that I’ve mentioned imposter syndrome a few times. I first heard this term around 8 years ago when Scott Hanselman blogged about it. I’d felt it for much longer than that, but until I read his post, I’d never realized that this was a common experience. In the years since then, once I knew what to look for, I’ve seen it all around me. I’ve seen amazing professionals with skills I respect and admire downplay and undervalue their own abilities and contributions. And of course I see it in myself, almost every day.

You may find yourself feeling the same way. I wish I could give advice on how to get over it, but that’s beyond me at the moment. But what I can say is this: you’re better than the people you work with[9]. I don’t know what you’re better at, but I’m highly confident that you’re better at something – something important! – than the rest of your team. But if your team culture doesn’t value that thing, you probably don’t value it either – you may not even recognize it.

If you’re in this situation, consider looking for a different team. Consider seeking out a team that needs the thing that you have to give, and which will appreciate and value and reward that thing you’re awesome at, and which gives you joy. It’s not you – it’s them.

Not everyone is in a position to make this sort of change, but everyone can step back to consider their team’s diversity of ability, and where they contribute. If you’ve never looked at your role in this way before, you may be surprised at what you discover.

 


[1] Epic understatement alert. I work with these guys, and more like them. Imagine my imposter syndrome every damned day.

[2] I will not describe these things in any meaningful way in this post.

[3] Which is far from certain. See the comment on imposter syndrome, above.

[4] Imposter syndrome, remember? Are you noticing a theme yet?

[5] I explicitly include interests here because ability isn’t enough to deliver excellence. If you’re skilled at something but don’t truly care about it, you may be good, but you’ll probably never be great.

[6] Bro, short for brogrammer, with all that the pejorative use of this term implies. If you’ve been on one of these teams, you know what I mean, and I hope you’re in a better place now.

[7] Present company excluded, of course.

[8] Yes, these footnotes.

[9] Did this work? I was a little worried about choosing the opening sentence I did, but I wanted to set up this theme later on. Did I actually pull it off, or is this just a cheap gimmick? I’d love to know what you think…

Are Power BI Dataflows a Master Data Management Tool?

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

Are Power BI dataflows a master data management tool?

This guy really wants to know.

Image from https://www.pexels.com/photo/close-up-photography-of-a-man-holding-ppen-1076801/

Spoiler alert: No. They are not.

When Microsoft first announced dataflows[1] were coming to Power BI earlier this year, I started hearing a surprising question[2]:

Are dataflows for Master Data Management in the cloud?

The first few times I heard the question, it felt like an anomaly, a non sequitur. The answer[3] seemed so obvious to me that I wasn’t sure how to respond.[4]

But after I’d heard this more frequently, I started asking questions in return, trying to understand what was motivating the question. A common theme emerged: people seemed to be confusing the Common Data Service for Apps used by PowerApps, Microsoft Flow, and Dynamics 365, with dataflows – which were initially called the Common Data Service for Analytics.

The Common Data Service for Apps (CDS) is a cloud-based data service that provides secure data storage and management capabilities for business data entities. Perhaps most relevant for the context of this article, CDS provides a common storage location, which “enables you to build apps using PowerApps and the Common Data Service for Apps directly against your core business data already used within Dynamics 365 without the need for integration.”[5] CDS provides a common location for storing data that can be used by multiple applications and processes, and it also defines business logic and rules that are applied to any application or user manipulating data stored in CDS entities.[6]

And that is starting to sound more like master data management.

When I think about Master Data Management (MDM) systems, I think of systems that:

  • Serve as a central repository for critical organizational data, to provide a single source of truth for transactional and analytical purposes.
  • Provide mechanisms to define and enforce data validation rules to ensure that the master data is consistent, complete, and compliant with the needs of the business.
  • Provide capabilities for matching and de-duplication, as well as cleansing and standardization for the master data they contain.
  • Include interfaces and tools to integrate with related systems in multiple ways, to help ensure that the master data is used (and used appropriately) throughout the enterprise.
  • (yawn)
    And all the other things they do, I guess.[7]

Power BI dataflows do not do these things.

While CDS has many of these characteristics, dataflows fit in here primarily in the context of integration. Dataflows can consume data from CDS and other data sources to make that data available for analysis, but their design does not provide any capabilities for the curation of source data, or for transaction processing in general.

Hopefully it is now obvious that Power BI dataflows are not an MDM tool. Dataflows do provide complementary capabilities for self-service data preparation and reuse, and this can include data that comes from MDM systems. But are dataflows themselves for MDM? No, they are not.


[1] At the time, they weren’t called dataflows. Originally they were called the Common Data Service for Analytics, which may well have been part of the problem.

[2] There were many variations on how the question was phrased – this is perhaps the simplest and most common version.

[3] “No.”

[4] Other than by saying “no.”

[5] Taken directly from the documentation.

[6] Please understand that the Common Data Service for Apps is much more than just this. I’m keeping the scope deliberately narrow because this post isn’t actually about CDS.

[7] MDM is a pretty complex topic, and it’s not my intent to go into too much depth. If you’re really interested, you probably want to seek out a more focused source of information. MDM Geek may be a good place to start.

What’s Missing from Power BI Dataflows?

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

[Meme image]

In case you don’t agree with this guy from that meme, you should go to ideas.powerbi.com to let the dataflows team know what you think, and why.

The “ideas” site is a feature voting tool where Power BI users can request new features and capabilities, either by submitting new ideas or by voting for ideas already submitted by other users.

Of course, none of this is unique to dataflows, and none of it is new. Despite this, I think it’s worth writing about for two reasons:

    1. As of November 24, 2018, there are a total of 34 ideas submitted for dataflows. Most of them have only a handful of votes, and many of them were submitted before dataflows were available in public preview. This suggests that a lot of the people who are using dataflows today have not taken the time to share their ideas with the dataflows team.
    2. Most of the ideas submitted and voted on by other users[1] do not have any supporting detail.

It’s difficult to overstate how important this is. When you’re requesting a feature or capability, take the time to explain your scenario. Any feature request has an implied purpose: “I want the product to include X, because I’m trying to do Y.” But unless you include Y, the people reading the request for X are left to guess, and those guesses are based almost exclusively on the detail shared in the request.

Please take the time to let the Power BI dataflows team know how you want the feature to evolve. And please take the time to let them know why this is important to you. The scenario details are as important as the idea or the vote itself.


 

[1] Like this one. Thank you, David. I think.

Thoughts on Expertise, Ignorance, Chocolate, and Bacon

In late September I had two very different experiences that I’ve found myself thinking about a lot since then. One involved chocolate, and the other involved bacon, and both involved ignorance.

My wife and I were taking a bacon making class one Sunday evening. I’d tried making homemade bacon before, but the results were disappointing, and I wanted some expert instruction. I’d been looking forward to the class for months.

While we were in the area, we decided to stop by Dawn’s Candy and Cake to inquire about tempered chocolate. I’ve been making chocolate treats for decades, and have tempered chocolate many times, but always found it to be a finicky pain in the butt. Dawn’s offers chocolate tempering classes, and I was hoping that there would be some way I could pop in and use chocolate that an employee had tempered with truffles I had made myself. This would let me focus on flavors, and let someone else worry about the annoying parts.

Sadly it was not meant to be. Although the young woman who was working that evening was very helpful, all of the tempering classes in the time-frame I needed were completely booked. But all was not lost. This amazing 17-year-old[1] stepped up to save the day. She printed out the tempering instructions they hand out during classes, and walked me through them in detail. They were pretty much what I’d done in the past, but with enough differences that I was happy to receive them. It was obvious she’d done this before; she explained the processes and concepts clearly and concisely…

So I asked her some follow-up questions, about selecting chocolate with the right viscosity[2] for different purposes. She said she’d never even heard of viscosity, and wasn’t sure what to tell me.

When we got to the bacon class, we learned that the instructions we would be using were the same instructions I had followed in my earlier failed attempt. This was disappointing, but the middle-aged man teaching the class was very forward in letting us know how he’d made bacon scores of times, and how everyone who had tried his bacon was spoiled forever on the store-bought stuff. He let us know very explicitly that he was an expert…

And as the class progressed, he proceeded to read the copied-from-the-internet directions, evade every substantive question, and inform the students that the things they were concerned about and asking about weren’t important, for unexplained reasons, all while continuing to tout his expertise. The class itself turned into something of an ordeal, as the instructor used every trick in the bullshitting-to-cover-up-your-ignorance-in-front-of-the-class book.[3]

Sigh.

The two experiences came back to back, and they could not have been more different. One instructor was confident enough to admit that she didn’t know, and the other was not. One instructor let his ego get in the way of his students’ learning, and the other didn’t let her ego get involved at all. One was focused on helping, while one was focused on not looking like an idiot. One succeeded, and the other one failed.

I’m not going to dwell on the fact that one of these people was young, and the other was middle-aged, or the fact that one was female and the other was male. While these factors may be significant, I know enough exceptions to the stereotypes[4] that I’m not going to go there. I’d rather focus on the behaviors themselves.

I strongly believe that the willingness to express vulnerability or weakness – including ignorance – only comes with confidence.

When you’re not confident in yourself, if you don’t know your strengths and weaknesses, and if you haven’t made some sort of peace with them, it’s incredibly difficult to say “I don’t know,” especially if you’re in a position of authority.  Some people choose to deny gaps in their knowledge, trying to cover them with bluster and obfuscation. Some may get away with it some of the time, but it’s never a long-term strategy for success. This behavior eventually builds a reputation as an unreliable blowhard.

People who admit their ignorance accomplish two important things. First, they give themselves the opportunity to learn something. Second, they encourage trust and confidence in the people around them[5]. When you show that you’re able and willing to express ignorance, this adds weight and confidence to your words when you’re expressing knowledge and expertise.

Anyway… where was I going with this? What’s the moral of this story?

I think the moral of the story is don’t be that guy.

Following my two culinary learning experiences, I have returned to Dawn’s Candy and Cake on at least three occasions, and have recommended their training events and supplies to multiple people. I’ll never take another class at the other place. And having talked to other participants after the class, I know I’m not alone in that decision.


[1] I know she was 17 because when my wife and I were talking about my upcoming tattoo appointment, she talked excitedly about how she wanted to get a tattoo when she turned 18 in a few months. Yes, I felt as old as I’ve felt in ages.

[2] In case you’re interested, this is a decent overview: https://www.reference.com/food/viscosity-important-candy-making-459c24a3ca10028b

[3] I started my IT career as a trainer, and spent years in the classroom. I could write this book.

[4] And I try to be exceptional, every day.

[5] If the people around you are the type of people who would ridicule you for admitting you don’t know something, you have also benefited from learning that you should be surrounded by different, better people.

Power BI Dataflows – Reuse without Premium

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

There are two common misconceptions that I see related to Power BI dataflows and Power BI Premium. The goal of this post is to dispel both of them.

Misconception #1: Power BI dataflows are a Premium-only feature.

This misconception is hopefully easy to dispel. Just create a new app workspace and click on the “+ Create” button, and there it is. Boom.

Misconception #2: You can’t reuse dataflow entities without Premium.

This misconception is a little more interesting. Power BI dataflows include computed entities, which enable powerful reuse scenarios by performing in-lake compute on the CDM Folders where the entity data is stored. Computed entities are available only in workspaces backed by dedicated capacity.

Computed entities are so interesting that we’ve been talking about them a lot, and this is likely at the root of this misconception: people have reached the conclusion that computed entities are the only way to enable reuse with dataflows.

Let’s take a look at an example to demonstrate once and for all that this is not the case.

For this example I’ll be building on the Power BI Dataflows and World Bank Climate Change Data pattern I described last month. In that post I created a dataflow with two queries, each of which included M code that decompressed a zip file. Let’s refactor these queries in a new dataflow, and put the decompression logic in a query that can be referenced by the other two.

Let’s begin by removing the BI Polar Blog workspace from Premium capacity. Nothing up my sleeve![1]

[Screenshot: 2018-11-22_19-01-00]

Power BI is kind enough to warn me that this will affect those dataflows I have already created that do use computed entities. Thank you, Power BI!

Now let’s create that new dataflow. If you look carefully, you’ll see that the Premium diamond won’t appear again in this post.

Since I already have the M script created, I’m just going to use the Blank Query option, and paste in that code[2].

[Screenshot: 2018-11-22_19-06-14]

After I rename the query, I perform the most important step: I disable data loading for the entity I plan to reuse.

[Screenshot: 2018-11-22_19-07-37]

Once this is done, I can proceed in the same way I would proceed in Power BI Desktop, by referencing this query to create new queries.

[Screenshot: 2018-11-22_19-07-53]

As with the first entity, I already have the code written, so I’m just going to paste it into the advanced editor. In other scenarios, I could also use the UI to complete the additional transformation steps… just like I could in Power BI Desktop.

When I’m done, I have three queries defined: two which will be loaded into entities, and one which contains reusable transformations and logic which is referenced by the other two.

[Screenshot: 2018-11-22_19-14-54]
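
To make the structure concrete, here is a simplified sketch of the three queries, shown together. These are not the actual World Bank queries – the zip decompression logic is replaced by a plain CSV source, and all names, URLs, and columns are invented – it’s just the shape of the pattern.

    // Shared query – data load disabled; in the real dataflow this is where
    // the reusable decompression/acquisition logic lives.
    let
        Source = Csv.Document(Web.Contents("https://example.com/climate-data.csv")),
        Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
    in
        Promoted

    // First entity – references the shared query and keeps only what it needs.
    let
        Source = #"Shared Climate Data",
        Filtered = Table.SelectRows(Source, each [Series] = "Temperature")
    in
        Filtered

    // Second entity – references the same shared query with a different filter.
    let
        Source = #"Shared Climate Data",
        Filtered = Table.SelectRows(Source, each [Series] = "Precipitation")
    in
        Filtered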

When I save my dataflow, only the two entities for which data loading is enabled are included in the entity list.

[Screenshot: 2018-11-22_19-25-21]

And as expected, I can successfully save and refresh the dataflow, even though it contains query reuse, and even though it is not in a workspace backed by Premium capacity.

[Screenshot: 2018-11-22_19-28-35]

At this point you may be asking yourself how this is different from computed entities. These are the differences that seem most significant to me:

  1. A computed entity gets data from files in a CDM Folder in any dataflow in any workspace[3], and loads data into the CDM Folder in any workspace. This is a data movement process, with source and destination entities each being backed by persisted data. To think in relational database terms, this is similar to an INSERT .. SELECT statement, where the query you write defines the data that is loaded into the target table.
  2. A referenced query like the one used in this example is never persisted anywhere. It defines reusable logic, but not reusable data. When the query is referenced by another query that defines a dataflow entity, its logic is included in the entity’s query when the dataflow is refreshed. To use another relational database analogy, this is similar to a view – it is a reusable data definition, but it is not persisted anywhere.
  3. Computed entities can reference entities in any dataflow in any workspace. Referenced queries can only reference queries in the same dataflow. This means that you cannot, for example, have a dedicated dataflow or workspace that serves as a repository of common reusable data logic for use in other dataflows and workspaces[4].

In summary, although computed entities are available only in Power BI Premium, this does not stop you from reusing query logic. These capabilities in Power BI dataflows are very similar to what you’ve always had in Power BI Desktop. I hope this helps clear the air a little!


[1] Please read this in the voice of Bullwinkle J. Moose. This is the voice in which I wrote it.

[2] In my earlier posts I often included screen shots of every single step. This was because at that point dataflows were still in private preview, and most readers did not yet have access to the feature. Now that all of you can do this yourselves, I’ll include screen shots of fewer steps. If you disapprove of this change, please let me know.

[3] Assuming that the workspace is on Premium capacity, is a new “v2” workspace, and that the dataflow developer has permissions to access it.

[4] If this last point is problematic for you, I would love to learn more about your scenario – what you’re trying to accomplish, and how this makes it more difficult for you to do so.