This capability has been in preview since early this year, so it’s not really new, but there are enough pieces involved that it may not be obvious how to begin – and I continue to see enough questions about this topic that another blog post seemed warranted.
The key point is that because dataflows are writing data to ADLSg2 in CDM folder format, Azure Machine Learning and Azure Databricks can both read the data using the metadata in the model.json file.
This JSON file serves as the “endpoint” for the data in the CDM folder; it’s a single resource that you can connect to without having to worry about the complexities of the various subfolders and files that the CDM folder contains.
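To make the "single endpoint" idea a little more concrete, here is a minimal sketch of reading a model.json document and listing the entities it describes. The sample document is made up for illustration, and the field names it uses ("$type", "name", "entities", "attributes", "dataType") are my assumptions about the general shape of the file, not something taken from a real dataflow. Check them against your own model.json before relying on them.

```python
import json

# A tiny, made-up model.json document. The field names are assumptions
# about the general shape of the file; a real model.json is much larger.
SAMPLE_MODEL_JSON = """
{
  "name": "Sample dataflow",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customers",
      "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "Name", "dataType": "string"}
      ]
    }
  ]
}
"""


def list_entities(model):
    """Return (entity name, entity type, attribute names) for each entity."""
    return [
        (
            entity["name"],
            entity.get("$type"),
            [attribute["name"] for attribute in entity.get("attributes", [])],
        )
        for entity in model.get("entities", [])
    ]


# In practice the text would come from the model.json file in the CDM folder;
# here it is parsed from the inline sample so the sketch is self-contained.
model = json.loads(SAMPLE_MODEL_JSON)
for name, kind, attributes in list_entities(model):
    print(name, kind, attributes)
```

The point of the sketch is that everything a consumer needs to discover the entities comes from this one file, without walking the subfolders.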
This tutorial is probably the best place to start if you want to know more. It includes directions and sample code for creating and consuming CDM folders from a variety of different Azure services – and Power BI dataflows. If you’re one of the people who has recently asked about this, please go through this tutorial as your next step!
 It’s the best resource I’m aware of – if you find a better one, please let me know!
At first I chalked this up to an old dog trying to learn a new trick, but I’ve been thinking about this since then, and I’m not sure that this is the case. I think that it may have been the result of differing perspectives – and differing expectations resulting from those perspectives.
My perspective is that as a data integration tool, Power Query will work the way that my ~20 years as a data professional have trained me to expect data integration tools will work. If there’s a query language or formula language or expression language that is required to access a specific source, I expect that language to be identified and documented in the tool.
The Power Query team, on the other hand, may have had a different perspective here. I haven’t explicitly asked them, but I suspect that their perspective is that it’s 2018, and anyone working with HTML data already knows what CSS selectors are, and either knows how to use them, or where to look to learn enough to use them.
I don’t know which perspective is more valid. Part of me believes that mine is, and bemoans the time I spent struggling to achieve a simple goal, because the documentation didn’t connect the dots for me. But I also note that no one – not in comments here, not on Twitter – has said that they were similarly challenged.
But I can say this: A difference in perspective meant that what was delivered wasn’t what was needed, at least by one user.
Another example of this type of mismatch is one I see too often at conferences: Microsoft presenters using Microsoft’s specialized vocabulary when speaking with non-Microsoft audiences. This typically takes the form of using internal code names and acronyms, rather than official product names – if you’ve been to more than a handful of Microsoft events you’ve probably seen this yourself. I think the worst example I’ve seen was when a presenter mentioned that an upcoming feature was coming “in the scandium time frame.”
Every culture – whether it’s centered in a geographical region, a profession, a religion, or a large corporation – has a specialized internal nomenclature. It enables members to communicate more efficiently. This isn’t a bad thing – it’s natural and good, and helps teams and groups deliver on their goals and priorities.
But problems can and do arise when one party doesn’t take the other party’s background into consideration. This is where having a diverse team can help.
A diverse team makes it more likely that potential problems will be prevented before they need to be identified, and identified before they need to be fixed.
If you want to be more efficient and to produce products and services (and documentation!) that deliver on your customers’ needs the first time, every time, by default, your team makeup should reflect the customers who use your product. If you look around and everyone on your team looks the same, this should be a warning sign that customers who don’t look like you probably don’t have the same experience that you do.
And if you don’t see that as a problem, you should probably look elsewhere for your problem. Try looking in the mirror.
Update: Two days after this post was published, David Heinemeier Hansson posted a blistering example of why diversity is so important, using his wife’s experience with Apple’s new credit card to drive the point home. I strongly recommend reading the whole thread.
The @AppleCard is such a fucking sexist program. My wife and I filed joint tax returns, live in a community-property state, and have been married for a long time. Yet Apple’s black box algorithm thinks I deserve 20x the credit limit she does. No appeals work.
 I started writing this post in November 2018, and it’s been languishing in my drafts folder ever since. I’m making an effort to clean up my drafts by the end of the year, so hopefully this one will actually see the light of day before it’s 2020. Fingers crossed…
 CSS selectors.
 I feel like I’m enough of a problem child most days, so I try not to bother them unless it’s really necessary.
 Although thankfully not nearly as often as I used to.
 If you know what this means, you work on the Azure team. Sadly, the people in the audience did not work on the Azure team. Thankfully, someone in the audience stood up and asked for clarification.
 When I worked on the Azure team I still didn’t know. I was constantly asking for clarification in meeting after meeting and email after email. Maybe I am just slow…
Miguel’s profile picture is even older than Matthew’s profile picture
All snark aside, Miguel and the whole dataflows team have been awfully busy, and it’s great to see their work available to Power BI authors. I won’t attempt to repeat what’s in the announcement, but I will highlight the new capabilities that have me most excited:
Support for data profiling in Power Query Online – we’ve had this in Power BI Desktop for a while, but it’s just as important for dataflows as it is for datasets.
Better support for files and folders – a lot of the data I play with these days is in folders full of text files, and Power Query Online hasn’t had the best experience for working with this type of data.
Better support for query parameters – there are lots of scenarios where having parameterized queries makes working with dataflows easier, and now Power Query Online makes it easier to work with query parameters.
Do yourself a favor and check out the whole list. Odds are there’s something you’ve been waiting for that will excite you as much as these new capabilities excite me.
And I can’t wait to hear what they are…
 No, I don’t believe that’s possible either, but it is nice to see that you’ve been paying attention.
 Very little of my actual work involves data prep these days, so I need to find data to play with to avoid getting too bored.
This week’s Power BIte is the second in a series of videos that present different ways to create new Power BI dataflows, and the results of each approach.
When creating a dataflow by defining new entities, the final dataflow will have the following characteristics:
Data ingress path
Ingress via the mashup engine hosted in the Power BI service, using source data that is also managed by the Power BI service, taking advantage of locality of data.
Data for computed entities is stored in the CDM folder defined for the dataflow. Data for linked entities remains in the source dataflow and is not moved or copied.
The dataflow is refreshed based on the schedule and policies defined in the workspace.
Let’s look at the dataflow’s model.json metadata to see some of the details.
At the top of the file we can see the mashup definition, including the query names and load settings on lines 11 through 35 and the Power Query code for all of the entities on line 37. This will look awfully familiar from the last Power BIte post.
Things start to get interesting and different when we look at the entity definitions:
On line 80 we can see that the Product entity is defined as a ReferenceEntity, which is how the CDM metadata format describes what Power BI calls linked entities. Rather than having its attribute metadata defined in the current dataflow’s model.json file, it instead identifies the source entity it references, and the CDM folder in which the source entity is stored, similar to what we saw in the last example. Each modelId value for a linked entity references the id value in the referenceModels section as we’ll see below.
The Customers with Addresses entity, defined starting on line 93, is the calculated entity built in the video demo. This entity is a LocalEntity, meaning that its data is stored in the current CDM folder, and its metadata includes both its location and its full list of attributes.
The end of the model.json file highlights the rest of the differences between local and linked entities.
At line 184 we can see the partitions for the Customers with Addresses entity, including the URL for the data file backing this entity. Because the other entities are linked entities, their partitions are not defined in the current model.json.
Instead, the CDM folders where their data does reside are identified in the referenceModels section starting at line 193. The id values in this section match the modelId values of the linked entities defined earlier in the model.json file, and the location values are the URLs to the model.json files that define the source CDM folders for the linked entities.
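The relationship between local entities, linked entities, and the referenceModels section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive reader: the sample document is made up, the URLs are placeholders, and field names such as "$type", "source", and "name" are my assumptions about the file's shape. Only the ReferenceEntity, LocalEntity, modelId, referenceModels, id, location, and partitions concepts come from the walkthrough above.

```python
import json

# A made-up model.json fragment with one linked and one local entity.
# URLs are placeholders; "$type", "source", and "name" are assumed field names.
SAMPLE_MODEL_JSON = """
{
  "name": "Sample dataflow",
  "entities": [
    {
      "$type": "ReferenceEntity",
      "name": "Product",
      "source": "Product",
      "modelId": "model-1"
    },
    {
      "$type": "LocalEntity",
      "name": "Customers with Addresses",
      "attributes": [{"name": "CustomerId", "dataType": "int64"}],
      "partitions": [
        {
          "name": "Part001",
          "location": "https://example.invalid/cdm/CustomersWithAddresses.csv"
        }
      ]
    }
  ],
  "referenceModels": [
    {"id": "model-1", "location": "https://example.invalid/source/model.json"}
  ]
}
"""


def describe_entities(model):
    """Map each entity name to ("local" or "linked", list of locations).

    Local entities report their own partition locations. Linked entities
    report the model.json URL found by matching modelId to a referenceModels id.
    """
    reference_locations = {
        reference["id"]: reference["location"]
        for reference in model.get("referenceModels", [])
    }
    described = {}
    for entity in model.get("entities", []):
        if entity.get("$type") == "ReferenceEntity":
            # The data lives elsewhere; look up the source CDM folder.
            location = reference_locations.get(entity["modelId"])
            described[entity["name"]] = ("linked", [location])
        else:
            # The data lives in this CDM folder; list its partition files.
            locations = [p["location"] for p in entity.get("partitions", [])]
            described[entity["name"]] = ("local", locations)
    return described


model = json.loads(SAMPLE_MODEL_JSON)
for name, (kind, locations) in describe_entities(model).items():
    print(name, kind, locations)
```

A consumer that wanted the linked entity's data would then load the model.json at the referenced location and repeat the same process there.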
If this information doesn’t make sense yet, please hold on. We’ll have different values for the same attributes for other dataflow creation methods, and then we can compare and contrast them.
I guarantee it will make as much sense as anything on this blog.
When I joined Microsoft, the team running my NEO session shared a piece of advice with my “class” of fresh-faced new hires:
“Think of your first year as a grace period where you can ask any question you want, without anyone thinking it’s a stupid question for which you should already know the answer.”
This sounded like empowering wisdom at the time, but the unspoken side of it was damaging. Between the lines, I heard this message as well, and it was this part that stuck with me:
“You’ve got one year to figure things out, and after that you’d better have your act together and know everything – because if you keep asking stupid questions we’ll know that we made a mistake hiring you.”
I hope it’s obvious that this wasn’t the intent of the advice, but I’ve spoken to enough people over the years to know that I’m far from the only one to take it this way.
In retrospect, I believe that I should have known better, but I let this unspoken message find a home in my brain, and I listened to it. I remained quiet – and remained ignorant – when I should have been asking questions.
Over the past few years, I have finally broken this self-limiting habit. Day after day, and meeting after meeting, I’m the guy asking the questions that others are thinking, and wishing they could ask, but don’t feel comfortable or confident enough to speak up. I’m the guy asking “why” again and again until I actually understand. And people are noticing.
How do I know that others want to ask the same questions?
I know because they tell me. Sometimes they say thank you in the meetings, and sometimes ask their own follow-up questions. Sometimes they stop me in the hall after the meeting to say thank you. And on a few occasions people have set up 1:1 meetings with me to ask about what I do, and how they can learn to do the same.
How do I know that people are noticing?
A few of the people I’ve interrupted to ask “why” have also set up time with me to discuss how they can better communicate and prepare to have more useful and productive meetings. These are generally more experienced team members who appreciate that my questions highlight unstated assumptions, or help them identify areas where they need a clearer story to communicate complex topics.
I’ve been with Microsoft since October 2008 – a little over 11 years. I’ve been working in the industry since the mid 90s. At this point in my career it’s easy for me to ask “stupid” questions because the people I work with know that I’m not stupid.
This isn’t true for everyone. If you’re younger, new to role or new to career, or from an underrepresented group, you may not have the position of relative safety that I have today. My simple strategy of “asking lots of questions in large groups” may not work for you, and I don’t have tested advice for what will.
My suggestion is to ask those questions in small groups or 1:1 situations where the risk is likely to be lower, and use this experience to better understand your team culture… but I would love to hear your experience and advice no matter what you do. Just as your questions will be different from mine while still being useful, your experience and advice will be different, and will be useful and helpful in different ways.
Whatever approach you take, don’t be quiet. Don’t stop asking questions.
Especially the stupid questions.
Those are the best ones.
P.S. While this post has been scheduled and waiting to go live, there have been two new articles that showed up on my radar that feel very relevant to this topic:
Ex-Microsoftie James Whittaker posted on Speaking Truth to Power, which presents critical observations of three eras of Microsoft, with examples of how different generations of leaders have affected the corporate culture.
These are very different articles, but they’re both fascinating, and well worth the time to read.
 New Employee Orientation. Sadly not anything to do with Keanu Reeves.
 I wish I could say that this was a deliberate strategic move on my part, but it was more reactive. Credit goes to the rapid pace of change and my inability to even pretend I could keep up.
 Yes, that means meetings where fewer people interrupt to ask “why.”
 I’m waiting for the flood of snarky comments from my teammates on this one…
 Or in any event, my asking questions is unlikely to change any minds on this particular point.
 If you’re not a middle-aged, white, cisgender man…