I’m running behind on my own YouTube publishing duties, but that doesn’t keep me from watching the occasional data culture YouTube video produced by others.
Like this one:
Ok… you may be confused. You may believe this video is not actually about data culture. This is an easy mistake to make, and you can be forgiven for making it, but the content of the video makes its true subject very clear:
A new technology is introduced that changes the way people work and live. This new technology replaces existing and established technologies; it lets people do what they used to do in a new way – easier, faster, and further. It also lets people do things they couldn’t do before, and opens up new horizons of possibility.
The technology also brings risk and challenge. Some of this is because of the new capabilities, and some is because of the collision between the new way and the old way of doing things. The old way and the new way aren’t completely compatible, but they use shared resources and sometimes things go wrong.
At the root of these challenges are users moving faster than any relevant authorities can respond. Increasing numbers of people are seeing the value of the new technology, assuming the inherent risk, and embracing its capabilities while hoping for the best.
Different groups see the rising costs and devise solutions for these challenges. Some solutions are tactical, some are strategic. And eventually some champions emerge to push for the creation of standard solutions. Or standards plural, because there always seems to be more than one of those darned things.
Not everyone buys into the standards at first, but over time the standards are refined and… actually standardized.
This process doesn’t slow down the technology adoption. The process and the standards instead provide the necessary shape and structure for adoption to take place as safely as possible.
With the passage of time, users take for granted the safety standards as much as they take for granted the capabilities of the technology… and can’t imagine using one without the other.
For the life of me I can’t imagine why they kept doubling down on the “lane markings” analogy, but I’m actually happy they did. This approach may get more people paying attention – I can’t find any other data culture videos on YouTube with 488K views…
 Part of this is because my wife has been out of town, and my increased parental responsibilities have reduced the free time I would normally spend filming and editing… but it’s mainly because I’m finding that talking coherently about data culture is harder for me than writing about data culture. I’ll get better, I assume. I hope.
 In this case, I watched while I was folding laundry. As one does.
 Yes, pun intended. No, I’m not sorry.
 Either through knowledge or through ignorance.
You may have seen things that make you say "that's Power BI AF," but none of them have come close to this. It's literally the Power BI AF.
That’s right – this week Microsoft published the Power BI Adoption Framework on GitHub and YouTube. If you’re impatient, here’s the first video – you can jump right in. It serves as an introduction to the framework, its content, and its goals.
Without attempting to summarize the entire framework, this content provides a set of guidance, practices, and resources to help organizations build a data culture, establish a Power BI center of excellence, and manage Power BI at any scale.
Even though I blog a lot about Power BI dataflows, most of my job involves working with enterprise Power BI customers – global organizations with thousands of users across the business who are building, deploying, and consuming BI solutions built using Power BI.
Each of these large customers takes their own approach to adopting Power BI, at least when it comes to the details. But with very few exceptions, each successful customer will align with the patterns and practices presented in the Power BI Adoption Framework – and when I work with a customer that is struggling with their global Power BI rollout, their challenges are often rooted in a failure to adopt these practices.
There’s no single “right way” to be successful with Power BI, so don’t expect a silver bullet. Instead, the Power BI Adoption Framework presents a set of roles, responsibilities, and behaviors that have been developed through working with customers in real-world Power BI deployments.
If you look on GitHub today, you’ll find a set of PowerPoint decks broken down into five topics, plus a few templates.
These slide decks are still a little rough. They were originally built for use by partners who could customize and deliver them as training content for their customers, rather than for direct use by the general public, and as of today they’re still a work in progress. But if you can get past the rough edges, there’s definitely gold to be found. This is the same content I used when I put together my “Is self-service business intelligence a two-edged sword?” presentation earlier this year, and for the most part I just tweaked the slide template and added a bunch of sword pictures.
And if the slides aren’t quite ready for you today, you can head over to the official Power BI YouTube channel where this growing playlist contains bite-size training content to supplement the slides. As of today there are two videos published – expect much more to come in the days and weeks ahead.
The real heroes of this story are Manu Kanwarpal and Paul Henwood. They’re both cloud solution architects working for Microsoft in the UK. They’ve put the Power BI AF together, delivered its content to partners around the world, and are now working to make it available to everyone.
What do you think?
To me, this is one of the biggest announcements of the year, but I really want to hear from you after you’ve checked out the Power BI AF. What questions are still unanswered? What does the AF not do today that you want or need it to do tomorrow?
Please let me know in the comments below – this is just a starting point, and there’s a lot that we can do with it from here…
 If you had any idea how long I’ve been waiting to make this joke…
 I can’t think of a single exception at the moment, but I’m sure there must be one or two. Maybe.
 Partners can still do this, of course.
 Other than you, of course. You’re always a hero too – never stop doing what you do.
Power BI dataflows have included capabilities for data lineage since they were introduced in preview way back in 2018. The design of dataflows, where each entity is defined by the Power Query that provides its data, enables a simple and easy view into its data lineage. The query is the authoritative statement on where the entity’s data comes from, and how it is transformed.
But what about everything else in a workspace? What about datasets, and reports, and dashboards? What about them?
Power BI has your back.
Late last month the Power BI team released a new preview capability that lets users view workspace content in a single end-to-end lineage view, in addition to the familiar list view.
Once the lineage view is selected, all workspace contents – data sources, dataflows, datasets, dashboards, and reports – are displayed, along with the relationships between them. Here’s a big-picture view of a workspace I’ve been working in lately:
There’s a lot to unpack here, so I’ll break down what feels to me like the important parts:
The primary data source is a set of text files in folders. The text files are produced by various web scraping processes, and each has a different format and contents.
The secondary data source is a set of reference and lookup data stored in Excel workbooks in SharePoint Online. These workbooks contain manually curated data that is used to cleanse, standardize, and enrich the data from the primary source.
The primary data is staged with minimal transformation in a “raw” dataflow. This data is then progressively processed by a series of downstream dataflows – mashed up with the secondary data from Excel and reshaped into facts and dimensions.
There is one dataset based on the fact and dimension entities, and a report based on this dataset. There’s a second dataset that includes data quality metrics from entities in multiple dataflows, and a report based on this dataset. And there are two dashboards: one that includes only visuals for data quality metrics, and one that presents the main data along with a few tiles from the quality report.
That overview is simplified to the point of being worthless for technical understanding, but it’s still a wall of text. Who wants to read that?
For a real-world workspace that implements a production BI application, there is likely to be more complexity and less well-defined boundaries between objects. How do you document the contents of a complex workspace, and the relationships between those components? How do you understand them well enough to identify and solve problems?
That’s where the lineage view comes in.
Let’s begin by looking at the data sources.
For data sources that use a gateway, I can easily see the gateway name. For other data sources I can see the data source location. We’re off to a good start, because I have a single place to look to see where my data is coming from.
Next, let’s look at the dataflows.
In addition to being able to see the dataflows and the dependencies between them, you can click on any dataflow to see the entities it contains, and can jump directly to edit the dataflow from this view.
This part of workspace lineage isn’t completely new – this is essentially what you could do with dataflows already. But now you can do it with datasets, reports, and dashboards as well.
Selecting a dataset shows me the tables it contains, and selecting a dashboard or report takes me directly to the visualization. But the real power of this view comes from the relationships between objects. The relationships are where data lineage comes to the fore.
The two primary questions asked in the context of data lineage concern upstream scenarios (“where does this data come from?”) and downstream scenarios (“where is this data used?”).
The first question is often asked in the context of “why am I not seeing what I expect to see?” and the resulting investigation looks at upstream logic and data sources to identify the root cause of problems.
The second question is often asked in the context of “what might break if I change this?” and the resulting investigation looks at downstream objects and processes.
The lineage view has a simple way to answer both questions: just click on the “double arrow” icon and the view will change to highlight all upstream and downstream objects. In a single click you can see where the data comes from, and where the data is used. Click again, and the view toggles back to the default view.
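Conceptually, what that toggle computes is a walk over a dependency graph. Here’s a minimal sketch in Python – not how Power BI implements the feature, just an illustration of the upstream and downstream traversals, using made-up object names:

```python
# Each workspace object maps to the objects it depends on (its upstream
# neighbors). These names are made up for illustration.
depends_on = {
    "Raw Dataflow": ["Text Files", "Excel Workbooks"],
    "Curated Dataflow": ["Raw Dataflow"],
    "Main Dataset": ["Curated Dataflow"],
    "Main Report": ["Main Dataset"],
    "Main Dashboard": ["Main Report"],
}

# Invert the graph to answer the downstream question.
used_by = {}
for obj, sources in depends_on.items():
    for source in sources:
        used_by.setdefault(source, []).append(obj)

def walk(graph, start):
    """Collect everything reachable from 'start' in the graph's direction."""
    reachable, to_visit = set(), [start]
    while to_visit:
        current = to_visit.pop()
        for neighbor in graph.get(current, []):
            if neighbor not in reachable:
                reachable.add(neighbor)
                to_visit.append(neighbor)
    return reachable

print(walk(depends_on, "Main Report"))  # where does this data come from?
print(walk(used_by, "Raw Dataflow"))    # where is this data used?
```

The upstream walk from the report finds the dataset, both dataflows, and both data sources; the downstream walk from the raw dataflow finds everything built on top of it. That’s exactly what the highlighted view is showing you.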
There’s more to lineage view than this, including support for shared and certified datasets, but this should be enough to get you excited. Be sure to check out the preview documentation as you check out the feature!
Update: We now have a video to supplement the blog post. Check it out!
Update: The Power BI blog now has the official announcement for this exciting feature. The blog post includes a look at where the lineage team is planning to invest to make this feature even better, and notes that all of the information in the lineage view is now available using the Power BI API. If you want to integrate lineage and impact analysis into your own tools, or if you want to build a cross-workspace lineage view, you now have the APIs you need to be successful!
 This is a pet project that may one day turn into a viable demo, assuming work and life let me devote a little more time to it…
 Different, annoying, and difficult to clean up.
 For example, the source web site allows any user to contribute, and although the contribution process is moderated there is no enforcement of content or quality. One artist may be credited for “guitar” on one album, “guitars” on another, and “lead guitar” on a third. This sounds pretty simple until you take into account that there were close to 50,000 different “artist roles” in the raw source data that needed to be standardized down to a few hundred values in the final data model.
The reaction to this recent post on lineage and Power BI dataflows highlighted how important lineage is for Power BI. As that post shows, there are lineage user experiences in place for dataflows today, and experiences coming for all artifacts in a workspace.
Even with these new experiences, there will still be times when you want or need to use the Power BI API to get insight into all the workspaces in the Power BI tenant for which you are an administrator. This ability isn’t new, but some recent updates to the Power BI admin API have made it easier.
GetGroupsAsAdmin is an API available to Power BI administrators that returns the workspaces for the Power BI tenant. With the information it returns, an admin can then call additional APIs like GetDatasetsInGroupAsAdmin and GetReportsInGroupAsAdmin to list their contents – and better understand and manage the tenant. This is a relatively straightforward pattern… but you do need to call up to four APIs for each workspace.
Now that GetGroupsAsAdmin supports $expand, you can get the full list of users, reports, dashboards, datasets, and dataflows in the workspace without needing to call any additional APIs. Pretty sweet.
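If you want to try this yourself, here’s a minimal sketch in Python. It assumes you’ve already acquired an Azure AD access token for an account with Power BI admin permissions (token acquisition is out of scope here), and the placeholder values are just that – placeholders:

```python
import requests

# Assumes an Azure AD access token for an account (or service principal)
# with Power BI admin permissions. Acquiring the token is out of scope here.
ACCESS_TOKEN = "<your-access-token>"

# GetGroupsAsAdmin with $expand returns each workspace and its contents in
# a single call. $top is required; combine it with $skip to page through
# larger tenants.
url = "https://api.powerbi.com/v1.0/myorg/admin/groups"
params = {
    "$top": 100,
    "$expand": "users,reports,dashboards,datasets,dataflows",
}
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

response = requests.get(url, params=params, headers=headers)
response.raise_for_status()

# Print a simple inventory of the tenant's workspaces and their contents.
# Dashboards use "displayName" where other artifacts use "name", so fall
# back accordingly.
for workspace in response.json()["value"]:
    print(workspace["name"])
    for kind in ("datasets", "dataflows", "reports", "dashboards"):
        for item in workspace.get(kind, []):
            print(f"  {kind[:-1]}: {item.get('name', item.get('displayName', '?'))}")
```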
With the information that’s returned, you can get a view of the contents of your Power BI tenant and start examining the relationships between the various objects, and now it’s simpler than ever. The API returns the workspace contents as JSON, which is easy enough to ingest and visualize using Power BI Desktop.
The Power BI team is continuing to add more features and experiences focused on governance and lineage, but the nature of oversight and governance is that most companies have specific tools and processes that require customization to one degree or another. Having a simple programmatic way to get workspace contents from your Power BI tenant will continue to be valuable even as these new experiences are delivered.
 As of when I’m writing, this lineage post has received more “first week views” than any other dataflows post I’ve made this year.
 When I wrote the first draft of this post in early July, dataflows were not yet included in the results of this API; they were added a few weeks later. I decided to wait to complete the post, and here it is, almost October. I really should know better by now.
There’s a lot to unpack here, and I don’t expect to do it all justice in this post, but Eric’s thought-provoking tweet made me want to reply, and I knew my reply wouldn’t fit into 280 characters… so I’ll tackle some of the more important and interesting elements.
First and foremost, Eric tags me before he tags Marco, Chris, or Curbal. I am officially number one, and I will never let Marco or Chris forget it.
With that massive ego boost out of the way, let’s get to the BI, which is definitely dead. And also definitely not dead.
Eric’s post starts off with a bold and simple assertion: If you have the reactive/historical insights you need today, you have enough business intelligence and should focus on other things instead. I’m paraphrasing, but I believe this effectively captures the essence of his claim. Let me pick apart some of the assumptions I believe underlie this assertion.
First, this claim seems to assume that all organizations are “good w/ BI.” Although this may be true of an increasing number of mature companies, in my experience it is definitely not something that can be taken for granted. The alignment of business and technology, and the cultural changes required to initiate and maintain this alignment, are not yet ubiquitous.
Should they be? Should we be able to take for granted that in 2019 companies have all the BI they need? 
The second major assumption behind Eric’s first point seems to be that “good w/ BI” today translates to “good w/ BI” tomorrow… as if BI capabilities are a blanket solution rather than something scoped and constrained to a specific set of business and data domains. In reality, BI capabilities are developed and deployed incrementally based on priorities and constraints, and are then maintained and extended as the priorities and constraints evolve over time.
My job gives me the opportunity to work with large enterprise companies to help them succeed in their efforts related to data, business intelligence, and analytics. Many of these companies have built successful BI architectures and are reaping the benefits of their work. These companies may well be characterized as being “good w/ BI” but none of them are resting on their laurels – they are instead looking for ways to extend the scope of their BI investments, and to optimize what they have.
I don’t believe BI is going anywhere in the near future. Not only are most companies not “good w/ BI” today, the concept of being “good w/ BI” simply doesn’t make sense in the context in which BI exists. So long as business requirements and environments change over time, and so long as businesses need to understand and react, there will be a continuing need for BI. Being “good w/ BI” isn’t a meaningful concept beyond a specific point in time… and time never slows down.
If your refrigerator is stocked with what your family likes to eat, are you “good w/ food”? This may be the case today, but what about when your children become teenagers and eat more? What about when someone in the family develops food allergies? What about when one of your children goes vegan? What about when the kids go off to college? Although this analogy won’t hold up to close inspection it hopefully shows how difficult it is to be “good” over the long term, even for a well-understood problem domain, when faced with easily foreseeable changes over time.
Does any of this mean that BI represents the full set of capabilities that successful organizations need? Definitely not. More and more, BI is becoming “table stakes” for businesses. Without BI it’s becoming more difficult for companies to simply survive, and BI is no longer a true differentiator that assures a competitive advantage. For that advantage, companies need to look at other ways to get value from their data, including predictive and prescriptive analytics, and the development of a data culture that empowers and encourages more people to do more things with more data in the execution of their duties.
And of course, this may well have been Eric’s point from the beginning…
 I’ve been serving on the jury for a moderately complex civil trial for most of August, and because the trial is in downtown Seattle during business hours I have been working early mornings and evenings in the office, and taking the bus to the courthouse to avoid the traffic and parking woes that plague Seattle. I am very, very tired.
 Please remind me to add “thought leader” to my LinkedIn profile. Also maybe something about blockchain.
 I’ll leave this as an exercise for the reader.
 At least in my reality. Your mileage may vary.
 Did this analogy hold up to even distant observation?
Every few weeks I see someone asking about using Analysis Services as a data source for Power BI dataflows. Every time I hear this, I cringe, and then include advice like this in my response.
Using Analysis Services as a data source is an anti-pattern – a worst practice. It is not recommended, and any solution built using this pattern is likely to produce dissatisfied customers. Please strongly consider using other data sources, likely the data sources on which the AS model is built.
There are multiple reasons for this advice.
Some reasons are technical. Extraction of large volumes of data is not what an Analysis Services model is designed for. Performance for the ETL process is likely to be poor, and you’re likely to end up with memory/caching issues on the Analysis Services server. Beyond this, AS models typically don’t include the IDs/surrogate keys that you need for data warehousing, so joining the AS data to other data sources will be problematic.
For some specific examples and technical deep dives into how and why this is a bad idea, check out this excellent blog post from Shabnam Watson. The focus of the post is on SSAS memory settings, but it’s very applicable to the current discussion.
Some reasons for this advice are less technical, but no less important. Using analytics models as data sources for ETL processing is a strong code smell (“any characteristic in the source code of a program that possibly indicates a deeper problem”) for business intelligence solutions.
Let’s look at a simple and familiar diagram:
There’s a reason this left-to-right flow is the standard representation of BI applications: it’s what works. Each component has specific roles and responsibilities that complement each other, and which are aligned with the technology used to implement the component. This diagram includes a set of logical “tiers” or “layers” that are common in analytics systems, and which support one another to achieve the system’s goals.
Although there are many successful variations on this theme, they all tend to have this general flow and these general layers. Consider this one, for example:
This example has more complexity, but also has the same end-to-end flow as the simple one. This is pretty typical for scenarios where a single data warehouse and analytics model won’t fulfill all requirements, so the individual data warehouses, data marts, and analytics models each contain a portion – often an overlapping portion – of the analytics data.
Let’s look at one more:
This design is starting to smell. The increased complexity and blurring of responsibilities will produce difficulties in data freshness and maintenance. The additional dependencies, and the redundant and overlapping nature of the dependencies means that any future changes will require additional investigation and care to ensure that there are no unintended side effects to the existing functionality.
As an aside, my decades of working in data and analytics suggest that this care will rarely actually be taken. Instead, this architecture will be fragile and prone to problems, and the teams that built it will not be the teams who solve those problems.
And then we have this one:
This is what you get when you use Analysis Services as the data source for ETL processing, whether that ETL and downstream storage are implemented in Power BI dataflows or in different technologies. And this is probably the best case you’re likely to get when you go down this path. Even with just two data warehouses and two analytics models in the diagram, the complex and unnatural dependencies are obvious, and are painful to consider.
What would be better here? As mentioned at the top of the post, the logical alternative is to avoid using the analytics model and to instead use the same sources that the analytics model already uses. This may require some refactoring to ensure that the duplication of logic is minimized. It may require some political or cross-team effort to get buy-in from the owners of the upstream systems. It may not be simple, or easy. But it is almost always the right thing to do.
Don’t take shortcuts to save days or weeks today that will cause you or your successors months or years to undo and repair. Don’t build a house of cards, because with each new card you add, the house is more and more likely to fall.
Update: The post above focuses mainly on technical aspects of the anti-pattern and suggests alternative recommended patterns to follow instead. It does not focus on the reasons why so many projects are pushed into the anti-pattern in the first place. Those reasons are almost always based on human – not technical – factors.
 Something a lot like this. I copied this from a response I sent a few days ago.
 Many thanks to Chris Webb for some of the information I’ve paraphrased here. If you want to hear more from Chris on this subject, check out this session recording from PASS Summit 2017. The whole session is excellent; the information most relevant to this subject begins around the 26 minute mark in the recording. Chris also gets credit for pointing me to Shabnam Watson’s blog.
 I learned about code smells last year when I attended a session by Felienne Hermans at Craft Conference in Budapest. You can watch the session here. And you really should, because it’s really good.
 My eyes are itching just looking at it. It took an effort of will to create this diagram, much less share it.
When you build a Power BI dataflow, lineage is built in. It’s not an added feature – it’s a fundamental aspect of how dataflows work. Instead of having an ETL process that performs the data movement and compute in one place and the data storage in another, dataflow entities are defined by a Power Query “M” query. The ETL logic in the query and the data storage are defined as a single unit. The result is that not only does a dataflow contain the data, it contains the full lineage about where the data came from and how it was transformed.
This lineage information is what makes possible the diagram view that’s available today. As shown above, the diagram view provides a workspace-level lineage view of all dataflows in the workspace, their relationships, and the data sources from which they extract data – including data sources that are dataflows in other workspaces.
It’s worth emphasizing that this diagram view isn’t an editor per se – it’s not where you define lineage relationships, it’s where you can view, explore, and understand the lineage relationships that are automatically created when you build your dataflows. At the same time, the diagram view does let you edit and manage the dataflows in your workspace in the same way as the list view.
The dataflows API lets developers build similar experiences and automated processes. Specifically, the Get Dataflow API returns the model.json file that contains all the dataflow metadata, including the M scripts for the queries that define the dataflow entities.
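To make that concrete, here’s a minimal sketch in Python that calls the Get Dataflow API and pulls out the entity names and the mashup document. The workspace and dataflow IDs are placeholders, and the location of the M document inside model.json (the “pbi:mashup” property) reflects my reading of the current format – treat it as an assumption rather than a contract:

```python
import requests

# Assumes an Azure AD access token with access to the target workspace.
ACCESS_TOKEN = "<your-access-token>"
GROUP_ID = "<workspace-id>"      # placeholder
DATAFLOW_ID = "<dataflow-id>"    # placeholder

# The Get Dataflow API returns the dataflow definition file (model.json).
url = f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}/dataflows/{DATAFLOW_ID}"
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

model = requests.get(url, headers=headers).json()

# List the entities defined in the dataflow.
for entity in model.get("entities", []):
    print(entity["name"])

# Print the Power Query "M" section document that defines the entities.
# The "pbi:mashup" property is an implementation detail of the current
# model.json format and may change.
print(model.get("pbi:mashup", {}).get("document", "<no mashup document found>"))
```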
Of course, dataflows are just one part of a complete lineage and impact analysis story in Power BI. Ideally a Power BI workspace administrator would be able to see all data sources, dataflows, datasets, reports, and dashboards in a single view, and to easily navigate and manage the relationships and dependencies between them. Something like this:
At the Microsoft Business Applications Summit in June, Microsoft shared plans for lineage across all artifacts in a workspace. This represents a continuation of the current functionality available in dataflows and of the recently-announced shared and certified datasets.
With this upcoming functionality, users working with reports and dashboards can easily see where the data comes from, and how and where it has been transformed. I’m hesitant to call this the “holy grail” of business intelligence, but I’ve heard it described this way enough times that I wouldn’t argue too loudly if someone did. If I had a dollar for every meeting I’ve been in where the agenda was derailed by an argument about whether the data was right… In any event, this is a very common problem, with few good solutions.
How are you using the lineage capabilities in dataflows today? Do you have processes that rely on the dataflows lineage UI or API? I’d love to hear about what you’re doing now and what you’re planning – let me know.
 Such as having a batch job, stored procedure, or SSIS Data Flow that loads into a SQL Server table or HDFS folder.