Microsoft Fabric has only been in preview for a week, and I’ve already written one post that covers data governance – do we really need another one already?
Dave’s excellent question and comment[1] got me thinking about why OneLake feels so important to him (and to me) even though Fabric is so much more than any one part – even a part as central as OneLake. The more I thought about it, the more the pieces fell into place in my mind, and the more I found myself thinking about one of my favorite quotes[2]:
A complex system that works is invariably found to have evolved from a simple system that works. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work.
Please take a minute to reflect on this quote. Ask yourself: if Fabric is a complex system that works, what is the simple system that works from which it evolved? We’ll come back to that.
One of the most underappreciated benefits of Power BI as a managed SaaS data platform has been the “managed” part. When you create a report, dataset, dataflow, or other item in Power BI, the Power BI service knows everything about it. Power BI is the authoritative system for all items it contains, which means that Power BI can answer questions related to lineage (where does the data used by this report come from?) and impact analysis (where is the data in this dataset used?) and compliance (who has permissions to access this report?) and more.
If you’ve ever tried to authoritatively answer questions like these for a system of any non-trivial scope, you know how hard it is. Power BI has made this information increasingly available to administrators, through logs and APIs, and the community has built a wide range of free and paid solutions to help admins turn this information into insights and action. Even more excitingly, Power BI keeps getting better and better even as the newer parts of Fabric seem to be getting all of the attention.
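To make that visibility concrete, here’s a minimal Python sketch of pulling one day of audit events from the activity log using the Power BI admin REST API. The token acquisition is reduced to a placeholder and error handling is omitted, so treat this as a starting point rather than a finished solution:

```python
import requests

# Placeholder: an Azure AD access token with Power BI admin permissions.
# Acquiring one (e.g. via MSAL and a service principal) is out of scope here.
ACCESS_TOKEN = "<your-azure-ad-token>"
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# The activity events API requires start and end times within a single
# UTC day, passed as quoted ISO 8601 values.
url = (
    "https://api.powerbi.com/v1.0/myorg/admin/activityevents"
    "?startDateTime='2023-05-29T00:00:00Z'"
    "&endDateTime='2023-05-29T23:59:59Z'"
)

events = []
while url:
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    payload = response.json()
    events.extend(payload.get("activityEventEntities", []))
    url = payload.get("continuationUri")  # None once the last page is returned

print(f"Retrieved {len(events)} audit events")
```

From here it’s a short hop to loading the events into a dataset or a lakehouse and building the admin insights that make sense for your organization.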
What does all this have to do with Fabric and OneLake and simple systems?
For data governance and enablement, Power BI is the simple system that works. OneLake is the mechanism through which the additional complexity of Fabric builds on the success of Power BI. Before the introduction of Fabric, the scope of Power BI was typically limited to the “final mile” of the data supply chain. There is a universe of upstream complexity that includes transactional systems, ETL/ELT/data preparation systems, data warehouses, lakes, and lakehouses, and any number of related building blocks. Having accessible insights into the Power BI tenant is great, but its value is constrained by the scope of the tenant and its contents.
All Fabric workloads use OneLake as their default data location. OneLake represents the biggest single step in the move from simpler to more complex, because it is a major expansion of the SaaS foundation shared by all Fabric workloads, new and old. Because of Fabric, and because OneLake is the heart of Fabric, governance teams can now get more of the things they love about Power BI for more parts of the data estate.
Why should your governance team be excited about Microsoft Fabric? They should be excited because Fabric has the potential of making their lives much easier. Just as Fabric can help eliminate the complexity of integration, it can also help reduce the complexity of governance.
[1] Yes, we have Dave to thank and/or blame for this post.
[2] This massive pearl of wisdom is from The Systems Bible by John Gall. I first encountered it in the early 90s in the introduction to an OOP textbook, and have been inspired by it ever since. This quote should be familiar to anyone who has ever heard me talk about systems and/or data culture.
Over the holiday weekend I joined Adam Saxton for a Fabric-focused Guy in a Cube live stream. It was a ton of fun, as these streams always are, and we ended up going for 90 minutes instead of the scheduled 60[1].
Come for the hard-earned wisdom. Stay for the ridiculous facial expressions.
During the stream there were a few questions (including this one) where the answer involved understanding where Power BI ends and Microsoft Fabric begins. I suspect that this will be a common source of confusion in these early days of Fabric, and I hope this post will help clarify things.
Let’s begin at the beginning, with Power BI. When you think about Power BI, you’re probably thinking about something like this: Power BI as a SaaS cloud service for business intelligence.
The reality has always been a little more nuanced than this. Under the hood, Power BI has always included two main components: BI artifacts, and the general-purpose data-centric SaaS foundation on which they operate.
Power BI includes artifacts like reports, datasets, dataflows, and more. Each artifact includes a set of user experiences and the underlying capabilities required to make those experiences work – things like the Analysis Services Tabular engine for datasets.
Power BI also includes things like workspaces, authentication and authorization, tools for content sharing, discovery, and management, administration and governance, and more. These foundational capabilities apply to every BI artifact.
When Power BI added support for dataflows, paginated reports, or goals, they worked just like you expected them to work based on your past experiences with datasets and reports. This familiarity was a function of that shared SaaS foundation, but for most people there was never a reason to think about Power BI as a service foundation with an increasing number of workloads running on that foundation.
With the introduction of Microsoft Fabric the distinction between the workloads and the service foundation becomes more apparent, because there are more workloads for more practitioner personas – and because the foundation includes OneLake[2], a shared capability that is used by all workloads, but isn’t fully abstracted into the service foundation.
With Fabric, the collection of artifacts and experiences running on top of the service foundation has expanded significantly, but those new experiences will be familiar to Power BI users because they run on the same foundation that Power BI experiences have been running on for years.
As you’re exploring Fabric you may find yourself asking questions like “will X feature from Power BI work for this new Fabric thing?” The most likely answer is that if the feature is shared across multiple Power BI artifacts and experiences, it will work for new Fabric artifacts and experiences as well. While Fabric is in preview there will be more exceptions to this rule[3], but as Fabric moves toward general availability these exceptions will become fewer.
With Fabric, the BI experiences and capabilities you’ve used and loved in Power BI will remain, and will continue to improve. And because all Fabric workloads run on the SaaS foundation that has supported Power BI for years, as that foundation grows and improves each workload will benefit from those improvements.
I’ll close with a gif. I’m generally not a big fan of gifs, but I think this one will illustrate and reinforce the story I’m trying to tell. Enjoy!
[1] Despite this bonus time we ended up leaving tons of questions unanswered. I think there might be enough interest in Fabric to keep the cube guys busy for a while.
DALL-E prompt “power bi tenant settings administrator” because I couldn’t think of a better image to use
Until now, there hasn’t been a way to programmatically monitor tenant settings. Administrators needed to manually review and document settings to produce user documentation or complete audits. Now the GetTenantSettings API enables administrators to get a JSON response with all tenant settings and their values. With this information you can more easily and reliably share visibility into tenant settings across all of the processes that need it.
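Here’s a rough Python sketch of what calling the API might look like. I’m working from the announcement rather than from hands-on testing, so treat the endpoint and response shape as assumptions to verify against the documentation:

```python
import json
import requests

# Placeholder: an Azure AD access token with Fabric/Power BI admin permissions.
ACCESS_TOKEN = "<your-azure-ad-token>"

# Assumed endpoint based on the announcement -- check the docs for the
# current URL and required permissions.
response = requests.get(
    "https://api.fabric.microsoft.com/v1/admin/tenantsettings",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()

# The response is a JSON document describing each tenant setting and its
# current value ("tenantSettings" is the collection name I'd expect --
# verify against the actual payload).
settings = response.json().get("tenantSettings", [])
for setting in settings:
    print(setting.get("title"), "->", setting.get("enabled"))

# Persisting the raw snapshot makes it easy to diff settings over time
# for documentation and audits.
with open("tenant-settings-snapshot.json", "w") as f:
    json.dump(settings, f, indent=2)
```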
If you’re a visual learner, check out this excellent video from Robert Hawker at Meloro Analytics that walks through using and understanding the API.
That’s it. That’s the post. I almost missed this important announcement with all of the other news this week – and I wanted to make sure you didn’t miss it too.
[1] If you haven’t attended one of our past events, we’re both going to be in Dublin in less than two weeks, and I will be in Copenhagen in September. Given the way our schedules are looking, we don’t expect to have any more in-person appearances before the end of the year. If you’ve been waiting for an event closer to you, you’ll probably be waiting until 2024 or later.
The data internet this week is awash with news and information about Microsoft Fabric. My Introducing Microsoft Fabric post on Tuesday got just under ten thousand views in the first 24 hours, which I believe is a record for this blog.
Even more exciting than the numbers are the comments. Bike4thewin replied with this excellent comment and request:
I would love to hear your thought on how to adopt this on Enterprise level and what could be the best practices to govern the content that goes into OneLake. In real life, I’m not sure you want everyone in the organisation to be able to do all of this without compromising Data Governance and Data Quality.
There’s a lot to unpack here, so please understand that this post isn’t a comprehensive answer to all of these topics – it’s just my thoughts as requested.
In the context of enterprise adoption, all of the guidance in the Power BI adoption roadmap and my video series on building a data culture applies to Fabric and OneLake. This guidance has always been general best practices presented through the lens of Power BI, and most of it is equally applicable to the adoption of other self-service data tools. Start there, knowing that although some of the details will be different, this guidance is about the big picture more than it is about the details.
In the context of governance, let’s look at the Power BI adoption roadmap again, this time focusing on the governance article. To paraphrase this article[1], the goal of successful governance is not to prevent people from working with data. The goal should be to make it as easy as possible for people to work with data while aligning that work with the goals and culture of the organization.
Since I don’t know anything about the goals or culture that inform Bike4thewin’s question, I can’t respond to them directly… but reading between the lines I think I see an “old school” perspective on data governance rearing its head. I think that part of this question is really “how do I keep specific users from working with specific data, beyond using security controls on the data sources?”
The short answer is you probably shouldn’t, even if you could. Saying “no” used to work sometimes, but no matter what your technology stack is, saying “yes, and” is almost always the better approach. This post on data governance and self-service BI[2] provides the longer answer.
As you’re changing the focus of your governance efforts to be more about enabling the proper use of data, Fabric and OneLake can help.
Data in OneLake can be audited and monitored using the same tools and techniques you use today for other items in your Power BI tenant. This is a key capability of Fabric as a SaaS data platform – the data in Fabric can be more reliably understood than data in general, because of the SaaS foundation.
The more you think about the “OneDrive for data” tagline for OneLake, the more it makes sense. Before OneDrive[3], people would store their documents anywhere and everywhere. Important files would be stored on users’ hard drives, or on any number of file servers that proliferated wildly. Discovering a given document was typically a combination of tribal knowledge and luck, and there were no reliable mechanisms to manage or govern the silos and the sprawl. Today, organizations that have adopted OneDrive have largely eliminated this problem – documents get saved in OneDrive, where they can be centrally managed, governed, and secured.
To make things even more exciting, the user experience is greatly improved. People can choose to save their documents in other locations, but every Office application saves to OneDrive by default, and documents in OneDrive can be easily discovered, accessed, and shared by the people who need to work with them, and easily monitored and governed by the organization. People still create and use the documents they need, and there are still consistent security controls in place, but the use of a central managed SaaS service makes things better.
Using OneLake has the potential to deliver the same type of benefits for data that OneDrive delivers for documents. I believe that when we’re thinking about what users do with OneLake we shouldn’t be asking “what additional risk is involved in letting users do the things they’re already doing, but in a new tool?” Instead, we should ask “how do we enable users to do the things they’re already doing using a platform that provides greater visibility to administrators?”
In addition to providing administrator capabilities for auditing and monitoring, OneLake also includes capabilities for data professionals who need to discover and understand data. The Power BI data hub[4] has been renamed the OneLake data hub in Fabric, and allows users to discover data in the lake for which they already have permissions, or which the owners have marked as discoverable.
The combination of OneLake and the OneLake data hub provides compelling benefits for data governance: it’s easier for users to discover and use trusted data without creating duplicates, and it’s easier for administrators to understand who is doing what with what data.
I’ll close with two quick additional points:
Right before we announced Fabric, the Power BI team announced the preview of new admin monitoring capabilities for tenant administrators. I haven’t had the chance to explore these new capabilities, but they’re designed to make administrative oversight easier than ever.
I haven’t mentioned data quality, even though it’s part of the comment to which this post is responding. Data quality is a big and complicated topic, and I don’t think I can do it justice in a timely manner… so I’m going to take a pass on this one for now.
Thanks so much for the awesome comments and questions!
[1] And any number of posts (1 | 2 | 3 | 4 | 5 | 6 | 7 | …) on this site as well.
[2] The linked post is from exactly two years ago, as I write this new post. What are the odds?
[3] In this context I’m thinking specifically about OneDrive for Business, not the consumer OneDrive service.
[4] The data hub was originally released in preview in late 2020, and has been improving since then. It’s one of the hidden gems in Power BI, and is a powerful tool for data discovery… but since I haven’t blogged about it before now, I guess I can’t complain too loudly when people don’t know it exists.
Today I can announce that there will be three Microsoft Fabric sessions during the main Data Ceili event on Friday. All three will be presented by members of the Fabric CAT team at Microsoft, and each will be based on deep engagement with the product team and private preview customers.
The three sessions should complement each other well. I’ll be covering the basics of the topics Luke and Kasper will cover in more depth as part of a more comprehensive overview, and Kasper and Luke will recap the big-picture intro before getting into the details of their more focused technical sessions.
The full details are available on the conference schedule. This looks like it’s going to be an exciting event, and I hope to see you there!
At this point, I suspect someone might be saying “wait a minute – did you say Fabric CAT team?”
Why yes, yes I did.
We don’t know what the team logo will be, but because of generative AI we have lots of cute examples of what it definitely won’t be.
As you know, I’ve been part of the Power BI CAT team for the last five years or so, and I’m thriving on that team. One of the reasons I love this team so much is how it periodically reinvents itself to remain aligned with the evolving needs of the customers and product teams we support. Sometimes these changes are smaller, sometimes they’re bigger, and this time the change was big enough we needed to change the team name.
Not unlike how Microsoft Fabric represents the evolution of Power BI and Synapse, Fabric CAT represents the evolution of the Power BI CAT and Synapse CSE[2] teams. We’re now a single team that’s better together, and I have one more reason to be excited about the future.
I know I can look forward to seeing you in Dublin, so I guess that should be two more reasons to be excited.
[1] Yes, the discussion during the pre-conference will include Fabric.
This week at Microsoft Build, we announced the future.
With an introduction like that, I should probably remind everyone that this is my personal blog, my personal perspective, and my personal opinions. Although I am a Microsoft employee, I am not speaking for or otherwise representing my employer with this post or anything else on this blog.
With that disclaimer out of the way, let’s get back to the future. Let’s get back to Microsoft Fabric.
According to the official documentation, “Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence.” Fabric is implemented as a SaaS service for all sorts of data, analytics, and BI capabilities, including data integration, data engineering, data warehousing, data science, real-time analytics, and business intelligence.
I’ve been working on Fabric for around 18 months[1], and I could not be more excited to finally share it with the world. I don’t own any of the features coming in Fabric, but my team and I have been running an NDA private preview program with thousands of users from hundreds of customer organizations around the world building solutions using Fabric, and providing feedback to the product team.
This introductory blog post won’t attempt to be super technical or comprehensive. Instead, I’m going to share the information I’ve shared most frequently and consistently during the Fabric private preview – the information and context that will help you get started, and help put that more technical information into context.
For folks who are already familiar with Power BI[2], Fabric is going to feel familiar from day one. This is because the SaaS foundation on which Fabric is built is the Power BI service you already use every day.
The SaaS foundation of Microsoft Fabric
The foundation is evolving and improving, and there are new capabilities in lots of places, but the foundation of Fabric is the foundation of Power BI. This means that from day one you already know how to use it:
Workspaces – Fabric workspaces behave like Power BI workspaces, but with more item types available.
Navigation – If you know how to move around the Power BI portal you know how to move around the Fabric portal, because it works the same way.
Collaboration and content management – You can collaborate and share with Fabric items and workspaces just like you do with Power BI.
Capacities – New Fabric workloads use the capacity-based compute model used by Power BI Premium. If you don’t already have a capacity, you can start a free trial.
Administration – Fabric administration works like Power BI administration, and the Fabric admin portal is the evolution of the Power BI admin portal. To enable the Fabric preview in your Power BI tenant or for a specific capacity, you can use the admin portal.
Much, much more – I won’t try to list everything here, because there’s already so much documentation available.
At this point you probably get the idea. If you’re familiar with Power BI, you’re going to have an easy time getting used to Fabric. Power BI will continue to evolve and grow, and there are a lot of exciting improvements coming to Power BI in Fabric[3] even without taking the new capabilities into account.
But what about those new capabilities? What about all the new data integration, data science, data engineering, data warehousing, and real-time analytics capabilities? How familiar will they be?
That’s a slightly more complicated question. In a lot of ways these new Fabric workloads are the evolution of existing Azure data services, including Azure Synapse, Azure Data Factory, and Azure Data Explorer. These established PaaS services have been updated and enhanced to run on the Fabric shared SaaS foundation, and their user experiences have been integrated into the Fabric portal.
If you’re already familiar with Azure Synapse, Azure Data Factory, and/or Azure Data Explorer, the new capabilities in Fabric will probably be familiar too. You already know how to work with pipelines and notebooks, and you already know how to write SQL and KQL queries – in Fabric you’ll just be doing these familiar things in a new context.
There are a few key Fabric concepts that I’ve seen new-to-Power BI preview customers ask questions about. If you or your colleagues are more Azure-savvy than Power-savvy, you’ll probably want to pay attention to:
Capacities – Fabric uses capacities for compute across all experiences[4], which provides a consistent billing and consumption model, but which will necessitate a change in thinking for folks who are used to other service-specific approaches to billing and consumption.
Workspaces – Other services don’t have the same concept of a workspace as Power BI and Fabric do… but some of them have different concepts with the same name. Since workspaces are a crucial tool for content creation, organization, and security, understanding them and how they work will be important for success with Fabric.
A “managed” SaaS data service – In most data services, the “catalog” of items and their relationships is expressed through the metadata of a given instance. This means that capabilities like discovery, lineage, and impact analysis are either absent, limited in scope, or only available through integration with an external data catalog or similar service. Fabric, like Power BI, maintains an internal data catalog of all items in the tenant and their relationships to each other. This information is exposed through APIs and integrated into experiences like the workspace lineage view and the data hub, making it easier to discover, understand, and use data.
In addition to things in Fabric that will be familiar to people with Power BI experience and things in Fabric that will be familiar to people with Azure data experience, there’s one huge part of Fabric that is going to be new to everyone: OneLake.
OneLake is a SaaS data lake that is a key component of the Fabric SaaS foundation[5]. Every Fabric tenant includes a single OneLake instance, and all Fabric experiences work natively with data in the lake.
OneLake is open – OneLake is built on ADLS Gen2. You can store any type of file, and use the same APIs you use when connecting to ADLS Gen2 (there’s a quick sketch of this after the list below). Storing data in OneLake doesn’t mean it’s locked into Fabric – it means it can be used where and how you need it to be used.
Delta by default – Fabric experiences store their data in OneLake as Delta parquet files. Delta is an open source storage format that builds on compressed, columnar parquet files to add support for ACID transactions, and it is supported by a wide range of tools.
Store once, use everywhere – Because there’s one OneLake, data can be ingested and stored once and used wherever it’s needed. You can have a single set of Delta files that are exposed as a lakehouse and manipulated using notebooks, while at the same time being exposed as a warehouse and manipulated using SQL, and exposed as a Power BI tabular dataset in Direct Lake mode. This decoupling of storage and compute is enabled by OneLake, and I expect it to be one of the most game-changing aspects of Fabric as a whole.
OneLake is integrated – Being open makes it easy for you to store your data in OneLake while using it with whatever tools and compute engines you choose. OneLake shortcuts allow you to keep your data where you have it today while logically exposing it as if it were stored in OneLake.
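To make the “open” point above tangible, here’s a short Python sketch that lists files in a lakehouse using the standard Azure Storage SDK, exactly as you would against any ADLS Gen2 account. The workspace and lakehouse names are hypothetical placeholders, and you’ll need an identity with access to the workspace:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes an ADLS Gen2-compatible DFS endpoint. The workspace
# plays the role of the container, and items (like a lakehouse) are
# top-level directories within it.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# "MyWorkspace" and "MyLakehouse" are hypothetical names -- substitute your own.
workspace = service.get_file_system_client("MyWorkspace")

# List the contents of the lakehouse Files area, just like listing paths
# in any other ADLS Gen2 container.
for path in workspace.get_paths(path="MyLakehouse.Lakehouse/Files"):
    print(path.name)
```

Because the endpoint behaves like ADLS Gen2, tools that already speak ADLS Gen2 should be able to point at OneLake with little more than a URL change.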
OneLake takes the familiar concept of a data lake, and puts it where no one seems to expect it: at the center of work, where it makes sense, deeply integrated into the tools and experiences used by everyone contributing to a project or product.
With all of these new and familiar capabilities coming together into a single SaaS platform, the next thing that Fabric delivers is a comprehensive set of user experiences.
Modern data projects often involve a wide range of practitioners – data scientists, data engineers, data developers, BI developers, report authors, and more. Before Fabric, each practitioner persona would typically work in their own set of tools and experiences, each of which had its own strengths and weaknesses and capabilities. When taken together, this means that most projects involve significant integration effort to make the output of one tool work with the next tools in the value chain – and often there are tradeoffs made to accommodate the mismatch between tools.
With Fabric, each task and persona has a purpose-built set of experiences that all work natively with the same data in OneLake. The result is that data practitioners can focus on delivering value through their data work, not on building integrations so their tools will work together. Teams can set up workspaces that contain the data and items they need – lakehouses, warehouses, notebooks, Spark jobs, pipelines, dataflows, datasets, reports, and more. Data in one workspace can be used in other workspaces as needed, and because of OneLake it can be stored once and used multiple times without duplication.
During the Fabric private preview, the chief data officer of a well-known global organization[6] said something to the effect of:
With Fabric I can finally be a Chief Data Officer instead of being a Chief Integration Officer.
And this is why I believe Fabric represents the future of data.
Think back 10-12 years when the first generation of PaaS data services were becoming available. Many data practitioners looked at them and dismissed them as solutions to imaginary problems – why would we ever need a cloud service when we had these beautiful database servers in our own data centers, with IO subsystems we’ve designed to our own specs and fine-tuned to the nth degree[7]? It took time for people to realize the value and advantage of the cloud, but today there are entire classes of problems that simply don’t exist anymore because of the cloud.
I believe that the integrated, open, flexible, SaaS nature of Fabric means that we’re at an inflection point as significant for data as the advent of the cloud. Fabric will eliminate entire classes of problems that we take for granted today – in a few years we will take this new platform and this new paradigm for granted, and wonder how we ever thought those problems were an acceptable part of our professional lives.
Welcome to Fabric. Welcome to the future of data.
Ok, that’s my post. Where should you go from here? In addition to all of the links above, you should definitely check out the official blog post for the official big picture. You should also join us Wednesday and Thursday for a “simulive” virtual event as we go deeper into many of the key capabilities now available in Fabric.
I’ll see you there.
[1] Old man voice: Back when I was your age we called Fabric “Trident” and we weren’t allowed to talk about it in public because if the Kaiser heard about it our boys fighting in France would be at risk! Let me tell you about the time I…
[2] If you’re reading this post on this blog, I suspect this includes you. I’d love to know if you agree with my “feels familiar from day one” assertion.
[3] I’m working under the assumption that the interwebs will be flooded today with blogs and announcements and guys in cubes, so I’ll leave it up to you to track down what’s exciting for you.
[4] Yes, you need a capacity for all new Fabric experiences. Power BI licensing is not changing, but to work with the new Fabric capabilities you need a capacity to run them on. Fabric capacities are available in smaller SKUs than Power BI capacities.
[5] You probably noticed it in the diagram image above, sitting there in the middle all integrated and important.
[6] You know this company, but since this is my personal blog I’m probably not going to get their permission to name them. Also, as I write this blog post I can’t find the verbatim customer quote, so you’ll need to rely on my imperfect memory for this one.
[7] As I type this in 2023, I can’t remember the last time I worked with an on-prem production database. It was probably 2011 or 2012. It feels like something from another age, another life.
My favorite part of flagship Microsoft conferences like Microsoft Build is that we get to share with the world some of the exciting things that we’ve been working on for months.
At this month’s Microsoft Build conference, the Power BI and Azure Synapse teams are going to unveil something special.[1] If you don’t believe me, check out the pre-announcement post on the Power BI blog. The details are there, but I can’t keep myself from sharing the session lineup here:
Microsoft Build keynote: Analytics in the age of AI
Transform productivity with AI experiences for analytics
Eliminate data silos with the next-generation data lake
Modernize your data integration for petabyte-scale analytics
Unlock value from your data with a modern data warehouse
Use Spark to accelerate your lakehouse architecture
Secure, govern, and manage your data at scale
Go from models to outcomes with end-to-end data science workflows
Empower every BI professional to do more with data
Sense, analyze, and generate insights with real-time analytics
Accelerate your data potential with Microsoft
The speakers are a who’s who of product leaders, and the whole thing is being hosted by your favorite Guys in Cubes, Adam and Patrick.
Microsoft Build kicks off with keynotes on Tuesday, May 23, and continues on the 24th. If you can’t join us in Seattle, you can register online to attend virtually.
This September the greatest thing in the history of great things happening will happen. On Saturday September 16, in a metal club in Berlin, data platform MVPs Ben Kettner and Frank Geisler are JOINing[1] the two best things in the world: data and heavy metal.
Data Moshpit is a one-day community event with a heavy metal theme. The sessions will be metal. The speakers will be metal. The atmosphere will be metal. There will be beer and food and, if my experience at metal shows in Germany has taught me anything, lots of brothers of metal and sisters of steel wearing leather and denim and the coolest jean jacket vests with giant Manowar back patches.
The Data Moshpit call for speakers is open now, and closes on July 7th. You can submit metal-themed sessions of your own, or just check out the exciting sessions already submitted.
If you’ve got a heart of steel and data for blood, this is the one event of 2023 that you cannot afford to miss. And if you’re not into metal[2], you should come anyway. It will be a great opportunity to connect with the community and learn something new. I hope to see you there!
If you’re interested in an extra learning day focused on organizational maturity and adoption of a data culture with Power BI, you should also consider joining me and Melissa Coates for our full day focused on “The Hitchhiker’s Guide to Adopting Power BI in Your Organization.” At the risk of hyperbole, this session presents the most important information that you need to succeed with Power BI, and to increase the return your organization gets on its investments in data, in business intelligence, and in you.
The most important information that you need to succeed with Power BI
The more I think about it, I honestly think the risk of hyperbole here is very low. The Power BI adoption roadmap is based on the experiences of hundreds of enterprise Power BI customer organizations. The agenda of this pre-conference session is what Melissa and I believe is most important for most audiences, based on our collective decades of working in this space. This is the best of the best, the most important parts of the most important subject[2].
Most sessions teach you how to drive, or teach you some interesting aspect of driving. This session gives you a map, teaches you how to read the map, teaches you how to find out where you are on the map, and then provides best practices for navigation. If you’re driving for the sheer fun of driving, maybe this session isn’t what you’re looking for. But if you actually need to get somewhere, this session is going to give you what the other sessions won’t, and it will make all of the driving sessions more valuable.
Power BI includes capabilities to enable users to understand the content they own, and how different items relate to each other. Sometimes you may need a custom “big picture” view that built-in features don’t deliver, and this is where the Scanner API comes in.
No, not this kind of scanner
The Power BI Scanner API is a subset of the broader Power BI Admin API. It’s designed to be a scalable, asynchronous tool for administrators to extract metadata for the contents of their Power BI tenant[1]. For an introduction to the Scanner API, check out this blog post from when it was introduced in December 2020.
The Power BI team has been updating the Scanner API since it was released. This week they announced some significant new capabilities added to the API, so administrators can get richer and more complete metadata, including:
Scheduled refresh settings for datasets, dataflows, and datamarts – this will make it easier for administrators to review their refresh schedules and identify problems and hotspots that may have undesired effects.
Additional RDL data source properties – this will make it easier for administrators to understand paginated reports and the data sources they use.
Additional “sub-artifact” metadata for datasets – this will make it easier for administrators to understand table- and query-level configuration including row-level security and parameters.
The Scanner API is a vital tool for any organization that wants to deeply understand how Power BI is being used, with a goal of enabling and guiding adoption and building a data culture. These updates represent an incremental but meaningful evolution of the tool. If you’re already using the Scanner API, you may want to look at how to include this new metadata in your scenario. If you’re not yet using the Scanner API, maybe now is the time to begin…
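If it helps to see how the pieces fit together, here’s a minimal Python sketch of the documented scan workflow – request a scan with getInfo, poll scanStatus, then download scanResult. Authentication is reduced to a placeholder token and error handling is omitted:

```python
import time
import requests

# Placeholder: an Azure AD access token with Power BI admin permissions.
ACCESS_TOKEN = "<your-azure-ad-token>"
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
base = "https://api.powerbi.com/v1.0/myorg/admin/workspaces"

# 1. Get the IDs of workspaces to scan.
workspace_ids = [w["id"] for w in requests.get(f"{base}/modified", headers=headers).json()]

# 2. Kick off an asynchronous scan, requesting the richer metadata
#    (lineage, data source details, dataset schema and expressions).
scan = requests.post(
    f"{base}/getInfo"
    "?lineage=True&datasourceDetails=True&datasetSchema=True&datasetExpressions=True",
    headers=headers,
    json={"workspaces": workspace_ids[:100]},  # scans run on batches of up to 100 workspaces
).json()

# 3. Poll until the scan succeeds, then download the result.
while requests.get(f"{base}/scanStatus/{scan['id']}", headers=headers).json()["status"] != "Succeeded":
    time.sleep(5)

result = requests.get(f"{base}/scanResult/{scan['id']}", headers=headers).json()
print(f"Scanned {len(result['workspaces'])} workspaces")
```

The scanResult payload is the same metadata that catalog tools consume, so from here you can load it wherever your governance processes need it.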
[1] One of the key scenarios enabled by the Scanner API is integration with Microsoft Purview and third party data catalog tools like Collibra. When these tools “scan” Power BI to populate their catalogs, they’re calling this API.