Dataflows in Power BI: Overview Part 5 – Data Refresh

Important: This post was written and published in 2018, and the content below no longer represents the current capabilities of Power BI. Please consider this post to be an historical record and not a technical resource. All content on this site is the personal output of the author and not an official resource from Microsoft.

Dataflows are all about enabling data reuse through self-service data preparation, and before you can reuse data, you need to have data. To get data into your dataflow entities, you need to refresh your dataflow. As with Power BI datasets, there are multiple ways to refresh the data in your dataflows.

The simplest way to refresh your dataflow is to click the “refresh” icon in the dataflows list in your workspace. This triggers a manual refresh: each of the queries for the entities in the dataflow will execute, and the results of those queries will be stored in the underlying CDM Folders in Azure Data Lake Storage.

01 - Manual refresh
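If you prefer to trigger the same refresh programmatically rather than clicking the icon, the Power BI REST API also exposes a refresh endpoint for dataflows. Here's a minimal sketch in Python, assuming you already have an Azure AD access token with an appropriate Power BI scope (the workspace and dataflow IDs below are placeholders):

```python
import requests

# Placeholder IDs: substitute your own workspace (group) and dataflow IDs.
workspace_id = "00000000-0000-0000-0000-000000000000"
dataflow_id = "11111111-1111-1111-1111-111111111111"
access_token = "<Azure AD access token with a Power BI dataflow scope>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
    f"/dataflows/{dataflow_id}/refreshes"
)

# Request a refresh; notifyOption controls whether a failure notification
# email is sent (check the REST API reference for the supported values).
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {access_token}"},
    json={"notifyOption": "MailOnFailure"},
)
response.raise_for_status()
print("Refresh requested, HTTP status:", response.status_code)
```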

You can also configure scheduled refresh for your dataflows. In the “…” menu in the dataflows list, select “Settings” and then set up one or more times for the dataflow’s queries to run. This will ensure that the data in your dataflow entities remains as current as you need it to be.

02 - Settings

03 - Scheduled Refresh

This settings page is also where you configure the gateway and credentials used to refresh the dataflow.

If you’re using Power BI Premium capacity, you can also enable incremental refresh for dataflow entities that contain a DateTime column. To configure incremental refresh for an entity, click on the right-most icon in the entity list for a dataflow.

04 - Incremental

After turning on incremental refresh for the entity, specify the DateTime column on which the incremental refresh logic is applied.

05 - Incremental Settings

The logic for dataflows is the same as it is for datasets, so instead of going into detail here, I’ll just point you to the incremental refresh documentation.[1]
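Footnote aside, the basic idea is easy to illustrate. The sketch below is not what the service actually runs; it's just a rough Python illustration of the windowing logic behind incremental refresh, using a hypothetical ModifiedDate column as the DateTime column: rows inside the configured refresh window are re-queried from the source on each refresh, while older rows are kept from previous refreshes.

```python
from datetime import datetime, timedelta

def incremental_refresh(stored_rows, source_query, refresh_days=7, now=None):
    """Illustrative only: re-query recent rows, keep older rows as-is."""
    now = now or datetime.utcnow()
    window_start = now - timedelta(days=refresh_days)

    # Rows older than the refresh window are retained from the last refresh.
    retained = [r for r in stored_rows if r["ModifiedDate"] < window_start]

    # Only the recent slice is re-read from the source system.
    refreshed = source_query(window_start, now)

    return retained + refreshed
```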

Regardless of how you refresh your dataflow – manual or scheduled, full or incremental – you can view the refresh history in the dataflows list by selecting “Refresh history” from the menu.

06 - Refresh history menu

This will show you a list of times the dataflow was refreshed, whether the refresh was manual or scheduled, and whether the refresh succeeded or failed.

07 - Refresh history details

For more detail about any refresh in the list, you can click the “download arrow” next to the refresh. This will download a CSV file containing per-entity information for that refresh. If your refresh fails, this is probably the best place to look, and if you reach out for support with a failed refresh, the information in this file will be valuable to share with support personnel.
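If you'd rather pull this history programmatically than download the CSV by hand, the Power BI REST API also exposes the refresh transactions for a dataflow. Another hedged sketch: the IDs are placeholders, and the response field names shown (startTime, refreshType, status) are my assumption, so check the API reference for the exact shape:

```python
import requests

workspace_id = "00000000-0000-0000-0000-000000000000"  # placeholder
dataflow_id = "11111111-1111-1111-1111-111111111111"   # placeholder
access_token = "<Azure AD access token>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
    f"/dataflows/{dataflow_id}/transactions"
)

response = requests.get(url, headers={"Authorization": f"Bearer {access_token}"})
response.raise_for_status()

# Each transaction corresponds to one refresh of the dataflow.
for txn in response.json().get("value", []):
    print(txn.get("startTime"), txn.get("refreshType"), txn.get("status"))
```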

That’s it for refresh. The next post in this series will introduce linked entities, which will add a little more complexity to the refresh story, but that’s a topic for another post…


[1] It’s late. I’m lazy.

17 thoughts on “Dataflows in Power BI: Overview Part 5 – Data Refresh”

  1. Pingback: Dataflows in Power BI – BI Polar

  2. Pingback: Power BI Dataflows – Data Profiling Without Premium – BI Polar

  3. Pingback: Dataflows in Power BI: Overview Part 3 – Premium – BI Polar

  4. Pingback: Power Query Memory Usage, Dataflow Container Size And Refresh Performance « Chris Webb's BI Blog

  5. Peer

    Hi Matthew,
    Nice series of posts on Data Flow.

    Regarding refreshes, I have stumbled upon a strange error. I have a Power BI dataflow storing data on an underlying ADLS Gen2 storage account. This is not on a Premium capacity, and the source data resides in an on-premises Oracle database, so an on-premises gateway has been set up as well in order to fetch the data.

    Running a refresh (manual or scheduled) produces the following error:

    Refresh can’t run: This dataflow uses Azure Data Lake Storage Gen2 and an associated gateway with shared capacity. You can enable refresh by removing the gateway or by upgrading this workspace to Power BI Premium capacity.

    Of course removing the gateway makes no sense, as I need this for the on-prem data. And the documentation says nothing about Premium capacity being required for refresh from on-prem sources.

    As of now I have had to create the dataflow as an external dataflow, where files on the data lake are loaded via Data Factory and model.json is manually maintained.

    Any insights from you on the above?


      1. Peer

        Hi Matthew,

        I finally found some more insight on this and thought I might share it with you as well.

        According to the article here: https://docs.microsoft.com/en-us/power-bi/service-dataflows-configure-workspace-storage-settings, on-premises data sources are not supported on shared capacities.

        It states the following:
        On-premises data sources, in Power BI Shared capacities, are not supported in dataflows stored in your organization’s Azure Data Lake Storage Gen2.

        But thanks for your reply.


  6. Hi Matthew,
    I have read in other posts that with dataflows we can have different refresh schedules for different entities, but I couldn’t find where the entity-level refresh can be configured. Have you tried this?


  7. Dung Anh LE

    Hi Matthew,

    When the dataflow refreshes, will it automatically trigger a refresh of all the reports that are using the dataflow in question?
    Or do we need to schedule it, say, 30 minutes after the dataflow has been refreshed?

    Thank you,
    DA


    1. You still need to schedule your report refresh.

      I believe that the dataflows team is working on an “auto-refresh of dependent datasets” capability, but until it shows up on the public roadmap / release notes it’s not official.


  8. Neville de Sousa

    Hi Matthew,

    I’m finally getting to use dataflows and I’ve come across a refresh issue that is frustrating: one, because the error message is cryptic, and two, because there’s very little information I can find online.

    I can connect to my on-prem data source and extract data and save the entity in the dataflow. However, when I try to refresh the entity, I get a message:

    Error: An internal error occurred… [with a RootActivityId] Param1 = Received error payload from gateway service with ID 111803: An exception encountered while creating the target data source. [with a Request ID:]

    A couple of searches, while I wait for MS support to get back to me, point to multi-factor authentication and the gateway version. Any clues/ideas?


  9. Mike Brough

    Hi Matthew. Great that we can use incremental refresh to load data into a dataflow, but is it possible to use incremental refresh in a dataset when the source data is from a dataflow?


    1. Yes?

      I have never done this myself, but I have multiple customers who have described to me using incremental dataset refresh against dataflows that in turn use incremental refresh. I’m reasonably confident this works, but I can’t personally verify the end-to-end scenario.


  10. Dan Sprague

    Hi Matthew,

    Do you know if it’s possible to use Power Automate to trigger the refresh of a Dataflow?

    We currently have a Power Automate Solution that triggers refreshes of Power BI Datasets when source data is ready. We’d like to do the same with Dataflows, but I didn’t see any Dataflow actions in Power Automate.

    As you know, the shortcoming with Dataflow scheduled refreshes is that they’re based on a static time. So when using scheduled refreshes we run the risk of refreshing prematurely.

    Much appreciated!

