Dataflows in Power BI: Overview Part 5 – Data Refresh

Dataflows are all about enabling data reuse through self-service data preparation, and before you can reuse data, you need to have data. To get data into your dataflow entities you need to refresh your dataflow. Similar to the options available for Power BI datasets, there are multiple options for refreshing the data in your dataflows.

The simplest way to refresh your dataflow is to click the “refresh” icon in the dataflows list in your workspace. This triggers a manual refresh: each of the queries for the entities in the dataflow executes, and the results of those queries are stored in the underlying CDM Folders in Azure Data Lake Storage.

01 - Manual refresh
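If you want to trigger that same manual refresh from a script instead of from the browser, the Power BI REST API exposes a refreshes endpoint for dataflows. Here’s a minimal sketch in Python, assuming you have already acquired an Azure AD access token with the appropriate dataflow scope; the token, workspace ID, and dataflow ID below are placeholders.

```python
import requests

# Placeholders: substitute a real Azure AD access token and the workspace
# (group) and dataflow IDs from your own tenant.
ACCESS_TOKEN = "<aad-access-token>"
GROUP_ID = "<workspace-id>"
DATAFLOW_ID = "<dataflow-id>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}"
    f"/dataflows/{DATAFLOW_ID}/refreshes"
)

# Kick off a refresh; it runs asynchronously, just like clicking the
# refresh icon in the service.
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"notifyOption": "MailOnFailure"},
)
response.raise_for_status()
print("Refresh requested:", response.status_code)
```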

You can also configure scheduled refresh for your dataflows. In the “…” menu in the dataflows list, select “Settings” and then set up one or more times for the dataflow’s queries to run. This will ensure that the data in your dataflow entities remains as current as you need it to.

02 - Settings

03 - Scheduled refresh

This settings page is also where you configure the gateway and credentials used to refresh the dataflow.
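If you would rather manage that schedule from a script or a deployment process, the REST API also appears to expose a refresh schedule endpoint for dataflows. This is a sketch only, assuming the dataflow endpoint mirrors the dataset one in shape; check the Power BI REST API reference for the exact payload before relying on it.

```python
import requests

ACCESS_TOKEN = "<aad-access-token>"
GROUP_ID = "<workspace-id>"
DATAFLOW_ID = "<dataflow-id>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}"
    f"/dataflows/{DATAFLOW_ID}/refreshSchedule"
)

# Assumed payload shape, modeled on the dataset refresh schedule API:
# which days to run, at which local times, and how to notify on failure.
schedule = {
    "value": {
        "enabled": True,
        "days": ["Monday", "Wednesday", "Friday"],
        "times": ["07:00", "19:00"],
        "localTimeZoneId": "UTC",
        "notifyOption": "MailOnFailure",
    }
}

response = requests.patch(
    url,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=schedule,
)
response.raise_for_status()
```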

If you’re using Power BI Premium capacity, you can also enable incremental refresh for dataflow entities that contain a DateTime column. To configure incremental refresh for an entity, click on the right-most icon in the entity list for a dataflow.

04 - Incremental

After turning on incremental refresh for the entity, specify the DateTime column on which the incremental refresh logic is applied.

05 - Incremental Settings

The logic for dataflows is the same as it is for datasets, so instead of going into detail here, I’ll just point you to the incremental refresh documentation.[1]
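To make the concept concrete anyway, here is a purely illustrative sketch of what an incremental refresh policy boils down to: keep rows whose DateTime value falls inside a long storage window, but only re-query rows inside a short refresh window on each run. The window lengths here are invented for the example; the real policy is configured in the dialog shown above.

```python
from datetime import date, timedelta

# Invented policy values for illustration only:
# "store rows from the past 5 years, refresh rows from the past 10 days".
STORE_YEARS = 5
REFRESH_DAYS = 10

today = date.today()
store_start = today - timedelta(days=365 * STORE_YEARS)  # approximate years
refresh_start = today - timedelta(days=REFRESH_DAYS)

def keep_in_storage(row_date: date) -> bool:
    """Rows older than the storage window are dropped from the entity."""
    return row_date >= store_start

def needs_requery(row_date: date) -> bool:
    """Only rows inside the refresh window are re-queried on each refresh;
    older rows inside the storage window are kept from previous loads."""
    return row_date >= refresh_start
```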

Regardless of how you refresh your dataflow – manual or scheduled, full or incremental – you can view the refresh history in the dataflows list by selecting “Refresh history” from the menu.

06 - Refresh history menu

This will show you a list of times the dataflow was refreshed, whether the refresh was manual or scheduled, and whether the refresh succeeded or failed.

07 - Refresh history details

For more detail about any refresh in the list, you can click the “download arrow” next to the refresh. This will download a CSV file containing per-entity information for the refresh. If your refresh fails, this is probably the best place to look, and if you reach out for support with a failed refresh, the information in this file will be valuable to share with support personnel.
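If you would rather pull refresh history programmatically than download CSV files one refresh at a time, the REST API surfaces dataflow refresh history as transactions. Another hedged sketch; the field names printed below are assumptions on my part, so inspect the raw JSON before building anything on top of it.

```python
import requests

ACCESS_TOKEN = "<aad-access-token>"
GROUP_ID = "<workspace-id>"
DATAFLOW_ID = "<dataflow-id>"

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}"
    f"/dataflows/{DATAFLOW_ID}/transactions"
)

response = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()

for transaction in response.json().get("value", []):
    # Each transaction describes one refresh: when it started, how it was
    # triggered, and whether it succeeded or failed. Field names assumed.
    print(
        transaction.get("startTime"),
        transaction.get("refreshType"),
        transaction.get("status"),
    )
```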

That’s it for refresh. The next post in this series will introduce linked entities, which will add a little more complexity to the refresh story, but that’s a topic for another post…


[1] It’s late. I’m lazy.

11 thoughts on “Dataflows in Power BI: Overview Part 5 – Data Refresh”

  1. Pingback: Dataflows in Power BI – BI Polar

  2. Pingback: Power BI Dataflows – Data Profiling Without Premium – BI Polar

  3. Pingback: Dataflows in Power BI: Overview Part 3 – Premium – BI Polar

  4. Pingback: Power Query Memory Usage, Dataflow Container Size And Refresh Performance « Chris Webb's BI Blog

  5. Peer

    Hi Matthew,
    Nice series of posts on Data Flow.

    Regarding refreshes, I have stumbled upon a strange error. I have a Power BI dataflow storing data in an underlying ADLS Gen2 storage account. This is not on a Premium capacity, and the source data resides in an on-premises Oracle database, so an on-premises gateway has been set up as well in order to fetch the data.

    Running a refresh (manual or scheduled) produces the following error:

    Refresh can’t run: This dataflow uses Azure Data Lake Storage Gen2 and an associated gateway with shared capacity. You can enable refresh by removing the gateway or by upgrading this workspace to Power BI Premium capacity.

    Of course, removing the gateway makes no sense, as I need it for the on-premises data. And the documentation says nothing about Premium capacity being required for refresh from on-premises sources.

    For now I have had to create the dataflow as an external dataflow, where files in the data lake are loaded via Data Factory and model.json is maintained manually.

    Any insights from you on the above?


      1. Peer

        Hi Matthew,

        I finally found some more insight on this and thought I might share it with you as well.

        According to the article here: https://docs.microsoft.com/en-us/power-bi/service-dataflows-configure-workspace-storage-settings, on-premises data sources are not supported on shared capacities.

        It states the following:
        On-premises data sources, in Power BI Shared capacities, are not supported in dataflows stored in your organization’s Azure Data Lake Storage Gen2.

        But thanks for your reply.


  6. Hi Matthew,
    I have read in other posts that in dataflows we can have different refresh schedules for different entities, but I couldn’t find where the entity-level refresh can be configured. Have you tried this?


  7. Dung Anh LE

    Hi Matthew,

    When the dataflow refreshes, will it automatically trigger a refresh of all the reports that are using the dataflow in question?
    Or do we need to schedule it, say, 30 minutes after the dataflow has been refreshed?

    Thank you,
    DA


    1. You still need to schedule your report refresh.

      I believe that the dataflows team is working on an “auto-refresh of dependent datasets” capability, but until it shows up on the public roadmap / release notes it’s not official.

