Using DirectQuery with Dataflow to apply Row-level Security in an Import mode Dataset

A DIY guide to applying RLS in a Power BI dataset from a DirectQuery dataflow

Dhyanendra Singh Rathore
Microsoft Power BI

--

Photo by Pixabay from Pexels

DirectQuery and dataflows can address various kinds of problems in Power BI. For example, one of the untraditional problems DirectQuery with dataflows can solve is restricting access to data where facts update less frequently than the data access rights.

This article will discuss how to set up RLS in an import mode Power BI dataset using DirectQuery with a Power BI dataflow. We’ll discuss the requirements, configurations, and modeling tips to optimize query performance.

Please read our terms of use before proceeding with this article: https://dhyanintech.medium.com/disclaimer-disclosure-terms-of-use-fb3bfbd1e0e5

Before we begin, let’s look at the scenario that prompted us to set up RLS using DirectQuery with dataflow. We have a reporting solution to track several KPIs. The dataset comprises many imported fact and dimension tables, some of which contain sensitive data. Report users can request access to the data from a support team who maintains the details of user and data access rights in an Azure SQL Server database. Multiple teams use this information to restrict data access at their ends.

The dataset imports the data access rights details from the Azure SQL Server to apply row-level security in the model. The data updates at random intervals; hence, scheduled refreshes cannot be used. Instead, the refreshes are done manually. Either the entire dataset is refreshed at once, or tables are refreshed selectively using the XMLA endpoint.

Given the nature of the dataset refreshes, the support team can’t guarantee their SLAs, and users are not sure when they can expect their new access rights to take effect in the report.

The solution to this problem is to remove the dependency on the data refresh using DirectQuery for the data access rights information. Unfortunately, using DirectQuery with the Azure SQL Server was out of the question as multiple teams connecting to the database resulted in unpredictable usage and downgraded performance.

An alternative solution is to use a dataflow that imports the user access rights information from the Azure SQL Server and refreshes daily. The dataset connects to the dataflow with DirectQuery to apply row-level security. This approach removes the dependency on the data refresh and creates a reusable asset without impacting the database performance. Finally, not to mention some happy support teams and users.

Prerequisites

Power BI dataflows are supported in all types of workspaces as long as the Power BI admin has enabled the creation and usage of dataflows for the tenant. However, to use DirectQuery with dataflows, the following prerequisites must also be fulfilled.

  • Premium workspace: To use DirectQuery, we need to enable the Enhanced compute engine for the dataflow. Enhanced compute engine is a premium-only feature and thus available only for Power BI Premium and Premium Per User (PPU) workspaces.
  • Workspace role: You must have content edit rights to the hosting workspace, i.e., admin, member, or contributor.
  • A dataflow with a table to be used for setting up the RLS.

If you’re new to Power BI dataflow, refer to the following link to learn how to create a dataflow.

Note: The steps in this article were performed on a Premium Gen2 workspace and may not work on a Premium Gen1 workspace. Also, the enhanced compute engine is not yet available in every region at the time of writing.

Dataset Setup

We will use a simple setup for our demo to configure the RLS in a dataset: We have an RLS Users table with two columns: id_reporting_unit and user_id, where user_id contains the User Principal Name (UPN) and id_reporting_unit includes the reporting unit a user is allowed to see.

In our fact table, we have an id_reporting_unit column with a many-to-many relationship to id_reporting_unit in the RLS Users. In short, a user has access to data from many reporting units. The following image shows the data model and relationships.

Power BI Desktop: Model view- RLS relationship in the dataset (Image by author)

In our exercise, we will connect to the RLS Users table using DirectQuery from the dataflow and define an RLS rule on the user_id column.

Configuring the dataflow

The first step is to create a dataflow and create the RLS Users table in the dataflow. So, you can carry out all the steps in the dataflow to replicate the table or copy-paste the Power Query code from Power BI Desktop to the dataflow.

Power BI service: RLS Users dataflow (Image by author)

Enable Enhanced Compute Engine: You must enable the enhanced compute engine and refresh the dataflow. Follow the below steps:

  1. In the workspace, next to the dataflow, select More options (…), then Settings.
Power BI service: Dataflow- more options (Image by author)

2. Expand the Enhanced compute engine settings, select On and then select Apply.

Power BI service: Dataflow- Enhanced compute engine settings (Image by author)

3. Wait for the changes to apply, then refresh the dataflow.

Power BI service: Refresh dataflow (Image by author)

4. (Optional) Set up a scheduled refresh for the dataflow.

The dataflow is now ready to be connected as DirectQuery from a dataset.

Setting up the RLS in the dataset

Open your dataset in the Power BI Desktop and connect to the dataflow we just configured.

  • On the Home ribbon, select Get Data > Dataflows. Then, on the Navigator pane, locate the dataflow under Workspaces (or use the search bar at the top).
Power BI Desktop: Connect to a dataflow table (Image by author)

A quick check to see if you’ve configured the dataflow correctly:

  • A View icon (two overlaid boxes) with the table name means DirectQuery is supported for the dataflow.
  • A Table icon with the table name means the dataflow only supports import.
Power BI Desktop: DirectQuery with dataflow support indicator (Image by author)
  • Select the table and Load. Wait for the Power BI to evaluate the operation, and you’ll get a Set storage mode dialog box. In this step, you choose how you want to connect to the table. Select DirectQuery.
Power BI Desktop: Storage mode (Image by author)
  • Next, acknowledge the Potential security issue warning. The warning serves as a reminder that your Power BI model will turn into a composite model as the model will have tables from two different source groups.
Power BI Desktop: Potential security risk warning (Image by author)
  • Wait for the Power BI to connect to the table and head to the Model view. Hover over the RLS Users table, and you can see the tooltip for quick confirmation. Next, define the relationship between the fact and the RLS Users table.
Power BI Desktop: Model view with DirectQuery RLS table (Image by author)
  • Lastly, we will define a dynamic row-level security rule on the user_id column using the DAX expression USERPRINCIPALNAME. Save and publish your dataset to the Power BI service.
Power BI Desktop: RLS role (Image by author)

Refer to the following link to learn how to define RLS in Power BI.

Performance tips

Before we proceed further, let’s look at a few modeling tips to improve query performance.

  • Create RLS rules that filter dimension tables instead of fact tables, especially if your fact table is enormous. One option to achieve this is to create a new table by taking the distinct of the RLS candidate column. Afterward, create a many-to-one relationship between the fact and the new table where the new table filters the fact. Finally, set up the RLS on the newly created dimension table.
  • Avoid many-to-many relationships in your model when possible. A straightforward option is to use an intermediate dimension table. Create a new table by taking the distinct of the relationship column and joining the two tables to the intermediate table. Intermediate tables break down a many-to-many relationship into two many-to-one relationships.
Power BI Desktop: Optimized model view with DirectQuery RLS table (Image by author)
  • Why Assume referential integrity is disabled? Enabling referential integrity improves the query performance of DirectQuery sources by using an INNER JOINrather than an OUTER JOINto join the tables. In our scenario, referential integrity is disabled as the tables belong to two different source groups.
Power BI Desktop: Relationship- Disabled assume referential integrity (Image by author)

Finish up the RLS in the Power BI service

The next step is to assign users or AD groups to the RLS role we’ve just created, and you can carry out the assignment only in the Powe BI service. So head over to your workspace and finish up the user assignment.

Lastly, the final step is to validate the RLS roles to ensure everything is working as expected. You can validate the roles in the Power BI Desktop or the service. A detailed process is beyond the scope of this article, but you can find the step by step by instructions for user assignment and validation here.

That’s all for this article. Let us know what other challenging problems you solved using DirectQuery with dataflows.

Next steps: Dataset performance tracing

Observing and analyzing your dataset after connecting to a DirectQuery source is an excellent way to ensure the report delivers an acceptable query performance. Performance Analyzer works in the Power BI Desktop, but things are different in the service.

Or, if you’re interested to learn what RLS looks like behind the scenes, the following article will be a good starting point. We will also show you how to get your hands on the DAX query used by the report to populate the visuals.

Like this post? Connect with Dhyan

Let’s be friends! You can find me on LinkedIn or join me on Medium.

P.S. Have you ever tried hitting the clap button here on Medium more than once to see what happens? ❤️

--

--