
ACLs in Azure Data Lake

Access Control Lists (ACLs) in Azure are a powerful toolset for provisioning granular levels of access in Azure Data Lake. Role-Based Access Control (RBAC) is the best option for setting up broader access levels; with ACLs, however, you can go down to the finest grain, as fine as a single file inside a blob container. Think of a scenario where you want to add more than one user to a folder inside a blob container and have each of them see only their own data. That is possible with ACLs.

Prerequisites

  • Azure Subscription
  • Storage account (Blob Storage) with hierarchical namespace enabled
  • Reader Access on the storage object via RBAC

How to set up ACLs in Azure Data Lake

As with any other offering, Microsoft provides a broad range of tools for setting up ACLs, from the Azure Portal to Python code. The steps are described in detail in the Microsoft documentation, so there is no need to repeat them in this article. Instead, let's walk through some of the challenges you can come across while setting up ACLs.

  • To begin with, RBAC is evaluated before ACLs and takes precedence over them. We cannot restrict an AD group or user to a folder inside a container if they have Contributor-level access on the storage account. Their RBAC role will let them do anything they want inside the container.
  • Additionally, ACLs apply from the point in time they are set up. For example, if you add SALES to an already existing folder 'SALESFEED', the data already in the folder will not be accessible to SALES; they will only have access to objects they create in SALESFEED from then on. To cover existing objects, the ACL has to be applied to those objects as well.
  • To make sure SALES can access objects that other users create in SALESFEED, a default ACL must be set up on SALESFEED.
  • SALES must have EXECUTE permission on the parent folders and container of SALESFEED, and Reader access to the subscription.
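The first and last points above can be sketched as a small model of the access check. This is a simplified illustration, not the service's actual implementation: the `RBAC_GRANTS` mapping of roles to `r`/`w`/`x`, the `can_access` function, and the dictionary layout of `acls` are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Assumed, simplified mapping from RBAC data roles to the operations they allow.
RBAC_GRANTS = {
    "Storage Blob Data Contributor": set("rwx"),
    "Storage Blob Data Reader": set("rx"),
}


def can_access(principal, path, needed, rbac, acls):
    """Check whether `principal` has `needed` ('r', 'w' or 'x') on `path`.

    rbac: {principal: [role names]}
    acls: {path: {principal: "rwx"-style string}}
    """
    # Step 1: an RBAC role that covers the operation wins; ACLs are not consulted.
    for role in rbac.get(principal, []):
        if needed in RBAC_GRANTS.get(role, set()):
            return True
    # Step 2: fall back to ACLs. EXECUTE is required on every parent folder ...
    for parent in PurePosixPath(path).parents:
        if "x" not in acls.get(str(parent), {}).get(principal, ""):
            return False
    # ... plus the requested permission on the object itself.
    return needed in acls.get(path, {}).get(principal, "")


acls = {
    "/": {"SALES": "--x"},
    "/SALESFEED": {"SALES": "r-x"},
    "/SALESFEED/report.csv": {"SALES": "r--"},
}
print(can_access("SALES", "/SALESFEED/report.csv", "r", {}, acls))
```

In this model, removing the `--x` entry on `/` is enough to block SALES from the file, while a principal holding a data role through RBAC gets in without any ACL entry at all.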

Principal in ACLs

A principal is an entity that represents an end user or a set of users in an ACL. Any of the objects in the image below can act as a principal in ACLs. However, the recommendation is to use Azure AD security groups, since that lets owners add and replace members within the group without the hassle of reapplying the ACL whenever a change is required. Role assignment happens at the principal level.
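When ACLs are set from code rather than the portal, each principal ends up in a short-form entry of the shape `[default:]type:id:permissions`, and entries are joined with commas. The helper below is a minimal sketch of building such a string; `acl_entry` and `SALES_GROUP_ID` are hypothetical names, and the object ID is a placeholder.

```python
def acl_entry(kind, permissions, object_id="", default=False):
    """Build one ACL entry in the short form `[default:]type:id:permissions`."""
    if kind not in {"user", "group", "mask", "other"}:
        raise ValueError(f"unknown entry type: {kind!r}")
    # Permissions are positional: 'r' or '-', then 'w' or '-', then 'x' or '-'.
    if len(permissions) != 3 or any(
        flag not in (letter, "-") for flag, letter in zip(permissions, "rwx")
    ):
        raise ValueError(f"permissions must look like 'r-x', got {permissions!r}")
    prefix = "default:" if default else ""
    return f"{prefix}{kind}:{object_id}:{permissions}"


# Placeholder object ID standing in for the SALES security group.
SALES_GROUP_ID = "00000000-0000-0000-0000-000000000000"

acl = ",".join([
    acl_entry("group", "r-x", SALES_GROUP_ID),                # access ACL entry
    acl_entry("group", "r-x", SALES_GROUP_ID, default=True),  # default ACL entry
])
print(acl)
```

Because the entry names the group and not its members, the string stays the same when people join or leave the team. A string in this format is what the `acl` argument of the Python SDK's directory-level ACL methods expects, as far as I can tell from the documentation.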

Permissions

Read, Write, and Execute are the permissions that can be granted via ACLs. READ gives read access to a file and, on a directory, lets its contents be listed, but EXECUTE permission is required on every parent folder up to the root folder to allow traversal. Likewise, WRITE allows file creation and modification, but this too requires EXECUTE on every folder up to the root. I would again like to emphasize that the ACL permissions of an object are stored on the object itself and are not inherited afterwards. In other words, if default permissions are set on the parent directory after child items were created, those child items will not inherit them.
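To make the traversal requirement concrete, here is a small sketch that lists which permission is needed on which path for a few common operations. The function name `required_permissions` and the operation labels are made up for this example; the requirements themselves follow the common-scenarios table in the Microsoft documentation as I understand it.

```python
from pathlib import PurePosixPath


def required_permissions(operation, path):
    """Return {path: minimum permission} needed to perform `operation` on `path`."""
    target = PurePosixPath(path)
    # EXECUTE on every parent folder, all the way up to the root.
    needs = {str(parent): "--x" for parent in target.parents}
    if operation == "read":        # read an existing file
        needs[str(target)] = "r--"
    elif operation == "write":     # modify an existing file
        needs[str(target)] = "-w-"
    elif operation == "list":      # list the contents of a directory
        needs[str(target)] = "r-x"
    elif operation == "create":    # create a new file: WRITE + EXECUTE on its parent
        needs[str(target.parent)] = "-wx"
    else:
        raise ValueError(f"unknown operation: {operation!r}")
    return needs


for folder, perm in required_permissions("read", "/SALESFEED/2024/report.csv").items():
    print(perm, folder)
```

Note that creating a file needs nothing on the file itself (it does not exist yet), only WRITE and EXECUTE on the folder that will contain it.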

Default ACLs

This option lets you set up visibility rules for future children of a directory. SALES, for instance, will not be able to see files created by a different owning user inside SALESFEED unless default permissions are set up on the SALESFEED directory. The setting is available under the Default Permissions tab in the Manage ACL option.
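The timing matters here, and a toy model shows why. The `Directory` class below is purely illustrative (it is not an SDK type), and giving the owner `rw-` on new files is an assumption made to keep the example short.

```python
class Directory:
    """Toy model of a folder with an access ACL and a default ACL."""

    def __init__(self, name):
        self.name = name
        self.access_acl = {}   # who can use this folder itself
        self.default_acl = {}  # template copied to children when they are created
        self.files = {}        # file name -> access ACL of that file

    def create_file(self, name, owner):
        # The default ACL is copied once, at creation time.
        # Later changes to the parent's default ACL do not touch this file.
        acl = dict(self.default_acl)
        acl[owner] = "rw-"     # assumed owner permissions for the sketch
        self.files[name] = acl
        return acl


feed = Directory("SALESFEED")
feed.create_file("old.csv", owner="ETL")   # created before the default ACL exists
feed.default_acl["SALES"] = "r--"          # default ACL set up afterwards
feed.create_file("new.csv", owner="ETL")   # created after the default ACL exists

print(feed.files["old.csv"])  # no SALES entry
print(feed.files["new.csv"])  # SALES entry copied from the default ACL
```

In this sketch SALES can read `new.csv` but not `old.csv`, which mirrors the behaviour described above: existing files need their ACLs updated separately.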

Mask ACLs

The mask caps the permissions of NAMED USERS and NAMED GROUPS: the effective permission of such an entry is whatever both the entry and the mask allow. In the case below, the effective access of Singh,Harinder on the ACL directory is therefore limited by the mask; the mask cannot grant a permission that was not set up in the ACCESS ACL. This is what lets the mask restrict access added in the ACCESS ACL. In scenarios where a temporary access restriction is required on a directory, a single mask can be used instead of removing all the ACL entries and adding them back again to cater to a temporary requirement.
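Under POSIX-style mask semantics, which is how the Microsoft documentation describes the mask, the effective permission of a named user or named group is the intersection of its ACL entry and the mask. A minimal sketch, with `effective_permission` as a hypothetical helper name:

```python
def effective_permission(entry, mask):
    """Combine an ACL entry with the mask, position by position ('rwx' order)."""
    return "".join(
        e if e != "-" and e == m else "-"
        for e, m in zip(entry, mask)
    )


print(effective_permission("rwx", "r-x"))  # the mask removes WRITE
print(effective_permission("r--", "rwx"))  # the mask cannot add permissions
print(effective_permission("rwx", "---"))  # temporary lock-out with a single mask
```

The last call is the temporary-restriction case: setting the mask to `---` disables every named user and group entry at once, and restoring the mask brings them back without touching the entries themselves.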

ACL Best Practices

Microsoft recommends using Azure AD security groups instead of adding associates at the user level. Let us go back to the SALES example. Regina Filangie heads Sales at Central Perk and wants her whole team added to SALESFEED. Regina, however, has skyrocketing attrition in her team. To skip the hassle of asking the Cloud Security team to add new joiners and remove ex-employees in the ACLs, she plans to set up a security group for SALES with herself as the owner. This lets her add and remove team members for SALESFEED whenever she wants.
This approach also helps with segregation. Let's say Gunther from SALES OPS also wants access to SALESFEED for some reporting needs. He can set up an Azure AD group-driven ACL for SALES OPS. This way it is easy to identify the different categories of users added to the storage, which makes it easier to manage access and security on the storage account.

Credits: Microsoft documentation

Comments

  1. Ms should simplify this. It's more of hassle then the feature..

  2. They may want to work on highlighting details in the documentation.

