Exploring Azure Databricks Permissions
We are continuing our discussion of DevOps and security concerns with Azure Databricks. In this post, we will talk about setting up granular permissions inside an Azure Databricks workspace.
By default, all users have access to all resources within the workspace, and in standard tier workspaces this cannot be changed. By resources, I mean specific Databricks “objects” such as directories, notebooks, clusters, pools, jobs and tables. Luckily, Azure Databricks offers a premium tier, which allows administrators to configure custom role-based access controls through the Permissions API.
When you are creating production Databricks workspaces, you are likely going to have two main use cases. The first is job-specific: the workspace is used to run pre-created reports and functions that have followed some type of development process and have been promoted into production. The second is more exploratory: end users will want to experiment and play with the data, creating notebooks interactively and examining the results.
From a job-specific workspace perspective, you likely want the creation of new notebooks, jobs, clusters, and so on locked down to approved CI/CD processes only. Because these jobs will likely run under service principals, ensuring that users cannot simply create and run notebooks is extremely important. Of course, you will still have support personnel who need to monitor job execution and results. Azure Databricks role-based access control can help with this use case.
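To make that concrete, here is a minimal sketch in Python of granting a support group view-only access to a production job through the REST Permissions API (a preview at the time of writing, as noted in the conclusion). The workspace URL, token, job ID, and group name are all placeholders, and this assumes CAN_VIEW as the job-level read permission.

import requests

# Placeholders: substitute your own workspace URL, token, and job ID.
WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"
JOB_ID = "123"

# Grant the support group view-only access to the job so they can
# monitor execution and results without being able to modify anything.
resp = requests.patch(
    f"{WORKSPACE_URL}/api/2.0/permissions/jobs/{JOB_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "access_control_list": [
            {"group_name": "support-team", "permission_level": "CAN_VIEW"}
        ]
    },
)
resp.raise_for_status()

Because a PATCH only adds or updates the listed entries rather than replacing the whole access control list, any existing ownership held by the CI/CD service principal should be left untouched.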
For interactive clusters, you will likely want to ensure that users have “safe” places to create their notebooks, run jobs, and examine results. Because the results of notebooks are stored with the notebooks themselves, you’ll want to create appropriate role-based access controls to ensure that only users with the same security clearance can see their outputs. You may also want to create “home” directories accessible only to individual users. Again, role-based access control is a good fit here.
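Here is a hedged sketch of the “home” directory idea, again in Python against the same preview API: look up the directory’s numeric object ID via the workspace API, then grant the owning user CAN_MANAGE on it. The user name is hypothetical.

import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
USER = "jane@contoso.com"  # hypothetical user

# Directory permissions are keyed by numeric object ID, so look up the
# ID of the user's home directory first.
status = requests.get(
    f"{WORKSPACE_URL}/api/2.0/workspace/get-status",
    headers=HEADERS,
    params={"path": f"/Users/{USER}"},
)
status.raise_for_status()
directory_id = status.json()["object_id"]

# Make the user the manager of their own directory; notebooks created
# inside it inherit the directory's access control list.
resp = requests.patch(
    f"{WORKSPACE_URL}/api/2.0/permissions/directories/{directory_id}",
    headers=HEADERS,
    json={
        "access_control_list": [
            {"user_name": USER, "permission_level": "CAN_MANAGE"}
        ]
    },
)
resp.raise_for_status()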
Permissions Architecture
From an architecture perspective, the permissions model in Azure Databricks is quite simple. Each object within a Databricks workspace (for example, a notebook) has a set of “permissions” that can be associated with it.
For example, notebooks can have the following permissions:
CAN_READ
Users can view and comment on a notebook.
CAN_RUN
All of the above, plus the ability to attach the notebook to (and detach it from) a cluster and run commands within it.
CAN_EDIT
All of the above, plus the ability to edit the notebook.
CAN_MANAGE
All of the above, plus the ability to change permissions on the notebook.
These permissions are assigned on an object to a user or a group. This typically manifests as an access control list (ACL) attached to that particular object, in which each entry names either a user_name or a group_name along with a permission level. Generically, it looks something like this:
{ "access_control_list": [ { "user_name":"<UserName>" || "group_name":"<GroupName>" , "permission_level": "<PermissionLevel> } ] }
Conclusion
At the time of writing, permissions can be used in premium tier workspaces with workspace access control enabled. They are editable via the portal experience, and, if you ask nicely, you may get access to a preview for setting permissions via script.