Using Azure Custom Roles to Secure your Azure Data Factory Resources
Azure Data Factory (ADF) is billed as an Extract/Transform/Load (ETL) tool with a code-free interface for designing, deploying, and monitoring data pipelines. There is a lot you can do with the tool, and one of its more interesting design features is that it is built entirely on top of Azure Resource Manager (ARM). One issue that arises from this approach is that both development and operations of the ADF resource happen in the same place. This leads to some obvious questions, such as:
1. Who can make what types of changes to your ADF resources?
2. In what environments do you allow certain change types?
3. Who is responsible for making changes "operational"?
The goal of this post is to discuss how to use Azure Custom Roles to help secure your ADF resources in both development and operations. To understand how to use them, we first need to walk through a couple of concepts.
ADF Components
According to the ADF documentation, a data factory is made up of several components. These include developer components such as pipelines and activities, data components such as datasets, and operational components such as linked services and triggers.
When you develop in or interact with ADF through the portal, the drag-and-drop interface is effectively configuring an ARM template representation of each component. Each component is backed by a schema, and these schemas are available in the ADF documentation.
Because the underlying service uses ARM as its configuration engine, there is also an associated ARM resource provider, Microsoft.DataFactory, which handles all the calls to create or update an ADF instance. The full list of actions it exposes is available in the Azure resource provider operations documentation.
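Each entry in that list is an operation string. As a rough guide to reading them (and the role definition later in this post), the sketch below splits an operation into its provider namespace, resource path, and trailing operation type, which by ARM convention is typically read, write, delete, or action. The `parse_operation` helper is hypothetical, written only for illustration.

```python
def parse_operation(operation: str) -> tuple[str, list[str], str]:
    """Split an ARM operation string into (provider namespace, path, operation type).

    Hypothetical helper for illustration; ARM itself does not expose this.
    """
    provider, *segments = operation.split("/")
    # Expect at least a resource type and a trailing operation type.
    if len(segments) < 2:
        raise ValueError(f"unexpected operation string: {operation!r}")
    return provider, segments[:-1], segments[-1]


print(parse_operation("Microsoft.DataFactory/factories/pipelines/write"))
# → ('Microsoft.DataFactory', ['factories', 'pipelines'], 'write')
print(parse_operation("Microsoft.DataFactory/factories/pipelines/createrun/action"))
# → ('Microsoft.DataFactory', ['factories', 'pipelines', 'createrun'], 'action')
```

Note that custom actions such as `createrun` appear as an extra path segment before the generic `action` suffix.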
What are Azure Custom Roles?
If you've ever granted anyone access to Azure resources, you've likely had to give them permissions on a target resource, resource group, or subscription. You've probably also used several of the built-in roles, most commonly Owner, Contributor, or Reader. Over time, Azure has greatly expanded the number of built-in roles available, getting more granular in how permissions are applied.
Every role definition is essentially a set of Actions that the role is allowed to perform on the target scope, optionally narrowed by NotActions, which are operations subtracted from that allowed set. NotActions is not a deny rule: if another role assignment grants an excluded operation, the user still has it. For ADF, Azure provides one built-in role in addition to the general-purpose ones: Data Factory Contributor.
Depending on your development and operations process, this single built-in role might not provide enough granularity. For example, if your production support staff hold the Data Factory Contributor role, they can also change linked service configuration. Because linked services hold the connection information for your data stores, you might not want that, as it could be a security issue.
Azure custom roles let you build your own permission sets, which you can then assign to users or groups in Azure. This allows you to make your permission model more fine-grained.
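To make the Actions/NotActions semantics and the linked service example concrete, here is a minimal sketch. It assumes that `Microsoft.DataFactory/factories/*` is a fair stand-in for the ADF permissions of Data Factory Contributor (check the current built-in definition), and it approximates ARM's matching with case-insensitive `fnmatch` wildcards; this is not Azure's actual evaluator. The "support" role and the `is_permitted` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def is_permitted(operation: str, actions: list[str], not_actions: list[str]) -> bool:
    """Granted if some Action matches and no NotAction matches.

    Approximation: matching is case-insensitive and "*" matches any run of
    characters, including "/". This is not Azure's actual evaluator.
    """
    def match(pattern: str) -> bool:
        return fnmatchcase(operation.lower(), pattern.lower())

    return any(map(match, actions)) and not any(map(match, not_actions))


# Assumed stand-in for the Data Factory Contributor role's ADF permissions.
CONTRIBUTOR_ACTIONS = ["Microsoft.DataFactory/factories/*"]

# Hypothetical custom role: contributor-level access to the factory, minus
# changes to linked services and triggers.
SUPPORT_ROLE = {
    "Name": "Data Factory Support (example)",
    "IsCustom": True,
    "Actions": CONTRIBUTOR_ACTIONS,
    "NotActions": [
        "Microsoft.DataFactory/factories/linkedservices/write",
        "Microsoft.DataFactory/factories/linkedservices/delete",
        "Microsoft.DataFactory/factories/triggers/write",
        "Microsoft.DataFactory/factories/triggers/delete",
    ],
}

op = "Microsoft.DataFactory/factories/linkedServices/write"
print(is_permitted(op, CONTRIBUTOR_ACTIONS, []))  # True: the broad role allows it
print(is_permitted(op, SUPPORT_ROLE["Actions"], SUPPORT_ROLE["NotActions"]))  # False
```

One trade-off with this subtractive style is that a wildcard in Actions also grants any operations the provider adds later; an explicit allow-list, like the operator role later in this post, avoids that.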
ADF Personas
With all the preamble out of the way, let's talk about a few different personas you might have. Keep in mind that you will likely also distinguish between environments in terms of who has access to what. Here is a stab:
Operator
An ADF operator is someone who manages the health of an ADF instance. They could, for example, monitor job runs, queue lengths, and so on. They would never change the ADF itself in terms of data pipeline code.
Developer
An ADF developer is someone who has access to make changes to developer-related components such as pipelines, activities, and datasets. They wouldn't create or modify linked services, and they wouldn't schedule triggers or other runs.
Admin
This is your infrastructure administrator who is responsible for the ADF itself. Typically, they would have full access to the ADF. Most importantly, they would be the ones creating/modifying security settings, repository settings, and linked services.
Data
This role might make sense for a subset of developers who are responsible for creating and maintaining datasets and their configuration. You could create a data role separate from your developer role.
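One way to make these personas concrete is to map each one to a list of allowed operations and generate a role definition in the same shape as the operator role later in this post. The action lists, `PERSONA_ACTIONS`, and `build_role` below are hypothetical assumptions for illustration; verify every operation against the Microsoft.DataFactory operations list before using them. The operator persona is omitted because it is defined in full below.

```python
import json

FACTORY = "Microsoft.DataFactory/factories"

# Read access shared by every persona (mirrors the operator role's read actions).
BASE_READ = [
    "Microsoft.Authorization/*/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    f"{FACTORY}/*/read",
]

# Hypothetical action lists per persona (assumptions, not an official mapping).
PERSONA_ACTIONS = {
    # Pipelines and datasets, but no linked services, triggers, or runs.
    "Developer": [f"{FACTORY}/{kind}/{verb}"
                  for kind in ("pipelines", "datasets")
                  for verb in ("write", "delete")],
    # Datasets only.
    "Data": [f"{FACTORY}/datasets/write", f"{FACTORY}/datasets/delete"],
    # Everything under the factory, including linked services and triggers.
    "Admin": [f"{FACTORY}/*"],
}


def build_role(persona: str, scope: str) -> dict:
    """Return a custom role definition shaped like the operator role."""
    return {
        "Name": f"Data Factory {persona}",
        "Id": "",
        "IsCustom": True,
        "Description": f"A custom role for the ADF {persona.lower()} persona",
        "Actions": BASE_READ + PERSONA_ACTIONS[persona],
        "NotActions": [],
        "AssignableScopes": [scope],
    }


print(json.dumps(build_role("Developer", "/an/assignable/scope"), indent=2))
```

Keeping the persona-to-action mapping in one place makes it easier to review how the roles differ between environments.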
Creating Custom Roles
A full discussion of creating custom roles is out of scope for this post, but here is an example implementation of the operator role:
{
  "Name": "Data Factory Operator",
  "Id": "",
  "IsCustom": true,
  "Description": "A custom role that grants users access to operate ADF",
  "Actions": [
    "Microsoft.Authorization/*/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    "Microsoft.ResourceHealth/availabilityStatuses/read",
    "Microsoft.DataFactory/datafactories/*/read",
    "Microsoft.DataFactory/factories/*/read",
    "Microsoft.Resources/deployments/*/read",
    "Microsoft.DataFactory/datafactories/datapipelines/pause/action",
    "Microsoft.DataFactory/datafactories/datapipelines/resume/action",
    "Microsoft.DataFactory/datafactories/gateways/connectioninfo/action",
    "Microsoft.DataFactory/factories/cancelpipelinerun/action",
    "Microsoft.DataFactory/factories/pipelines/createrun/action",
    "Microsoft.DataFactory/factories/pipelineruns/cancel/action",
    "Microsoft.DataFactory/factories/triggers/start/action",
    "Microsoft.DataFactory/factories/triggers/stop/action",
    "Microsoft.DataFactory/factories/pipelines/sandbox/action",
    "Microsoft.DataFactory/factories/pipelines/sandbox/create/action",
    "Microsoft.DataFactory/factories/pipelines/sandbox/run/action",
    "Microsoft.DataFactory/factories/getDataPlaneAccess/action"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/an/assignable/scope"
  ]
}
The goal of the above role is to give the operator enough permissions to monitor the ADF resource and restart jobs as required.
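Before creating the role (for example with the Azure CLI, `az role definition create --role-definition @operator-role.json`, where the file name is hypothetical), it can be worth checking that the definition actually meets this goal. The sketch below embeds a subset of the operator role's Actions, runs a hypothetical `validate` helper for basic structural checks, and uses an approximate, case-insensitive wildcard matcher (not Azure's own evaluator) to confirm which operations are granted.

```python
import json
from fnmatch import fnmatchcase

# A subset of the operator role's Actions, copied from the definition above.
OPERATOR_ROLE = json.loads("""
{
  "Name": "Data Factory Operator",
  "IsCustom": true,
  "Actions": [
    "Microsoft.DataFactory/factories/*/read",
    "Microsoft.DataFactory/factories/pipelines/createrun/action",
    "Microsoft.DataFactory/factories/pipelineruns/cancel/action",
    "Microsoft.DataFactory/factories/triggers/start/action",
    "Microsoft.DataFactory/factories/triggers/stop/action"
  ],
  "NotActions": [],
  "AssignableScopes": ["/an/assignable/scope"]
}
""")

REQUIRED_KEYS = ("Name", "IsCustom", "Actions", "NotActions", "AssignableScopes")


def validate(role: dict) -> list[str]:
    """Return problems found in a custom role definition (empty list if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in role]
    if not role.get("Actions"):
        problems.append("Actions is empty")
    if not role.get("AssignableScopes"):
        problems.append("AssignableScopes is empty")
    return problems


def is_permitted(operation: str, role: dict) -> bool:
    """Approximate check: some Action matches and no NotAction matches."""
    def match(pattern: str) -> bool:
        return fnmatchcase(operation.lower(), pattern.lower())

    return any(map(match, role["Actions"])) and not any(map(match, role["NotActions"]))


# Operations an operator needs: monitor runs, rerun and cancel pipelines.
SHOULD_ALLOW = [
    "Microsoft.DataFactory/factories/pipelineruns/read",
    "Microsoft.DataFactory/factories/pipelines/createrun/action",
    "Microsoft.DataFactory/factories/pipelineruns/cancel/action",
]
# Operations that belong to the developer and admin personas.
SHOULD_DENY = [
    "Microsoft.DataFactory/factories/pipelines/write",
    "Microsoft.DataFactory/factories/linkedservices/write",
    "Microsoft.DataFactory/factories/triggers/write",
]

print(validate(OPERATOR_ROLE))  # → []
print([o for o in SHOULD_ALLOW if not is_permitted(o, OPERATOR_ROLE)])  # → []
print([o for o in SHOULD_DENY if is_permitted(o, OPERATOR_ROLE)])  # → []
```

Empty lists on all three lines mean the role is structurally complete, grants what an operator needs, and withholds pipeline, linked service, and trigger changes.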
Conclusion
ADF is a pretty powerful tool, but it combines the development and operations planes in a single resource. Separating the two is still required, for both security and development-process reasons. While you can use policy to help enforce separation of concerns, Azure Custom Roles provide an effective way to enforce the same separation via Azure RBAC.