Authenticate access to Azure Databricks using OAuth token federation

This article guides you through configuring OAuth federation to access Azure Databricks account and workspace resources using tokens from your identity provider.

Important

Databricks OAuth federation is in Public Preview.

What is Databricks OAuth token federation?

Databricks OAuth token federation enables you to securely access Databricks APIs using tokens from your identity provider (IdP). OAuth token federation eliminates the need to manage Databricks secrets such as personal access tokens and Databricks OAuth client secrets.

Using Databricks OAuth token federation, users and service principals exchange JSON Web Tokens (JWTs) from your identity provider for Databricks OAuth tokens, which can then be used to access Databricks APIs.

Databricks supports two types of token federation:

  • Account-wide token federation enables all users and service principals in your Databricks account to access Databricks APIs using tokens from your identity provider. Account-wide token federation allows you to centralize the management of token issuance policies in your identity provider, and is typically used in combination with SCIM, so users in your identity provider are synchronized into your Azure Databricks account.
  • Workload identity federation allows your automated workloads running outside of Azure Databricks to access Databricks APIs without the need for Databricks secrets. With workload identity federation, your application (workload) authenticates to Databricks as a Databricks service principal using tokens issued by the workload runtime.

Note

Microsoft Azure users can also use Microsoft Entra ID tokens to securely use the Azure Databricks CLI and APIs.

Account-wide token federation

Account admins can configure OAuth token federation in the Azure Databricks account using an account federation policy. An account federation policy enables all users and service principals in your Azure Databricks account to access Databricks APIs using tokens from your identity provider. An account federation policy specifies:

  • The identity provider or issuer from which Azure Databricks will accept tokens.
  • The criteria for mapping a token to the corresponding Azure Databricks user or service principal.

To configure an account federation policy, provide the following:

  • The required token issuer, specified in the iss claim of your tokens. The issuer is an HTTPS URL that identifies your identity provider.

  • The allowed token audiences, specified in the aud claim of your tokens. This identifier represents the recipient of the token. As long as the audience in the token matches at least one audience in the policy, the token is considered a match. If unspecified, the default value is your Azure Databricks account ID.

  • The subject claim. This indicates which token claim contains the Azure Databricks username of the user the token was issued for. If unspecified, the default value is sub.

  • Optionally, the public keys used to validate the signature of your tokens, in JSON Web Key Sets (JWKS) format. If unspecified (recommended), Databricks automatically fetches the public keys from your issuer’s well-known endpoint. Databricks strongly recommends relying on your issuer’s well-known endpoint for discovering public keys.

    Note

    If you do not specify a JWKS in your federation policy (recommended), your identity provider must serve OpenID Provider Metadata at {issuer-url}/.well-known/openid-configuration. The OpenID Provider Metadata must include a jwks_uri that specifies the location of the public keys used to verify token signatures.
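Before creating a policy without a JWKS, it can help to confirm that your issuer publishes discovery metadata. The following is a minimal shell sketch that builds the metadata URL from an issuer URL. The function name openid_config_url is hypothetical, and the commented-out check assumes curl and grep are installed and that the issuer is reachable from your machine.

```shell
# Build the OpenID Provider Metadata URL for an issuer.
# A trailing slash on the issuer is dropped before the well-known path is appended.
openid_config_url() {
  local issuer="${1%/}"
  printf '%s/.well-known/openid-configuration\n' "$issuer"
}

# Example (requires network access): look for a jwks_uri entry in the metadata.
# curl --silent "$(openid_config_url "https://idp.mycompany.com/oidc")" | grep -o '"jwks_uri"[^,]*'
```

If the metadata has no jwks_uri entry, the policy would need an explicit JWKS instead.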

The following is an example account federation policy:

issuer: "https://idp.mycompany.com/oidc"
audiences: ["databricks"]
subject_claim: "sub"

The following example JWT body matches the above policy and can be used to authenticate to Databricks as user username@mycompany.com:

{
  "iss": "https://idp.mycompany.com/oidc",
  "aud": "databricks",
  "sub": "username@mycompany.com"
}
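To compare a real token against a policy, you can decode its body and inspect the iss, aud, and subject claims. The following shell sketch does that for debugging purposes only: it does not verify the signature. The function name jwt_payload is hypothetical, and the sketch assumes a base64 command that accepts -d (as GNU coreutils does).

```shell
# Print the decoded payload (second dot-separated segment) of a JWT.
# Debugging aid only: the signature is NOT checked.
jwt_payload() {
  local payload
  # base64url uses '-' and '_' where standard base64 uses '+' and '/'.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Example: jwt_payload "$IDP_TOKEN"   (IDP_TOKEN is a placeholder for your token)
```

The printed iss value should equal the policy issuer, the aud value should include one of the policy audiences, and the claim named by subject_claim should hold the Azure Databricks username.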

Configure an account federation policy

Account admins can configure an account federation policy using the Databricks CLI (version 0.239.0 and above) or the Databricks API. You can specify up to five account federation policies in your Azure Databricks account.

Databricks CLI

  1. Install or update to the newest version of the Databricks CLI.

  2. As an account admin, authenticate to your Databricks account using the CLI. Specify the ACCOUNT_CONSOLE_URL (for example, https://accounts.cloud.databricks.com) and your Databricks ACCOUNT_ID:

    databricks auth login --host ${ACCOUNT_CONSOLE_URL} --account-id ${ACCOUNT_ID}
    
  3. Create the account federation policy. For example:

    databricks account federation-policy create --json \
    '{
      "oidc_policy": {
        "issuer": "https://idp.mycompany.com/oidc",
        "audiences": [
          "databricks"
        ],
        "subject_claim": "sub"
      }
    }'
    

Databricks Account API

The following is an example Databricks REST API call to create an account federation policy:

curl --request POST \
  --header "Authorization: Bearer $TOKEN" \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/${ACCOUNT_ID}/federationPolicies" \
  --data '{
    "oidc_policy": {
      "issuer": "https://idp.mycompany.com/oidc",
      "audiences": [
        "databricks"
      ],
      "subject_claim": "sub"
    }
  }'

You might need to configure your identity provider to generate tokens for your users to exchange with Databricks. See the documentation for your identity provider for instructions.
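To confirm that a policy was created, you can list the policies in the account. The following shell sketch wraps that request in a function. It assumes that the list operation is served at the same path as the create call, using GET instead of POST, and it reuses the TOKEN, ACCOUNT_ID, and ACCOUNT_CONSOLE_URL variables from the examples above. The function name is hypothetical.

```shell
# List the account federation policies (assumed: GET on the create path).
# Expects TOKEN, ACCOUNT_ID, and ACCOUNT_CONSOLE_URL to be set in the environment.
list_account_federation_policies() {
  curl --silent --fail --request GET \
    --header "Authorization: Bearer ${TOKEN}" \
    "${ACCOUNT_CONSOLE_URL}/api/2.0/accounts/${ACCOUNT_ID}/federationPolicies"
}

# Example: list_account_federation_policies
```

With --fail, curl exits with a non-zero status on an HTTP error, which makes the function usable in scripts.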

Example account federation policies

Each example below shows a federation policy followed by a token that matches it.

Federation policy:

    issuer: "https://idp.mycompany.com/oidc"
    audiences: ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"]

Example matching token:

    {
      "iss": "https://idp.mycompany.com/oidc",
      "aud": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
      "sub": "username@mycompany.com"
    }

Federation policy:

    issuer: "https://idp.mycompany.com/oidc"
    audiences: ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"]
    subject_claim: "preferred_username"

Example matching token:

    {
      "iss": "https://idp.mycompany.com/oidc",
      "aud": ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d", "other-audience"],
      "preferred_username": "username@mycompany.com",
      "sub": "some-other-ignored-value"
    }

Federation policy:

    issuer: "https://idp.mycompany.com/oidc"
    audiences: ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"]
    jwks_json: {"keys":[{"kty":"RSA","e":"AQAB","use":"sig","kid":"<key-id>","alg":"RS256","n":"uPUViFv..."}]}

Example matching token (signature verified using public key in policy):

    {
      "iss": "https://idp.mycompany.com/oidc",
      "aud": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
      "sub": "username@mycompany.com"
    }

Workload identity federation

Workload identity federation allows your automated workloads running outside of Azure Databricks to access Databricks APIs without the need for Databricks secrets. Account admins can configure workload identity federation using a service principal federation policy.

A service principal federation policy is associated with a service principal in your Azure Databricks account, and specifies:

  • The identity provider (or issuer) from which the service principal can authenticate.
  • The workload identity (or subject) that is permitted to authenticate as the Databricks service principal.

To configure a service principal federation policy, provide the following:

  • The required token issuer, specified in the iss claim of workload identity tokens. The issuer is an HTTPS URL that identifies the workload identity provider.

  • The required token subject, specified in the sub claim of workload identity tokens. The subject uniquely identifies the workload in the workload runtime environment.

  • The allowed token audiences, specified in the aud claim of workload identity tokens. The audience represents the recipient of the token. As long as the audience in the token matches at least one audience in the policy, the token is considered a match. If unspecified, the default value is your Azure Databricks account ID.

  • Optionally, the public keys used to validate the signature of the workload identity tokens, in JSON Web Key Sets (JWKS) format. If unspecified (recommended), Azure Databricks automatically fetches the public keys from the issuer’s well-known endpoint. Databricks strongly recommends relying on the issuer’s well-known endpoint for discovering public keys.

  • Optionally, the subject claim. This indicates which token claim contains the workload identity (or subject) of the token. If it is unspecified, the default value is sub. Databricks strongly recommends using the default sub claim for workload identity federation. A claim other than sub should only be used in cases where the sub claim is not an appropriate or stable subject identifier, which is uncommon. See Example service principal federation policies below for details.

    Note

    If you do not specify a JWKS in your federation policy (recommended), your identity provider must serve OpenID Provider Metadata at {issuer-url}/.well-known/openid-configuration. The OpenID Provider Metadata must include a jwks_uri that specifies the location of the public keys used to verify token signatures.

The following is an example service principal federation policy for a GitHub Actions workload:

issuer: "https://token.actions.githubusercontent.com"
audiences: ["https://github.com/my-github-org"]
subject: "repo:my-github-org/my-repo:environment:prod"

The following example JWT body matches the above policy and can be used to authenticate to Azure Databricks:

{
  "iss": "https://token.actions.githubusercontent.com",
  "aud": "https://github.com/my-github-org",
  "sub": "repo:my-github-org/my-repo:environment:prod"
}
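When the issuer, audience, and subject come from variables (for example, in a provisioning script), the policy JSON can be generated rather than hand-written. The following shell sketch builds the oidc_policy body in the shape used by the examples in this article. The function name sp_policy_json is hypothetical, and the sketch assumes the values contain no double quotes or backslashes, because it performs no JSON escaping.

```shell
# Build a service principal federation policy body.
# $1 = issuer, $2 = audience, $3 = subject.
# Assumes the values need no JSON escaping (no double quotes or backslashes).
sp_policy_json() {
  printf '{"oidc_policy":{"issuer":"%s","audiences":["%s"],"subject":"%s"}}\n' "$1" "$2" "$3"
}

# Example: build the policy for a GitHub Actions workload.
sp_policy_json \
  "https://token.actions.githubusercontent.com" \
  "https://github.com/my-github-org" \
  "repo:my-github-org/my-repo:environment:prod"
```

The output can be passed as the value of the --json option of the CLI or as the --data body of the REST call.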

Configure a service principal federation policy

Account admins can configure a service principal federation policy using the Databricks CLI (version 0.239.0 and above) or the Databricks API. You can create up to five service principal federation policies per Databricks service principal.

Databricks CLI

  1. Install or update to the newest version of the Databricks CLI.

  2. As an account admin, authenticate to your Databricks account using the CLI. Specify the ACCOUNT_CONSOLE_URL (for example, https://accounts.cloud.databricks.com) and your Databricks ACCOUNT_ID:

    databricks auth login --host ${ACCOUNT_CONSOLE_URL} --account-id ${ACCOUNT_ID}
    
  3. Get the numeric ID of the service principal that will have the federation policy applied to it. (For example, 3659993829438643.)

    If you know the service principal application ID (typically a GUID value, such as bc3cfe6c-469e-4130-b425-5384c4aa30bb) in advance, you can then determine the service principal numeric ID using the Databricks CLI:

    databricks account service-principals list --filter 'applicationId eq "<service-principal-application-id>"'
    
  4. Create the service principal federation policy. Here is an example of creating a federation policy for a GitHub Action:

    databricks account service-principal-federation-policy create ${SERVICE_PRINCIPAL_NUMERIC_ID} --json \
    '{
      "oidc_policy": {
        "issuer": "https://token.actions.githubusercontent.com",
        "audiences": [
          "https://github.com/my-github-org"
        ],
        "subject": "repo:my-github-org/my-repo:environment:prod"
      }
    }'
    

Databricks Account API

  1. Get the numeric ID of the service principal that will have the federation policy applied to it. (For example, 3659993829438643.) If you know the service principal application ID (typically a GUID value, such as bc3cfe6c-469e-4130-b425-5384c4aa30bb) in advance, you can then determine the service principal numeric ID using the Databricks service principal REST API:

    curl --get \
      --header "Authorization: Bearer $TOKEN" \
      "https://accounts.cloud.databricks.com/api/2.0/accounts/${ACCOUNT_ID}/scim/v2/ServicePrincipals" \
      --data-urlencode 'filter=applicationId eq "<service-principal-application-id>"'
    

    The service principal numeric ID is returned in the id field of the response.

  2. Create the service principal federation policy. Here is an example of creating a federation policy for a GitHub Action:

    curl --request POST \
      --header "Authorization: Bearer $TOKEN" \
      "https://accounts.cloud.databricks.com/api/2.0/accounts/${ACCOUNT_ID}/servicePrincipals/${SERVICE_PRINCIPAL_NUMERIC_ID}/federationPolicies" \
      --data '{
        "oidc_policy": {
          "issuer": "https://token.actions.githubusercontent.com",
          "audiences": [
            "https://github.com/my-github-org"
          ],
          "subject": "repo:my-github-org/my-repo:environment:prod"
        }
      }'
    

Example service principal federation policies

Each example below names the tool, then shows a federation policy followed by a token that matches it.

GitHub Actions

Federation policy:

    issuer: "https://token.actions.githubusercontent.com"
    audiences: ["https://github.com/<github-org>"]
    subject: "repo:<github-org>/<repo>:environment:prod"

Example matching token:

    {
      "iss": "https://token.actions.githubusercontent.com",
      "aud": "https://github.com/<github-org>",
      "sub": "repo:<github-org>/<repo>:environment:prod"
    }

Kubernetes

Federation policy:

    issuer: "https://kubernetes.default.svc"
    audiences: ["https://kubernetes.default.svc"]
    subject: "system:serviceaccount:namespace:podname"
    jwks_json: {"keys":[{"kty":"RSA","e":"AQAB","use":"sig","kid":"<key-id>","alg":"RS256","n":"uPUViFv..."}]}

Example matching token:

    {
      "iss": "https://kubernetes.default.svc",
      "aud": ["https://kubernetes.default.svc"],
      "sub": "system:serviceaccount:namespace:podname"
    }

Azure DevOps

Federation policy:

    issuer: "https://vstoken.dev.azure.com/<org_id>"
    audiences: ["api://AzureADTokenExchange"]
    subject: "sc://my-org/my-project/my-connection"

Example matching token:

    {
      "iss": "https://vstoken.dev.azure.com/<org_id>",
      "aud": "api://AzureADTokenExchange",
      "sub": "sc://my-org/my-project/my-connection"
    }

GitLab

Federation policy:

    issuer: "https://gitlab.example.com"
    audiences: ["https://gitlab.example.com"]
    subject: "project_path:my-group/my-project:..."

Example matching token:

    {
      "iss": "https://gitlab.example.com",
      "aud": "https://gitlab.example.com",
      "sub": "project_path:my-group/my-project:..."
    }

CircleCI

Federation policy:

    issuer: "https://oidc.circleci.com/org/<org_id>"
    audiences: ["<org_id>"]
    subject: "7cc1d11b-46c8-4eb2-9482-4c56a910c7ce"
    subject_claim: "oidc.circleci.com/project-id"

Example matching token:

    {
      "iss": "https://oidc.circleci.com/org/<org_id>",
      "aud": "<org_id>",
      "oidc.circleci.com/project-id": "7cc1d11b-46c8-4eb2-9482-4c56a910c7ce"
    }

After you have configured a federation policy for your account, you can use a JWT from your identity provider to access the Databricks API. To do so, first exchange the JWT from your identity provider for a Databricks OAuth token, and then send the Databricks OAuth token as a Bearer token in the Authorization header of each API call. Tokens must be valid JWTs that are signed using the RS256 or ES256 algorithm.

For guidance on this process, see Use an identity provider token to authenticate to Databricks.
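As a rough illustration of the exchange step, the following shell sketch posts an OAuth 2.0 token exchange request (RFC 8693) with curl. The /oidc/v1/token path, the all-apis scope, and the optional client_id field for service principals are assumptions not stated in this article, so confirm them against the linked guidance before relying on them. The function name exchange_token is hypothetical.

```shell
# Exchange an identity provider JWT for a Databricks OAuth token.
# $1 = workspace URL, $2 = JWT from your identity provider,
# $3 = optional service principal application (client) ID.
# The endpoint path, scope, and client_id field are assumptions (see lead-in).
exchange_token() {
  local host="${1%/}" jwt="$2" client_id="${3:-}"
  # Form fields for a token exchange request.
  set -- \
    --data-urlencode "subject_token=${jwt}" \
    --data-urlencode "subject_token_type=urn:ietf:params:oauth:token-type:jwt" \
    --data-urlencode "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
    --data-urlencode "scope=all-apis"
  # For workload identity federation, also name the target service principal.
  if [ -n "$client_id" ]; then
    set -- "$@" --data-urlencode "client_id=${client_id}"
  fi
  curl --silent --fail --request POST "${host}/oidc/v1/token" "$@"
}

# Example: exchange_token "https://<workspace-url>" "$IDP_TOKEN"
```

The response is expected to be a JSON document containing the Databricks OAuth token, which you then pass in the Authorization header as "Bearer <token>".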

Resources