Configure the serverless environment
This article explains how to use a serverless notebook's Environment side panel to configure dependencies, serverless budget policies, memory, and environment version. This panel provides a single place to manage the notebook’s serverless settings. Settings configured in this panel only apply when the notebook is connected to serverless compute.
To expand the Environment side panel, click the environment button to the right of the notebook.
For information on configuring environment settings on non-notebook job tasks, see Configure environment for non-notebook job tasks.
Use high memory serverless compute
Important
This feature is in Public Preview.
If you run into out-of-memory errors in your notebook, you can configure the notebook to use a higher memory size. This setting increases the size of the REPL memory used when running code in the notebook. It does not affect the memory size of the Spark session. Serverless usage with high memory has a higher DBU emission rate than standard memory.
- In the notebook UI, click the Environment side panel button.
- Under Memory, select High memory.
- Click Apply.
This setting also applies to notebook job tasks, which run using the notebook’s memory preferences. Updating the memory preference in the notebook affects the next job run.
Select a serverless budget policy
Important
This feature is in Public Preview.
Serverless budget policies allow your organization to apply custom tags on serverless usage for granular billing attribution.
If your workspace uses serverless budget policies to attribute serverless usage, you can select the serverless budget policy you want to apply to the notebook. If a user is assigned to only one serverless budget policy, that policy is selected by default.
You can select the serverless budget policy after your notebook is connected to serverless compute by using the Environment side panel:
- In the notebook UI, click the Environment side panel button.
- Under Budget policy, select the serverless budget policy you want to apply to your notebook.
- Click Apply.
When this setup is complete, all notebook usage inherits the serverless budget policy’s custom tags.
Note
If your notebook originates from a Git repository or does not have an assigned serverless budget policy, it defaults to your last chosen serverless budget policy when it is next attached to serverless compute.
Select an environment version
Environment versions allow serverless workloads to receive independent engine upgrades without affecting application compatibility. To see details on each environment version, see Serverless environment versions. Databricks recommends picking the latest version to get the most up-to-date notebook features.
To select an environment version:
- In the notebook UI, click the Environment side panel button.
- Under Environment version, select a version.
- Click Apply.
Add dependencies to the notebook
Because serverless does not support compute policies or init scripts, you must add your custom library dependencies using the Environment side panel. You can either add libraries individually or use a shareable base environment to install multiple libraries.
To individually add a library dependency:
- In the notebook UI, click the Environment side panel button.
- In the Dependencies section, click Add Dependency and enter the path of the library dependency in the field. You can specify a dependency in any format that is valid in a requirements.txt file; see the example entries after these steps.
- Click Apply. This installs the dependencies in the notebook virtual environment and restarts the Python process.
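For illustration, entries like the following are all valid requirements.txt-style specifiers you could add as individual dependencies. The package names and paths are placeholders:

my-library==6.1
/Volumes/my_catalog/my_schema/my_volume/simplejson-3.19.3-py3-none-any.whl
git+https://github.com/databricks/databricks-cli
-r /Workspace/Shared/requirements.txt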
A job that uses serverless compute installs the notebook's environment specification before running the notebook code. This means that you don't need to add dependencies when scheduling notebooks as jobs.
Important
Do not install PySpark or any library that installs PySpark as a dependency on your serverless notebooks. Doing so will stop your session and result in an error. If this occurs, remove the library and reset your environment.
To view installed dependencies, click the Installed tab in the Environment side panel. The pip installation logs for the notebook environment are also available by clicking pip logs at the bottom of the panel.
Configure a base environment
A base environment is a YAML file stored as a workspace file or on a Unity Catalog volume that specifies additional environment dependencies. Base environments can be shared among notebooks. To configure a base environment:
Create a YAML file that defines settings for a Python virtual environment. The following example YAML, which is based on the MLflow projects environment specification, defines a base environment with a few library dependencies:
client: '1'
dependencies:
  - --index-url https://pypi.org/simple
  - -r "/Workspace/Shared/requirements.txt"
  - my-library==6.1
  - '/Workspace/Shared/Path/To/simplejson-3.19.3-py3-none-any.whl'
  - git+https://github.com/databricks/databricks-cli
Upload the YAML file as a workspace file or to a Unity Catalog volume. See Import a file or Upload files to a Unity Catalog volume.
To the right of the notebook, click the button to expand the Environment side panel. This button only appears when a notebook is connected to serverless compute.
In the Base Environment field, enter the path of the uploaded YAML file or navigate to it and select it.
Click Apply. This installs the dependencies in the notebook virtual environment and restarts the Python process.
Users can override the dependencies specified in the base environment by installing dependencies individually.
Reset the environment dependencies
If your notebook is connected to serverless compute, Databricks automatically caches the content of the notebook’s virtual environment. This means you generally do not need to reinstall the Python dependencies specified in the Environment side panel when you open an existing notebook, even if it has been disconnected due to inactivity.
Python virtual environment caching also applies to jobs. When a job is run, any task of the job that shares the same set of dependencies as a completed task in that run is faster, as required dependencies are already available.
Note
If you change the implementation of a custom Python package used in a job on serverless, you must also update its version number so that jobs can pick up the latest implementation.
To clear the environment cache and perform a fresh install of the dependencies specified in the Environment side panel of a notebook attached to serverless compute, click the arrow next to Apply and then click Reset environment.
If you install packages that break or change the core notebook or Apache Spark environment, remove the offending packages and then reset the environment. Detaching then reattaching the notebook does not clear the entire environment cache.
Configure environment for non-notebook job tasks
For job task types such as Python script, Python wheel, or dbt tasks, the library dependencies are inherited from the serverless environment version. To view the list of installed libraries, see the Installed Python libraries section of the environment version you are using. If a task requires a Python library that is not installed, you can install the library from workspace files, Unity Catalog volumes, or public package repositories.
To add a library when you create or edit a job task:
In the Environment and Libraries dropdown menu, click the button next to the Default environment or click + Add new environment.
Select the environment version from the Environment version drop-down. See Serverless environment versions. Databricks recommends picking the latest version to get the most up-to-date features.
In the Configure environment dialog, click + Add library.
Select the type of dependency from the dropdown menu under Libraries.
In the File Path text box, enter the path to the library.
- For a Python wheel in a workspace file, the path should be absolute and start with /Workspace/.
- For a Python wheel in a Unity Catalog volume, the path should be /Volumes/<catalog>/<schema>/<volume>/<path>.whl.
- For a requirements.txt file, select PyPI and enter -r /path/to/requirements.txt.
- Click Confirm or + Add library to add another library.
- If you’re adding a task, click Create task. If you’re editing a task, click Save task.
Configure default Python package repositories
Workspace admins can configure private or authenticated package repositories within workspaces as the default pip configuration for both serverless notebooks and serverless jobs. This allows users to install packages from internal Python repositories without explicitly defining index-url or extra-index-url. However, if these values are specified in code or in a notebook, they take precedence over the workspace defaults.
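For example, a notebook cell that passes its own index explicitly overrides the workspace default for that install. The repository URL and package name below are placeholders:

%pip install --index-url https://my-internal-repo.example.com/simple my-internal-package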
This configuration uses Databricks secrets to securely store and manage repository URLs and credentials. Administrators can configure the setup using the workspace admin settings page, or by creating a predefined secret scope and using the Databricks CLI secrets commands or the REST API.
Set up default package repositories for a workspace
Workspace admins can add or remove the default Python package repositories using the workspace admin settings page.
- As a workspace administrator, log in to the Databricks workspace.
- Click your username in the top bar of the Databricks workspace and select Settings.
- Click the Compute tab.
- Next to Default Package Repositories, click Manage.
- (Optional) Add or remove an index URL, extra index URLs, or a custom SSL certificate.
- Click Save to save the changes.
Note
Changes to or deletions of secrets are applied after you reattach serverless compute to the notebook or rerun the serverless job.
Set up using the secrets CLI or REST API
To configure default Python package repositories using the CLI or REST API, create a predefined secret scope and configure access permissions, then add the package repository secrets.
Predefined secret scope name
Workspace administrators can set default pip index URLs or extra index URLs along with authentication tokens and secrets in a designated secret scope under predefined keys:
- Secret scope name: databricks-package-management
- Secret key for index-url: pip-index-url
- Secret key for extra-index-urls: pip-extra-index-urls
- Secret key for SSL certificate content: pip-cert
Create the secret scope
A secret scope can be created using the Databricks CLI secrets commands or the REST API. After creating the secret scope, configure access control lists to grant all workspace users read access. This ensures that the repository remains secure and cannot be altered by individual users. The secret scope must use the predefined secret scope name databricks-package-management.
databricks secrets create-scope databricks-package-management
databricks secrets put-acl databricks-package-management admins MANAGE
databricks secrets put-acl databricks-package-management users READ
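Depending on your CLI version, you can confirm the resulting permissions by listing the ACLs on the scope:

databricks secrets list-acls databricks-package-management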
Add Python package repository secrets
Add the Python package repository details using the predefined secret key names. All three fields are optional.
# Add index URL.
databricks secrets put-secret --json '{"scope": "databricks-package-management", "key": "pip-index-url", "string_value":"<index-url-value>"}'
# Add extra index URLs. If you have multiple extra index URLs, separate them using white space.
databricks secrets put-secret --json '{"scope": "databricks-package-management", "key": "pip-extra-index-urls", "string_value":"<extra-index-url-1 extra-index-url-2>"}'
# Add cert content. To configure pip with a custom SSL certificate, put the certificate file content here.
databricks secrets put-secret --json '{"scope": "databricks-package-management", "key": "pip-cert", "string_value":"<cert-content>"}'
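To check which keys are set (secret values are not displayed), you can list the secrets in the scope; the exact command may vary with your CLI version:

databricks secrets list-secrets databricks-package-management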
Modify or delete private PyPI repository secrets
To modify PyPI repository secrets, use the put-secret command. To delete PyPI repository secrets, use delete-secret as shown below:
# delete secret
databricks secrets delete-secret databricks-package-management pip-index-url
databricks secrets delete-secret databricks-package-management pip-extra-index-urls
databricks secrets delete-secret databricks-package-management pip-cert
# delete scope
databricks secrets delete-scope databricks-package-management