Perform safe rollout of new deployments for real-time inference
APPLIES TO:
Azure CLI ml extension v2 (current)
Python SDK azure-ai-ml v2 (current)
In this article, you see how to deploy a new version of a machine learning model in production without causing any disruption. You use a blue-green deployment strategy, which is also known as a safe rollout strategy, to introduce a new version of a web service to production. When you use this strategy, you can roll out your new version of the web service to a small subset of users or requests before rolling it out completely.
This article assumes you use online endpoints, or endpoints that are used for online (real-time) inferencing. There are two types of online endpoints: managed online endpoints and Kubernetes online endpoints. For more information about endpoints and the differences between endpoint types, see Managed online endpoints vs. Kubernetes online endpoints.
This article uses managed online endpoints for deployment. But it also includes notes that explain how to use Kubernetes endpoints instead of managed online endpoints.
In this article, you see how to:
- Define an online endpoint with a deployment called `blue` to serve the first version of a model.
- Scale the `blue` deployment so that it can handle more requests.
- Deploy the second version of the model, which is called the `green` deployment, to the endpoint, but send the deployment no live traffic.
- Test the `green` deployment in isolation.
- Mirror a percentage of live traffic to the `green` deployment to validate it.
- Send a small percentage of live traffic to the `green` deployment.
- Send all live traffic to the `green` deployment.
- Delete the unused `blue` deployment.
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
- The Azure CLI and the `ml` extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2).

  Important

  The CLI examples in this article assume that you use the Bash shell or a compatible shell, for example, a shell on a Linux system or Windows Subsystem for Linux.

- An Azure Machine Learning workspace. If you don't have one, use the steps in Install, set up, and use the CLI (v2) to create one.

- A user account that has at least one of the following Azure role-based access control (Azure RBAC) roles:

  - The Owner role for the Azure Machine Learning workspace
  - The Contributor role for the Azure Machine Learning workspace
  - A custom role that has `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*` permissions

  For more information, see Manage access to Azure Machine Learning workspaces.

- Optionally, Docker Engine, installed and running locally. This prerequisite is highly recommended. You need it to deploy a model locally, and it's helpful for debugging.
Prepare your system
Set environment variables
You can configure default values to use with the Azure CLI. To avoid passing in values for your subscription, workspace, and resource group multiple times, run the following code:
az account set --subscription <subscription-ID>
az configure --defaults workspace=<Azure-Machine-Learning-workspace-name> group=<resource-group-name>
Clone the examples repository
To follow along with this article, first clone the examples repository (azureml-examples). Then go to the repository's cli/ directory:
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples
cd cli
Tip
Use `--depth 1` to clone only the latest commit to the repository, which reduces the time needed to complete the operation.
The commands in this tutorial are in the deploy-safe-rollout-online-endpoints.sh file in the cli directory, and the YAML configuration files are in the endpoints/online/managed/sample/ subdirectory.
Note
The YAML configuration files for Kubernetes online endpoints are in the endpoints/online/kubernetes/ subdirectory.
Define the endpoint and deployment
Online endpoints are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and send responses back in real time.
Define an endpoint
The following table lists key attributes to specify when you define an endpoint.
| Attribute | Required or optional | Description |
| --- | --- | --- |
| Name | Required | The name of the endpoint. It must be unique in its Azure region. For more information about naming rules, see Azure Machine Learning online endpoints and batch endpoints. |
| Authentication mode | Optional | The authentication method for the endpoint. You can choose between key-based authentication, `key`, and Azure Machine Learning token-based authentication, `aml_token`. A key doesn't expire, but a token does expire. For more information about authentication, see Authenticate clients for online endpoints. |
| Description | Optional | The description of the endpoint. |
| Tags | Optional | A dictionary of tags for the endpoint. |
| Traffic | Optional | Rules on how to route traffic across deployments. You represent the traffic as a dictionary of key-value pairs, where the key represents the deployment name and the value represents the percentage of traffic to that deployment. You can set the traffic only after the deployments under an endpoint are created, and you can update it afterward. For more information, see Allocate a small percentage of live traffic to the new deployment. |
| Mirror traffic | Optional | The percentage of live traffic to mirror to a deployment. For more information about how to use mirrored traffic, see Test the deployment with mirrored traffic. |
To see a full list of attributes that you can specify when you create an endpoint, see CLI (v2) online endpoint YAML schema. For version 2 of the Azure Machine Learning SDK for Python, see ManagedOnlineEndpoint Class.
Define a deployment
A deployment is a set of resources that are required for hosting the model that does the actual inferencing. The following table describes key attributes to specify when you define a deployment.
| Attribute | Required or optional | Description |
| --- | --- | --- |
| Name | Required | The name of the deployment. |
| Endpoint name | Required | The name of the endpoint to create the deployment under. |
| Model | Optional | The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. In this article's examples, a scikit-learn model does regression. |
| Code path | Optional | The path to the folder on the local development environment that contains all the Python source code for scoring the model. You can use nested directories and packages. |
| Scoring script | Optional | Python code that executes the model on a given input request. This value can be the relative path to the scoring file in the source code folder. The scoring script receives data submitted to a deployed web service and passes it to the model. The script then executes the model and returns its response to the client. The scoring script is specific to your model and must understand the data that the model expects as input and returns as output. This article's examples use a score.py file. This Python code must have an `init` function and a `run` function. The `init` function is called after the model is created or updated. You can use it to cache the model in memory, for example. The `run` function is called at every invocation of the endpoint to do the actual scoring and prediction. |
| Environment | Required | The environment to host the model and code. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. The environment can be a Docker image with Conda dependencies, a Dockerfile, or a registered environment. |
| Instance type | Required | The virtual machine size to use for the deployment. For a list of supported sizes, see Managed online endpoints SKU list. |
| Instance count | Required | The number of instances to use for the deployment. You base the value on the workload you expect. For high availability, we recommend that you use at least three instances. Azure Machine Learning reserves an extra 20 percent for performing upgrades. For more information, see Azure Machine Learning online endpoints and batch endpoints. |
To see a full list of attributes that you can specify when you create a deployment, see CLI (v2) managed online deployment YAML schema. For version 2 of the Python SDK, see ManagedOnlineDeployment Class.
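The `init`/`run` contract described in the table can be sketched as follows. This is a minimal illustration rather than the score.py from the examples repository: the stand-in "model" and the input format are assumptions made so the sketch is self-contained.

```python
import json

model = None

def init():
    # Called once when the deployment starts or is updated.
    # In a real deployment, you'd load the model from the directory that
    # Azure Machine Learning mounts (the AZUREML_MODEL_DIR environment
    # variable) and cache it in memory here.
    global model
    # Stand-in "model" for illustration: sums the features of each row.
    model = lambda rows: [sum(row) for row in rows]

def run(raw_data):
    # Called on every invocation of the endpoint to do the actual scoring.
    data = json.loads(raw_data)["data"]
    return model(data)

init()
print(run('{"data": [[1, 2], [3, 4]]}'))  # → [3, 7]
```

The request body mirrors the `{"data": [...]}` shape used by the sample-request.json files in the examples repository.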
Create an online endpoint
You first set the endpoint name and then configure it. In this article, you use the endpoints/online/managed/sample/endpoint.yml file to configure the endpoint. That file contains the following lines:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
The following table describes keys that the endpoint YAML format uses. To see how to specify these attributes, see CLI (v2) online endpoint YAML schema. For information about limits related to managed online endpoints, see Azure Machine Learning online endpoints and batch endpoints.
| Key | Description |
| --- | --- |
| `$schema` | (Optional) The YAML schema. To see all available options in the YAML file, you can view the schema in the preceding code block in a browser. |
| `name` | The name of the endpoint. |
| `auth_mode` | The authentication mode. Use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. To get the most recent token, use the `az ml online-endpoint get-credentials` command. |
To create an online endpoint:
1. Set your endpoint name by running the following Unix command. Replace `<YOUR_ENDPOINT_NAME>` with a unique name:

   export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"

   Important

   Endpoint names must be unique within an Azure region. For example, in the Azure westus2 region, there can be only one endpoint with the name my-endpoint.

2. Create the endpoint in the cloud by running the following code. This code uses the endpoint.yml file to configure the endpoint:

   az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
Create the blue deployment
You can use the endpoints/online/managed/sample/blue-deployment.yml file to configure the key aspects of a deployment named `blue`. That file contains the following lines:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
path: ../../model-1/model/
code_configuration:
code: ../../model-1/onlinescoring/
scoring_script: score.py
environment:
conda_file: ../../model-1/environment/conda.yaml
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest
instance_type: Standard_DS3_v2
instance_count: 1
To use the blue-deployment.yml file to create the `blue` deployment for your endpoint, run the following command:
az ml online-deployment create --name blue --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
Important
The `--all-traffic` flag in the `az ml online-deployment create` command allocates 100 percent of the endpoint traffic to the newly created `blue` deployment.

In the blue-deployment.yml file, the `path` line specifies where to upload files from. The Azure Machine Learning CLI uses this information to upload the files and register the model and environment. As a best practice for production, you should register the model and environment and specify the registered name and version separately in the YAML code. Use the format `model: azureml:<model-name>:<model-version>` for the model, for example, `model: azureml:my-model:1`. For the environment, use the format `environment: azureml:<environment-name>:<environment-version>`, for example, `environment: azureml:my-env:1`.
For registration, you can extract the YAML definitions of `model` and `environment` into separate YAML files and use the commands `az ml model create` and `az ml environment create`. To find out more about these commands, run `az ml model create -h` and `az ml environment create -h`.
For more information about registering your model as an asset, see Register a model by using the Azure CLI or Python SDK. For more information about creating an environment, see Create a custom environment.
Confirm your existing deployment
One way to confirm your existing deployment is to invoke your endpoint so that it can score your model for a given input request. When you invoke your endpoint via the Azure CLI or the Python SDK, you can choose to specify the name of the deployment to receive incoming traffic.
Note
Unlike the Azure CLI or Python SDK, Azure Machine Learning studio requires you to specify a deployment when you invoke an endpoint.
Invoke an endpoint with a deployment name
When you invoke an endpoint, you can specify the name of a deployment that you want to receive traffic. In this case, Azure Machine Learning routes the endpoint traffic directly to the specified deployment and returns its output. You can use the `--deployment-name` option for the Azure Machine Learning CLI v2 or the `deployment_name` option for the Python SDK v2 to specify the deployment.
Invoke the endpoint without specifying a deployment
If you invoke the endpoint without specifying the deployment that you want to receive traffic, Azure Machine Learning routes the endpoint's incoming traffic to the deployments in the endpoint based on traffic control settings.
Traffic control settings allocate specified percentages of incoming traffic to each deployment in the endpoint. For example, if your traffic rules specify that a particular deployment in your endpoint should receive incoming traffic 40 percent of the time, Azure Machine Learning routes 40 percent of the endpoint traffic to that deployment.
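The percentage-based routing just described can be modeled as a cumulative-weight lookup. This sketch only illustrates the allocation rule; Azure Machine Learning's actual load balancing is internal to the service, and the `route` function here is hypothetical.

```python
def route(traffic, roll):
    """Map a number in [0, 100) to a deployment name by walking
    cumulative traffic percentages in order."""
    upper = 0
    for name, percent in traffic.items():
        upper += percent
        if roll < upper:
            return name
    raise ValueError("traffic percentages must sum to 100")

# A deployment with 40 percent of the traffic receives 40 of every
# 100 requests on average; in practice, `roll` would be drawn at
# random per request.
traffic = {"blue": 60, "green": 40}
print(route(traffic, 59))  # → blue
print(route(traffic, 60))  # → green
```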
To view the status of your existing endpoint and deployment, run the following commands:
az ml online-endpoint show --name $ENDPOINT_NAME
az ml online-deployment show --name blue --endpoint $ENDPOINT_NAME
The output lists information about the `$ENDPOINT_NAME` endpoint and the `blue` deployment.
Test the endpoint by using sample data
You can invoke the endpoint by using the `invoke` command. The following command uses the sample-request.json file to send a sample request:
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
Scale your existing deployment to handle more traffic
In the deployment described in Deploy and score a machine learning model by using an online endpoint, you set the `instance_count` value to 1 in the deployment YAML file. You can scale out by using the `update` command:
az ml online-deployment update --name blue --endpoint-name $ENDPOINT_NAME --set instance_count=2
Note
In the previous command, the `--set` option overrides the deployment configuration. Alternatively, you can update the YAML file and pass it as input to the `update` command by using the `--file` option.
Deploy a new model but don't send it traffic
Create a new deployment named `green`:
az ml online-deployment create --name green --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/green-deployment.yml
Because you don't explicitly allocate any traffic to the `green` deployment, it has zero traffic allocated to it. You can verify that fact by using the following command:
az ml online-endpoint show -n $ENDPOINT_NAME --query traffic
Test the new deployment
Even though the `green` deployment has 0 percent of traffic allocated to it, you can invoke it directly by using the `--deployment-name` option:
az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name green --request-file endpoints/online/model-2/sample-request.json
If you want to use a REST client to invoke the deployment directly without going through traffic rules, set the following HTTP header: `azureml-model-deployment: <deployment-name>`. The following code uses Client for URL (cURL) to invoke the deployment directly. You can run the code in a Unix or Windows Subsystem for Linux (WSL) environment. For instructions for retrieving the `$ENDPOINT_KEY` value, see Get the key or token for data plane operations.
# get the scoring uri
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
# use curl to invoke the endpoint
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --header "azureml-model-deployment: green" --data @endpoints/online/model-2/sample-request.json
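The same direct invocation can be sketched with Python's standard library. This is a hedged illustration: the scoring URI and key below are placeholders, and the final `urlopen` call is left as a comment, so nothing is actually sent.

```python
import urllib.request

# Placeholders; in practice, get these from
# `az ml online-endpoint show` and `az ml online-endpoint get-credentials`.
scoring_uri = "https://example.invalid/score"
endpoint_key = "<ENDPOINT_KEY>"

req = urllib.request.Request(
    scoring_uri,
    data=b'{"data": [[1, 2]]}',
    headers={
        "Authorization": f"Bearer {endpoint_key}",
        "Content-Type": "application/json",
        # This header bypasses traffic rules and targets the deployment directly.
        "azureml-model-deployment": "green",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request to a live endpoint.
print(req.get_header("Azureml-model-deployment"))  # → green
```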
Test the deployment with mirrored traffic
After you test your `green` deployment, you can mirror a percentage of the live traffic to your endpoint by copying that percentage of traffic and sending it to the `green` deployment. Traffic mirroring, which is also called shadowing, doesn't change the results returned to clients: 100 percent of requests still flow to the `blue` deployment. The mirrored percentage of the traffic is copied and also submitted to the `green` deployment so that you can gather metrics and logging without impacting your clients.

Mirroring is useful when you want to validate a new deployment without impacting clients. For example, you can use mirroring to check whether latency is within acceptable bounds or to check that there are no HTTP errors. Using traffic mirroring to test a new deployment is also known as shadow testing. The deployment that receives the mirrored traffic, in this case the `green` deployment, can also be called the shadow deployment.
Mirroring has the following limitations:
- Mirroring is supported for versions 2.4.0 and later of the Azure Machine Learning CLI and versions 1.0.0 and later of the Python SDK. If you use an older version of the Azure Machine Learning CLI or the Python SDK to update an endpoint, you lose the mirror traffic setting.
- Mirroring isn't currently supported for Kubernetes online endpoints.
- You can mirror traffic to only one deployment in an endpoint.
- The maximum percentage of traffic you can mirror is 50 percent. This cap limits the effect on your endpoint bandwidth quota, which has a default value of 5 MBps. Your endpoint bandwidth is throttled if you exceed the allocated quota. For information about monitoring bandwidth throttling, see Bandwidth throttling.
Also note the following behavior:
- You can configure a deployment to receive only live traffic or mirrored traffic, not both.
- When you invoke an endpoint, you can specify the name of any of its deployments—even a shadow deployment—to return the prediction.
- When you invoke an endpoint and specify the name of a deployment to receive incoming traffic, Azure Machine Learning doesn't mirror traffic to the shadow deployment. Azure Machine Learning mirrors traffic to the shadow deployment from traffic sent to the endpoint when you don't specify a deployment.
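The mirroring behavior described above can be sketched as follows. This is an illustrative model only, not Azure Machine Learning code; the `handle`, `live`, and `shadow` names are hypothetical.

```python
def handle(request, live, shadow, mirror_percent, roll):
    """Return the live deployment's response; when the roll falls under
    the mirror percentage, also copy the request to the shadow deployment
    (its response is discarded)."""
    if roll < mirror_percent:
        shadow(request)       # fire-and-forget copy; result is ignored
    return live(request)      # clients only ever see this response

mirrored = []
live = lambda req: f"blue:{req}"
shadow = lambda req: mirrored.append(req)

# roll=5 falls under the 10 percent mirror setting, so the request is copied.
print(handle("r1", live, shadow, mirror_percent=10, roll=5))   # → blue:r1
print(mirrored)                                                # → ['r1']
# roll=50 is above 10 percent, so no copy is made.
print(handle("r2", live, shadow, mirror_percent=10, roll=50))  # → blue:r2
print(mirrored)                                                # → ['r1']
```

In practice, `roll` corresponds to the service's own sampling of incoming requests; the key point is that the return value (what the client sees) never depends on the shadow deployment.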
If you set the `green` deployment to receive 10 percent of mirrored traffic, clients still receive predictions from the `blue` deployment only.

Use the following command to mirror 10 percent of the traffic and send it to the `green` deployment:
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=10"
You can test mirrored traffic by invoking the endpoint several times without specifying a deployment to receive the incoming traffic:
for i in {1..20} ; do
az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
done
You can confirm that the specified percentage of the traffic is sent to the `green` deployment by checking the logs from the deployment:
az ml online-deployment get-logs --name green --endpoint $ENDPOINT_NAME
After testing, you can set the mirror traffic to zero to disable mirroring:
az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=0"
Allocate a small percentage of live traffic to the new deployment
After you test your `green` deployment, allocate a small percentage of traffic to it:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=90 green=10"
Tip
The sum of all live-traffic percentages across deployments must be either 0 percent, to disable traffic, or 100 percent, to enable traffic.
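The rule in this tip amounts to a simple check on the traffic dictionary. The following is a sketch of the rule only; Azure Machine Learning enforces this validation on the service side, and the `validate_traffic` function is hypothetical.

```python
def validate_traffic(traffic):
    """Live-traffic percentages must total exactly 0 (disabled) or 100."""
    total = sum(traffic.values())
    if total not in (0, 100):
        raise ValueError(f"traffic must sum to 0 or 100, got {total}")
    return traffic

print(validate_traffic({"blue": 90, "green": 10}))  # accepted
# validate_traffic({"blue": 90, "green": 20}) would raise ValueError (sums to 110)
```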
Your `green` deployment now receives 10 percent of all live traffic. Clients receive predictions from both the `blue` and `green` deployments.
Send all traffic to the new deployment
When you're fully satisfied with your `green` deployment, switch all traffic to it:
az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=0 green=100"
Remove the old deployment
Use the following steps to delete an individual deployment from a managed online endpoint. Deleting an individual deployment doesn't affect the other deployments in the managed online endpoint:
az ml online-deployment delete --name blue --endpoint $ENDPOINT_NAME --yes --no-wait
Delete the endpoint and deployment
If you aren't going to use the endpoint and deployment, you should delete them. When you delete an endpoint, all its underlying deployments are also deleted.
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait