Perform safe rollout of new deployments for real-time inference

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

In this article, you see how to deploy a new version of a machine learning model in production without causing any disruption. You use a blue-green deployment strategy, which is also known as a safe rollout strategy, to introduce a new version of a web service to production. When you use this strategy, you can roll out your new version of the web service to a small subset of users or requests before rolling it out completely.

This article assumes you use online endpoints, or endpoints that are used for online (real-time) inferencing. There are two types of online endpoints: managed online endpoints and Kubernetes online endpoints. For more information about endpoints and the differences between endpoint types, see Managed online endpoints vs. Kubernetes online endpoints.

This article uses managed online endpoints for deployment. But it also includes notes that explain how to use Kubernetes endpoints instead of managed online endpoints.

In this article, you see how to:

  • Define an online endpoint with a deployment called blue to serve the first version of a model.
  • Scale the blue deployment so that it can handle more requests.
  • Deploy a second version of the model, called the green deployment, to the endpoint without sending it any live traffic.
  • Test the green deployment in isolation.
  • Mirror a percentage of live traffic to the green deployment to validate it.
  • Send a small percentage of live traffic to the green deployment.
  • Send all live traffic to the green deployment.
  • Delete the unused blue deployment.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • A user account that has at least one of the following Azure role-based access control (Azure RBAC) roles:

    • An Owner role for the Azure Machine Learning workspace
    • A Contributor role for the Azure Machine Learning workspace
    • A custom role that has Microsoft.MachineLearningServices/workspaces/onlineEndpoints/* permissions

    For more information, see Manage access to Azure Machine Learning workspaces.

  • Optionally, Docker Engine, installed and running locally. This prerequisite is highly recommended. You need it to deploy a model locally, and it's helpful for debugging.

Prepare your system

Set environment variables

You can configure default values to use with the Azure CLI. To avoid passing in values for your subscription, workspace, and resource group multiple times, run the following code:

az account set --subscription <subscription-ID>
az configure --defaults workspace=<Azure-Machine-Learning-workspace-name> group=<resource-group-name>

Clone the examples repository

To follow along with this article, first clone the examples repository (azureml-examples). Then go to the repository's cli/ directory:

git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples
cd cli

Tip

Use --depth 1 to clone only the latest commit of the repository, which reduces the time needed to complete the operation.

The commands in this tutorial are in the deploy-safe-rollout-online-endpoints.sh file in the cli directory, and the YAML configuration files are in the endpoints/online/managed/sample/ subdirectory.

Note

The YAML configuration files for Kubernetes online endpoints are in the endpoints/online/kubernetes/ subdirectory.

Define the endpoint and deployment

Online endpoints are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and send responses back in real time.

Define an endpoint

The following list describes key attributes to specify when you define an endpoint.

  • Name (required): The name of the endpoint. It must be unique in its Azure region. For more information about naming rules, see Azure Machine Learning online endpoints and batch endpoints.
  • Authentication mode (optional): The authentication method for the endpoint: key for key-based authentication or aml_token for Azure Machine Learning token-based authentication. A key doesn't expire, but a token does. For more information about authentication, see Authenticate clients for online endpoints.
  • Description (optional): The description of the endpoint.
  • Tags (optional): A dictionary of tags for the endpoint.
  • Traffic (optional): Rules on how to route traffic across deployments, represented as a dictionary that maps each deployment name to the percentage of traffic for that deployment. You can set the traffic only after the deployments under an endpoint are created, and you can update it afterward. For more information, see Allocate a small percentage of live traffic to the new deployment.
  • Mirror traffic (optional): The percentage of live traffic to mirror to a deployment. For more information about how to use mirrored traffic, see Test the deployment with mirrored traffic.

To see a full list of attributes that you can specify when you create an endpoint, see CLI (v2) online endpoint YAML schema. For version 2 of the Azure Machine Learning SDK for Python, see ManagedOnlineEndpoint Class.
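
If you're working with the Python SDK v2, a minimal sketch of an endpoint definition that sets these attributes might look like the following. The description and tag values here are placeholders for illustration, not values from this article's sample files:

from azure.ai.ml.entities import ManagedOnlineEndpoint

# Define (but don't yet create) a managed online endpoint.
endpoint = ManagedOnlineEndpoint(
    name="my-endpoint",  # must be unique in the Azure region
    auth_mode="key",     # or "aml_token" for token-based authentication
    description="A sample endpoint for safe rollout",
    tags={"stage": "example"},
)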

Define a deployment

A deployment is a set of resources that are required for hosting the model that does the actual inferencing. The following list describes key attributes to specify when you define a deployment.

  • Name (required): The name of the deployment.
  • Endpoint name (required): The name of the endpoint to create the deployment under.
  • Model (optional): The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. In this article's examples, a scikit-learn model does regression.
  • Code path (optional): The path to the folder on the local development environment that contains all the Python source code for scoring the model. You can use nested directories and packages.
  • Scoring script (optional): Python code that executes the model on a given input request. This value can be the relative path to the scoring file in the source code folder.
    The scoring script receives data submitted to a deployed web service and passes it to the model. The script then executes the model and returns its response to the client. The scoring script is specific to your model and must understand the data that the model expects as input and returns as output.
    This article's examples use a score.py file. This Python code must have an init function and a run function. The init function is called when the deployment is created or updated; you can use it to cache the model in memory, for example. The run function is called at every invocation of the endpoint to do the actual scoring and prediction. A minimal sketch of such a script appears after this list.
  • Environment (required): The environment to host the model and code. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. The environment can be a Docker image with Conda dependencies, a Dockerfile, or a registered environment.
  • Instance type (required): The virtual machine size to use for the deployment. For a list of supported sizes, see Managed online endpoints SKU list.
  • Instance count (required): The number of instances to use for the deployment. Base the value on the workload you expect. For high availability, we recommend that you use at least three instances. Azure Machine Learning reserves an extra 20 percent for performing upgrades. For more information, see Azure Machine Learning online endpoints and batch endpoints.
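
The following is a minimal, hypothetical sketch of a score.py file for a scikit-learn regression model. The model file name and the request payload shape are assumptions for illustration; the actual score.py in the examples repository can differ:

import json
import os

import joblib  # assumes the model was saved with joblib


model = None


def init():
    # AZUREML_MODEL_DIR is set by Azure Machine Learning and points to the
    # folder where the model files were mounted in the container.
    global model
    model_path = os.path.join(
        os.environ["AZUREML_MODEL_DIR"], "model", "sklearn_regression_model.pkl"
    )
    model = joblib.load(model_path)  # cache the model in memory


def run(raw_data):
    # raw_data is the JSON payload of the request, as a string.
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()  # return value must be JSON-serializable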

To see a full list of attributes that you can specify when you create a deployment, see CLI (v2) managed online deployment YAML schema. For version 2 of the Python SDK, see ManagedOnlineDeployment Class.

Create an online endpoint

You first set the endpoint name and then configure it. In this article, you use the endpoints/online/managed/sample/endpoint.yml file to configure the endpoint. That file contains the following lines:

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key

The following list describes keys that the endpoint YAML format uses. To see how to specify these attributes, see CLI (v2) online endpoint YAML schema. For information about limits related to managed online endpoints, see Azure Machine Learning online endpoints and batch endpoints.

  • $schema (optional): The YAML schema. To see all available options in the YAML file, you can view the schema from the preceding code block in a browser.
  • name: The name of the endpoint.
  • auth_mode: The authentication mode. Use key for key-based authentication or aml_token for Azure Machine Learning token-based authentication. To get the most recent token, use the az ml online-endpoint get-credentials command.

To create an online endpoint:

  1. Set your endpoint name by running the following Unix command. Replace <YOUR_ENDPOINT_NAME> with a unique name.

    export ENDPOINT_NAME="<YOUR_ENDPOINT_NAME>"
    

    Important

    Endpoint names must be unique within an Azure region. For example, in the Azure westus2 region, there can be only one endpoint with the name my-endpoint.

  2. Create the endpoint in the cloud by running the following code. This code uses the endpoint.yml file to configure the endpoint:

    az ml online-endpoint create --name $ENDPOINT_NAME -f endpoints/online/managed/sample/endpoint.yml
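
    If you use the Python SDK v2 instead, the equivalent step is to create an MLClient and submit the endpoint definition. The following is a minimal sketch; the placeholder IDs are assumptions that you replace with your own values:

    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import ManagedOnlineEndpoint
    from azure.identity import DefaultAzureCredential

    # Authenticate and connect to the workspace (replace the placeholders).
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-ID>",
        resource_group_name="<resource-group-name>",
        workspace_name="<workspace-name>",
    )

    # Create the endpoint in the cloud; begin_* calls return a poller.
    endpoint = ManagedOnlineEndpoint(name="<YOUR_ENDPOINT_NAME>", auth_mode="key")
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()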
    

Create the blue deployment

You can use the endpoints/online/managed/sample/blue-deployment.yml file to configure the key aspects of a deployment named blue. That file contains the following lines:

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment: 
  conda_file: ../../model-1/environment/conda.yaml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest
instance_type: Standard_DS3_v2
instance_count: 1

To use the blue-deployment.yml file to create the blue deployment for your endpoint, run the following command:

az ml online-deployment create --name blue --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic

Important

The --all-traffic flag in the az ml online-deployment create command allocates 100 percent of the endpoint traffic to the newly created blue deployment.
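
With the Python SDK v2, a roughly equivalent sketch looks like the following. It assumes ml_client is the authenticated MLClient from the earlier sketch and that you run it from the cli directory, so the relative paths below resolve to the same assets as the YAML file:

from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    Model,
)

# Define the blue deployment inline, mirroring blue-deployment.yml.
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="<YOUR_ENDPOINT_NAME>",
    model=Model(path="endpoints/online/model-1/model/"),
    code_configuration=CodeConfiguration(
        code="endpoints/online/model-1/onlinescoring/",
        scoring_script="score.py",
    ),
    environment=Environment(
        conda_file="endpoints/online/model-1/environment/conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

# The SDK has no --all-traffic flag; set the traffic explicitly instead.
endpoint = ml_client.online_endpoints.get(name="<YOUR_ENDPOINT_NAME>")
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()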

In the blue-deployment.yml file, the path line specifies where to upload files from. The Azure Machine Learning CLI uses this information to upload the files and register the model and environment. As a best practice for production, you should register the model and environment and specify the registered name and version separately in the YAML code. Use the format model: azureml:<model-name>:<model-version> for the model, for example, model: azureml:my-model:1. For the environment, use the format environment: azureml:<environment-name>:<environment-version>, for example, environment: azureml:my-env:1.

For registration, you can extract the YAML definitions of model and environment into separate YAML files and use the commands az ml model create and az ml environment create. To find out more about these commands, run az ml model create -h and az ml environment create -h.
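
With the Python SDK v2, registration might look like the following sketch. The asset names my-model and my-env are hypothetical, and ml_client is the authenticated MLClient from earlier:

from azure.ai.ml.entities import Environment, Model

# Register the model as a versioned asset in the workspace.
registered_model = ml_client.models.create_or_update(
    Model(name="my-model", path="endpoints/online/model-1/model/")
)

# Register the environment as a versioned asset.
registered_env = ml_client.environments.create_or_update(
    Environment(
        name="my-env",
        conda_file="endpoints/online/model-1/environment/conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
    )
)

# A deployment can then reference the assets as azureml:<name>:<version>.
print(f"azureml:{registered_model.name}:{registered_model.version}")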

For more information about registering your model as an asset, see Register a model by using the Azure CLI or Python SDK. For more information about creating an environment, see Create a custom environment.

Confirm your existing deployment

One way to confirm your existing deployment is to invoke your endpoint so that it can score your model for a given input request. When you invoke your endpoint via the Azure CLI or the Python SDK, you can choose to specify the name of the deployment to receive incoming traffic.

Note

Unlike the Azure CLI or Python SDK, Azure Machine Learning studio requires you to specify a deployment when you invoke an endpoint.

Invoke an endpoint with a deployment name

When you invoke an endpoint, you can specify the name of a deployment that you want to receive traffic. In this case, Azure Machine Learning routes the endpoint traffic directly to the specified deployment and returns its output. You can use the --deployment-name option for the Azure Machine Learning CLI v2, or the deployment_name option for the Python SDK v2 to specify the deployment.
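
For example, with the Python SDK v2, a minimal sketch of invoking a specific deployment (assuming ml_client from earlier) might be:

# Route the request directly to the blue deployment.
response = ml_client.online_endpoints.invoke(
    endpoint_name="<YOUR_ENDPOINT_NAME>",
    deployment_name="blue",
    request_file="endpoints/online/model-1/sample-request.json",
)
print(response)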

Invoke the endpoint without specifying a deployment

If you invoke the endpoint without specifying a deployment to receive traffic, Azure Machine Learning routes the endpoint's incoming traffic across the deployments in the endpoint based on traffic control settings.

Traffic control settings allocate specified percentages of incoming traffic to each deployment in the endpoint. For example, if your traffic rules specify that a particular deployment in your endpoint should receive incoming traffic 40 percent of the time, Azure Machine Learning routes 40 percent of the endpoint traffic to that deployment.

To view the status of your existing endpoint and deployment, run the following commands:

az ml online-endpoint show --name $ENDPOINT_NAME 

az ml online-deployment show --name blue --endpoint-name $ENDPOINT_NAME

The output lists information about the $ENDPOINT_NAME endpoint and the blue deployment.

Test the endpoint by using sample data

You can invoke the endpoint by using the invoke command. The following command uses the sample-request.json JSON file to send a sample request:

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json

Scale your existing deployment to handle more traffic

In the deployment described in Deploy and score a machine learning model by using an online endpoint, you set the instance_count value to 1 in the deployment YAML file. You can scale out by using the update command:

az ml online-deployment update --name blue --endpoint-name $ENDPOINT_NAME --set instance_count=2

Note

In the previous command, the --set option overrides the deployment configuration. Alternatively, you can update the YAML file and pass it as input to the update command by using the --file option.
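
With the Python SDK v2, the usual pattern for the same scale-out is to fetch the deployment, change instance_count, and submit the update. A minimal sketch, assuming ml_client from earlier:

# Fetch the current deployment, raise the instance count, and update it.
blue_deployment = ml_client.online_deployments.get(
    name="blue", endpoint_name="<YOUR_ENDPOINT_NAME>"
)
blue_deployment.instance_count = 2
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()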

Deploy a new model but don't send it traffic

Create a new deployment named green:

az ml online-deployment create --name green --endpoint-name $ENDPOINT_NAME -f endpoints/online/managed/sample/green-deployment.yml

Because you don't explicitly allocate any traffic to the green deployment, it has zero traffic allocated to it. You can verify that by using the following command:

az ml online-endpoint show -n $ENDPOINT_NAME --query traffic
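
The SDK equivalent is to read the traffic attribute of the endpoint. A minimal sketch:

# The green deployment appears with a value of 0 in the traffic map.
endpoint = ml_client.online_endpoints.get(name="<YOUR_ENDPOINT_NAME>")
print(endpoint.traffic)  # for example: {'blue': 100, 'green': 0}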

Test the new deployment

Even though the green deployment has 0 percent of traffic allocated to it, you can invoke it directly by using the --deployment option:

az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name green --request-file endpoints/online/model-2/sample-request.json

If you want to use a REST client to invoke the deployment directly without going through traffic rules, set the following HTTP header: azureml-model-deployment: <deployment-name>. The following code uses Client for URL (cURL) to invoke the deployment directly. You can run the code in a Unix or Windows Subsystem for Linux (WSL) environment. For instructions on retrieving the $ENDPOINT_KEY value, see Get the key or token for data plane operations.

# get the scoring uri
SCORING_URI=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query scoring_uri)
# use curl to invoke the endpoint
curl --request POST "$SCORING_URI" --header "Authorization: Bearer $ENDPOINT_KEY" --header 'Content-Type: application/json' --header "azureml-model-deployment: green" --data @endpoints/online/model-2/sample-request.json

Test the deployment with mirrored traffic

After you test your green deployment, you can mirror a percentage of the live traffic to your endpoint by copying that percentage of traffic and sending it to the green deployment. Traffic mirroring, which is also called shadowing, doesn't change the results returned to clients: 100 percent of requests still flow to the blue deployment. The mirrored percentage of the traffic is copied and also submitted to the green deployment so that you can gather metrics and logging without affecting your clients.

Mirroring is useful when you want to validate a new deployment without impacting clients. For example, you can use mirroring to check whether latency is within acceptable bounds or to check that there are no HTTP errors. The use of traffic mirroring, or shadowing, to test a new deployment is also known as shadow testing. The deployment that receives the mirrored traffic, in this case, the green deployment, can also be called the shadow deployment.

Mirroring has the following limitations:

  • Mirroring is supported for versions 2.4.0 and later of the Azure Machine Learning CLI and versions 1.0.0 and later of the Python SDK. If you use an older version of the Azure Machine Learning CLI or the Python SDK to update an endpoint, you lose the mirror traffic setting.
  • Mirroring isn't currently supported for Kubernetes online endpoints.
  • You can mirror traffic to only one deployment in an endpoint.
  • The maximum percentage of traffic you can mirror is 50 percent. This cap limits the effect on your endpoint bandwidth quota, which has a default value of 5 MBps. Your endpoint bandwidth is throttled if you exceed the allocated quota. For information about monitoring bandwidth throttling, see Bandwidth throttling.

Also note the following behavior:

  • You can configure a deployment to receive only live traffic or mirrored traffic, not both.
  • When you invoke an endpoint, you can specify the name of any of its deployments—even a shadow deployment—to return the prediction.
  • When you invoke an endpoint and specify the name of a deployment to receive incoming traffic, Azure Machine Learning doesn't mirror traffic to the shadow deployment. Traffic is mirrored to the shadow deployment only from requests sent to the endpoint without a specified deployment.

If you set the green deployment to receive 10 percent of mirrored traffic, clients still receive predictions from the blue deployment only.

Diagram that shows traffic flow through an endpoint. All traffic goes to the blue deployment, and 10 percent is mirrored to the green deployment.

Use the following command to mirror 10 percent of the traffic and send it to the green deployment:

az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=10"
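
With the Python SDK v2, mirrored traffic is set through the endpoint's mirror_traffic attribute. A minimal sketch, assuming ml_client from earlier:

# Mirror 10 percent of live traffic to the green deployment.
endpoint = ml_client.online_endpoints.get(name="<YOUR_ENDPOINT_NAME>")
endpoint.mirror_traffic = {"green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

Setting the value back to {"green": 0} disables mirroring, like the CLI command later in this section.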

You can test mirrored traffic by invoking the endpoint several times without specifying a deployment to receive the incoming traffic:

for i in {1..20} ; do
    az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file endpoints/online/model-1/sample-request.json
done

You can confirm that the specified percentage of the traffic is sent to the green deployment by checking the logs from the deployment:

az ml online-deployment get-logs --name green --endpoint-name $ENDPOINT_NAME

After testing, you can set the mirror traffic to zero to disable mirroring:

az ml online-endpoint update --name $ENDPOINT_NAME --mirror-traffic "green=0"

Allocate a small percentage of live traffic to the new deployment

After you test your green deployment, allocate a small percentage of traffic to it:

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=90 green=10"
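
The SDK equivalent updates the endpoint's traffic dictionary. A minimal sketch, assuming ml_client from earlier:

# Send 90 percent of live traffic to blue and 10 percent to green.
endpoint = ml_client.online_endpoints.get(name="<YOUR_ENDPOINT_NAME>")
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

The same pattern, with {"blue": 0, "green": 100}, performs the full cutover described later in this article.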

Tip

The traffic percentages across all deployments must sum to either 0 percent, which disables traffic, or 100 percent, which enables traffic.

Your green deployment now receives 10 percent of all live traffic. Clients receive predictions from both the blue and green deployments.

Diagram that shows traffic flow through an endpoint. The blue deployment receives 90 percent of the traffic, and the green deployment, 10 percent.

Send all traffic to the new deployment

When you're fully satisfied with your green deployment, switch all traffic to it:

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "blue=0 green=100"

Remove the old deployment

Use the following command to delete an individual deployment from a managed online endpoint. Deleting an individual deployment doesn't affect the other deployments in the managed online endpoint:

az ml online-deployment delete --name blue --endpoint-name $ENDPOINT_NAME --yes --no-wait

Delete the endpoint and deployment

If you aren't going to use the endpoint and deployment, you should delete them. When you delete an endpoint, all its underlying deployments are also deleted.

az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
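
With the Python SDK v2, the corresponding cleanup calls are sketched below. Both return pollers that you can wait on, and ml_client is the authenticated MLClient from earlier:

# Delete only the blue deployment (the endpoint and green remain).
ml_client.online_deployments.begin_delete(
    name="blue", endpoint_name="<YOUR_ENDPOINT_NAME>"
).result()

# Or delete the endpoint, which also removes all of its deployments.
ml_client.online_endpoints.begin_delete(name="<YOUR_ENDPOINT_NAME>").result()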