Resiliency policies proactively prevent, detect, and recover from your container app failures. In this article, you learn how to apply resiliency policies for applications that use Dapr to integrate with different cloud services, like state stores, pub/sub message brokers, secret stores, and more.
You can configure resiliency policies like retries, timeouts, and circuit breakers for the following outbound and inbound operation directions via a Dapr component:
Outbound operations: Calls from the Dapr sidecar to a component, such as:
Persisting or retrieving state
Publishing a message
Invoking an output binding
Inbound operations: Calls from the Dapr sidecar to your container app, such as:
Subscriptions when delivering a message
Input bindings delivering an event
The following screenshot shows how an application uses a retry policy to attempt to recover from failed requests.
Make sure you have the latest version of the Azure Container Apps extension for the Azure CLI.
az extension show --name containerapp
az extension update --name containerapp
Create specific policies
Note
If you don't set every property within a policy during create or update, the CLI automatically applies the recommended default settings. Set specific policies using flags.
Create resiliency policies by targeting an individual policy. For example, to create the Outbound Timeout policy, run the following command.
Update your resiliency policies by targeting an individual policy. For example, to update the response timeout of the Outbound Timeout policy, run the following command.
Navigate to your container app environment in the Azure portal. In the left side menu under Settings, select Dapr components to open the Dapr component pane.
You can add resiliency policies to an existing Dapr component by selecting Add resiliency for that component.
In the resiliency policy pane, select Outbound or Inbound to set policies for outbound or inbound operations. For example, for outbound operations, you can set timeout and HTTP retry policies similar to the following.
Click Save to save the resiliency policies.
Note
Currently, you can only set timeout and retry policies via the Azure portal.
You can edit or remove the resiliency policies by selecting Edit resiliency.
Important
Once you've applied all the resiliency policies, you need to restart your Dapr applications.
Policy specifications
Timeouts
Timeouts terminate long-running operations early. The timeout policy includes the following properties.
Maximum retries to be executed for a failed HTTP request. Example: 5.
retryBackOff (required): Back-off settings for retries, configured through the initialDelayInMilliseconds and maxIntervalInMilliseconds sub-properties. Example: N/A.
retryBackOff.initialDelayInMilliseconds (required): Delay between first error and first retry. Example: 1000.
retryBackOff.maxIntervalInMilliseconds (required): Maximum delay between retries. Example: 10000.
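To make the interaction of these retry properties concrete, here is a minimal Python sketch using the example values above (5 retries, 1000 ms initial delay, 10000 ms maximum interval). The function names are hypothetical. The doubling of the delay between attempts is an assumption for illustration, since this article does not state how the delay grows between the initial delay and the maximum interval.

```python
import time


def retry_delays_ms(max_retries=5, initial_delay_ms=1000, max_interval_ms=10000):
    """Return the delay in milliseconds to wait before each retry attempt."""
    delays = []
    delay = initial_delay_ms
    for _ in range(max_retries):
        # Never wait longer than the configured maximum interval.
        delays.append(min(delay, max_interval_ms))
        # Assumption: the delay doubles after every attempt.
        delay *= 2
    return delays


def call_with_retries(operation, delays_ms, sleep=time.sleep):
    """Call operation once, then retry after each delay until it succeeds."""
    attempts = len(delays_ms) + 1
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # All retries are used up, so surface the last error.
            sleep(delays_ms[attempt] / 1000)


print(retry_delays_ms())  # [1000, 2000, 4000, 8000, 10000]
```

The `sleep` parameter is injectable so that the retry loop can be exercised without actually waiting.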
Circuit breakers
Define a circuitBreakerPolicy to monitor requests causing elevated failure rates and shut off all traffic to the impacted service when certain criteria are met.
Cyclical period of time (in seconds) used by the circuit breaker to clear its internal counts. If not provided, the interval is set to the same value as provided for timeoutInSeconds. Example: 15.
consecutiveErrors (required): Number of request errors allowed to occur before the circuit trips and opens. Example: 10.
timeoutInSeconds (required): Time period (in seconds) of open state, directly after failure. Example: 5.
Circuit breaker process
Specifying consecutiveErrors (the circuit trip condition, expressed as consecutiveFailures > $(consecutiveErrors)-1) sets the number of errors allowed to occur before the circuit trips and opens halfway.
The circuit waits half-open for the timeoutInSeconds amount of time, during which the consecutiveErrors number of requests must consecutively succeed.
If the requests succeed, the circuit closes.
If the requests fail, the circuit remains in a half-open state.
If you didn't set any intervalInSeconds value, the circuit resets to a closed state after the amount of time you set for timeoutInSeconds, regardless of consecutive request success or failure. If you set intervalInSeconds to 0, the circuit never automatically resets, only moving from half-open to closed state by successfully completing consecutiveErrors requests in a row.
If you did set an intervalInSeconds value, that value determines the amount of time before the circuit is reset to a closed state, regardless of whether the requests sent in the half-open state succeeded.
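The state transitions described above can be sketched as a small Python model. This is a toy illustration of the process as this article describes it, not Dapr's implementation. The class and method names are hypothetical, and measuring the reset time from the moment the circuit trips is an assumption.

```python
import time


class CircuitBreakerModel:
    """Toy model of the closed and half-open transitions described above."""

    def __init__(self, consecutive_errors=10, timeout_seconds=5,
                 interval_seconds=None, clock=time.monotonic):
        self.consecutive_errors = consecutive_errors
        # Without an interval, the timeout value doubles as the reset period.
        self.reset_after = (timeout_seconds if interval_seconds is None
                            else interval_seconds)
        self.clock = clock
        self._close()

    def _close(self):
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.tripped_at = None

    def _maybe_reset(self):
        # An interval of 0 means the circuit never resets automatically.
        if self.state == "half-open" and self.reset_after > 0:
            # Assumption: the reset period is measured from the trip time.
            if self.clock() - self.tripped_at >= self.reset_after:
                self._close()

    def record(self, succeeded):
        """Record one request outcome and return the resulting state."""
        self._maybe_reset()
        if self.state == "closed":
            if succeeded:
                self.failures = 0
            else:
                self.failures += 1
                # Trip condition: consecutiveFailures > consecutiveErrors - 1
                if self.failures > self.consecutive_errors - 1:
                    self.state = "half-open"
                    self.successes = 0
                    self.tripped_at = self.clock()
        elif succeeded:
            self.successes += 1
            # Enough consecutive successes close the circuit again.
            if self.successes >= self.consecutive_errors:
                self._close()
        else:
            # A failure while half-open restarts the success count.
            self.successes = 0
        return self.state


breaker = CircuitBreakerModel(consecutive_errors=3, interval_seconds=0)
for _ in range(3):
    breaker.record(False)
print(breaker.state)  # half-open
```

With `interval_seconds=0`, the model only returns to the closed state after three consecutive successful calls to `record(True)`.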
Resiliency logs
From the Monitoring section of your container app, select Logs.
In the Logs pane, write and run a query to find resiliency entries in your container app system logs. For example, to find whether a resiliency policy was loaded:
ContainerAppConsoleLogs_CL
| where ContainerName_s == "daprd"
| where Log_s contains "Loading Resiliency configuration:"
| project time_t, Category, ContainerAppName_s, Log_s
| order by time_t desc
Click Run to run the query and view the result with the log message indicating the policy is loading.
Or, you can find the actual resiliency policy by enabling debug logs on your container app and querying to see if a resiliency resource is loaded.
Once debug logs are enabled, use a query similar to the following:
ContainerAppConsoleLogs_CL
| where ContainerName_s == "daprd"
| where Log_s contains "Resiliency configuration ("
| project time_t, Category, ContainerAppName_s, Log_s
| order by time_t desc
Click Run to run the query and view the resulting log message with the policy configuration.