Troubleshooting Chaos Agent
This page provides a consolidated guide to troubleshooting issues related to the Chaos Agent used in Azure Chaos Studio. Use this guide to diagnose problems during installation, verify network connectivity, interpret agent status messages, and resolve common errors.
Note
For detailed setup instructions, refer to the Install and Configure Chaos Agent page. For network and security details, see Private Link and Network Security.
Agent Installation Issues
If the Chaos Agent fails to install or appears unhealthy, follow these debugging steps:
- Extension Deployment Failure
  - Symptoms: The Virtual Machine (VM) Extensions blade shows a status other than Provisioning succeeded (for example, Failed or Error).
  - Troubleshooting Steps:
    - Verify that the target VM meets the minimum prerequisites (an OS supported for autoinstall, the correct version, and so on). See OS Support and Compatibility.
    - Confirm that a user-assigned managed identity is attached to the VM.
    - Check the Activity Log in the Azure portal for any errors related to extension deployment.
    - If the VM is part of a Virtual Machine Scale Set, ensure that the scale set upgrade policy isn't set to Manual. If it is, upgrade instances manually (using az vmss update-instances) or switch to an Automatic policy.
    - Consider uninstalling and reinstalling the extension, either with the Azure CLI or by disabling and re-enabling agent-based faults on your VM in the Azure portal:
      az vm extension delete --resource-group <ResourceGroup> --vm-name <VMName> --name ChaosAgent
      az vm extension set --resource-group <ResourceGroup> --vm-name <VMName> --name ChaosAgent --publisher Microsoft.Azure.Chaos --version <version>
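Before reinstalling, it can help to confirm which state the extension is actually in. The following is a minimal sketch, assuming the Azure CLI is installed and signed in and that the extension is named ChaosAgent as in the commands above. The function names are hypothetical, and the assumption that the CLI reports Succeeded for a healthy extension should be checked against your own output.

```shell
# Sketch: report whether the Chaos Agent extension provisioned successfully.
# Assumes the extension name "ChaosAgent" used elsewhere on this page.

# Print the provisioning state reported by the Azure CLI.
# $1 = resource group, $2 = VM name
get_extension_state() {
    az vm extension show \
        --resource-group "$1" \
        --vm-name "$2" \
        --name ChaosAgent \
        --query provisioningState \
        --output tsv
}

# Succeed only when the state is "Succeeded" (assumed healthy value).
extension_is_healthy() {
    state=$(get_extension_state "$1" "$2") || return 1
    [ "$state" = "Succeeded" ]
}

# Example (hypothetical resource names):
#   extension_is_healthy my-rg my-vm && echo "extension OK" || echo "extension needs attention"
```

If the check fails, the Activity Log usually gives more detail than the state value alone.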
Network Connectivity Issues
Even when the agent is installed, it may not communicate properly if network connectivity is disrupted.
Symptoms:
- The agent’s Handler status doesn't show Ready.
- Logs indicate failure to reach the Chaos Studio agent service endpoint.
Troubleshooting Steps:
- Verify Outbound Access: Ensure that the VM has outbound network access to the Chaos Agent service endpoint, which follows the pattern:
  https://<region>.agents.chaos-prod.azure.com
  Replace <region> with your VM's deployment region.
- Check NSG and Firewall Settings:
  a. Confirm that any Network Security Group (NSG) attached to the VM allows outbound HTTPS (port 443) traffic.
  b. The recommended approach is to allow the ChaosStudio service tag for outbound traffic.
- Proxy and Custom DNS: If your environment uses a proxy or custom DNS settings, verify these settings aren't blocking access to the endpoint.
- Private Link Configuration: For environments configured with Private Link, ensure that:
  a. The Private Endpoint is correctly set up and approved.
  b. DNS resolution is updated so that the Chaos Agent service domain resolves to the Private Endpoint’s IP.
  c. The agent’s configuration is updated accordingly.
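The outbound access check can be sketched as a short script run on the target VM. This is an illustration only: it assumes curl is available, the function names are hypothetical, and westus2 is a placeholder region. Any HTTP response is treated as proof of reachability, since the goal is to detect DNS, TCP, or TLS failures rather than to call the service.

```shell
# Sketch: check that a VM can reach the Chaos Agent service endpoint.

# Build the endpoint URL for a region, following the pattern on this page.
chaos_agent_endpoint() {
    printf 'https://%s.agents.chaos-prod.azure.com\n' "$1"
}

# Return 0 if an HTTPS connection to the endpoint can be established.
# curl fails only on DNS, TCP, or TLS errors, or when the 10-second
# limit is exceeded; the HTTP status code itself is ignored.
can_reach_endpoint() {
    curl --silent --output /dev/null --max-time 10 "$(chaos_agent_endpoint "$1")"
}

# Example (placeholder region):
#   can_reach_endpoint westus2 && echo reachable || echo blocked
```

In a proxied environment, the result reflects whatever proxy settings curl picks up from the shell, which may differ from those the agent uses.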
Agent Status and Health Checks
The agent reports two key statuses on the VM’s Extensions + applications blade:
Status Field:
- Provisioning succeeded: Indicates the extension was deployed successfully.
- Any other status (for example, Failed or Error) signals installation issues.
Handler Status Field:
- Ready: Indicates the agent is running and communicating with the Chaos Studio service.
- NotReady or an empty status suggests the agent can't connect, commonly due to network issues or misconfigured identities.
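The two fields can be read together as a simple decision rule. The sketch below encodes that rule as a shell function; the function name and the wording of the messages are illustrative, and the inputs are the values as displayed in the portal.

```shell
# Sketch: map the two portal fields to a suggested next step.
# $1 = Status field, $2 = Handler status field (may be empty)
triage_agent_status() {
    if [ "$1" != "Provisioning succeeded" ]; then
        echo "installation issue: review extension deployment"
    elif [ "$2" = "Ready" ]; then
        echo "healthy"
    else
        echo "connectivity or identity issue: check network and managed identity"
    fi
}

# Example:
#   triage_agent_status "Provisioning succeeded" "NotReady"
```

The Status field is checked first because a failed deployment makes the Handler status uninformative.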
How to Check Agent Logs
Windows:
Open Event Viewer → Windows Logs → Application. Filter by the source AzureChaosAgent to view relevant log entries.
Linux:
Run the following command to view the latest logs from the Chaos Agent service:
journalctl -u azure-chaos-agent --lines 50
Look for error messages indicating connectivity or dependency issues.
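On Linux, the search for error messages can be narrowed with a small filter. This is a sketch under assumptions: the keyword list is a guess at what failure lines might contain, not a list of messages the agent is known to emit, and the function name is hypothetical.

```shell
# Sketch: keep only log lines that look like errors.
# Reads log text on standard input; matching is case-insensitive.
filter_agent_errors() {
    grep -iE 'error|fail|timeout|denied|unreachable'
}

# Example (Linux VM):
#   journalctl -u azure-chaos-agent --lines 200 --no-pager | filter_agent_errors
```

An empty result does not prove the agent is healthy; it only means none of the assumed keywords appeared in the sampled lines.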
Chaos Agent local VM debugging
Debug Agent logs on host
- Windows agent local debugging
  - The Chaos agent runs as the Windows service AzureChaosAgent.
    - The Windows service code runs in the VM extension handler.
  - Agent location
    - The agent is installed as an Azure VM extension, typically at C:\Packages\Plugins\Microsoft.Azure.Chaos.ChaosWindowsAgent\<version>. The directory contains AzureChaosAgent.exe, agentsettings.json, and supporting libraries.
  - Get Agent log on host: The agent log is written to the Windows Application event log.
    - Example PowerShell command to show the latest 50 agent log entries:
      Get-EventLog -LogName "Application" -Source "AzureChaosAgent" -Newest 50
  - Start/Stop Agent on the host
    - In an elevated (Admin) PowerShell session, run Start-Service AzureChaosAgent or Stop-Service AzureChaosAgent.
- Linux agent local debugging
  - The Linux Chaos agent is managed by systemd.
  - Agent location
    - If the agent is running, you can locate the Linux agent executable by running:
      ps aux | grep chaos
  - Get Agent log on host
    - Example bash command to show the latest 50 lines of the agent log:
      journalctl -u azure-chaos-agent --lines 50
  - Start/Stop Agent on the host
    - Run the bash command systemctl start azure-chaos-agent or systemctl stop azure-chaos-agent.
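On Linux, stopping and starting the agent is often followed by a check that the service actually came back. A minimal sketch, assuming root privileges and the unit name azure-chaos-agent used above; the function name and the CHAOS_AGENT_WAIT variable are hypothetical conveniences, not part of the agent.

```shell
# Sketch: restart the Linux agent and confirm that systemd reports it active.
# Requires root privileges on the VM.
restart_chaos_agent() {
    systemctl restart azure-chaos-agent || return 1
    # Give the service a moment to start before checking it
    # (CHAOS_AGENT_WAIT is a hypothetical override, default 5 seconds).
    sleep "${CHAOS_AGENT_WAIT:-5}"
    systemctl is-active --quiet azure-chaos-agent
}

# Example: show recent logs only if the restart did not succeed
#   restart_chaos_agent || journalctl -u azure-chaos-agent --lines 50 --no-pager
```

An active unit only shows that the process is running; the Handler status in the portal remains the indicator that the agent is communicating with the service.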
Other Common Errors and Solutions
The following tables list other common Chaos Agent errors with their likely causes and solutions.
Credential or Identity Errors
Error Message | Cause | Solution |
---|---|---|
"Failed to register agent due to credential error." | The VM’s managed identity isn't configured correctly. | Verify that the VM has the correct user-assigned managed identity attached and that it has the required permissions. Refer to the Install and Configure Chaos Agent page for detailed steps. |
Missing Prerequisites for Fault Execution
Error Message | Cause | Solution |
---|---|---|
"Failed to register agent due to API Exception." or "Fault prerequisites not met" (for example, missing stress-ng on Linux) | Required dependencies (like stress-ng) are missing. | Attempt to uninstall and reinstall the Chaos agent. Install the missing dependency on the target VM. For example, on Debian/Ubuntu: sudo apt-get install stress-ng. Refer to the OS Support and Compatibility page for further details. |
Network Connectivity Blockage
Error Message | Cause | Solution |
---|---|---|
"The agent log shows an inability to connect to https://<region>.agents.chaos-prod.azure.com." | Outbound network traffic is blocked. | Update NSG rules to allow HTTPS traffic to the Chaos Agent service endpoint. Consider using the ChaosStudio service tag for outbound rules. For environments with Private Link, ensure DNS resolves correctly to the Private Endpoint’s IP. |
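One way to express the service tag rule is with az network nsg rule create. The sketch below only prints the command so it can be reviewed before running. The function name, the rule name AllowChaosStudioOutbound, and the priority 200 are placeholder assumptions; choose a priority that fits your existing rules.

```shell
# Sketch: print (not run) an az command that allows outbound HTTPS to the
# ChaosStudio service tag. Rule name and priority are placeholder values.
# $1 = resource group, $2 = NSG name
print_chaos_nsg_rule_cmd() {
    printf '%s\n' "az network nsg rule create \
--resource-group $1 --nsg-name $2 \
--name AllowChaosStudioOutbound --priority 200 \
--direction Outbound --access Allow --protocol Tcp \
--destination-address-prefixes ChaosStudio \
--destination-port-ranges 443"
}

# Example (hypothetical resource names):
#   print_chaos_nsg_rule_cmd my-rg my-nsg
```

Printing first keeps the sketch safe to try; copy the output into a shell once the values look right.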
Extension time-out or “ExtensionHandlerFailed”
Error Message | Cause | Solution |
---|---|---|
"ExtensionHandlerFailed" or time-out errors in the Activity Log. | The agent extension didn't start properly, possibly due to network or resource configuration issues. | - Restart the VM and verify network connectivity. - Check for any interfering security software that may block the extension. - If persistent, reinstall the extension using the Azure CLI (see installation troubleshooting section). |
More Resources
• If you continue to experience issues after following these steps, consider creating an incident with the Chaos Studio team.
• This document is intended to help users quickly diagnose and resolve issues with the Chaos Agent. For further assistance, refer to our support channels or visit the Azure Chaos Studio community forums.
• Install and Configure Chaos Agent
• OS Support and Compatibility
• Private Link and Network Security
• Chaos Agent Known Issues