AKS Unable to Access IMDS
I have a simple FastAPI mock service deployed on AKS that uses the Python Azure SDK to access Azure Storage. It uses a managed identity (MI) with the Storage Blob Data Contributor role, with the client_id specified via an environment variable. It's based on the Azure documentation…
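One way to tell whether the pod can reach IMDS at all, independent of the SDK, is to request a token directly from the documented IMDS token endpoint. The sketch below uses only the standard library. Reading the client ID from `AZURE_CLIENT_ID` is an assumption, because the question does not name the variable. `https://storage.azure.com/` is the resource identifier for Azure Storage.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

# Documented IMDS managed-identity token endpoint; only reachable from Azure compute.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"
STORAGE_RESOURCE = "https://storage.azure.com/"


def build_imds_token_request(client_id: Optional[str],
                             resource: str = STORAGE_RESOURCE) -> urllib.request.Request:
    """Build (but do not send) a token request for a managed identity."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        # Selects a specific user-assigned identity when more than one is attached.
        params["client_id"] = client_id
    url = f"{IMDS_TOKEN_URL}?{urllib.parse.urlencode(params)}"
    # IMDS rejects token requests that lack the "Metadata: true" header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def probe_imds(timeout: float = 3.0) -> dict:
    """Run inside the pod; a timeout suggests IMDS itself is unreachable."""
    # AZURE_CLIENT_ID is an assumed variable name; use whatever the deployment sets.
    req = build_imds_token_request(os.environ.get("AZURE_CLIENT_ID"))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)  # includes "access_token" on success
```

If `probe_imds()` times out, the problem is more likely network access to 169.254.169.254 from the pod than the Storage Blob Data Contributor role assignment.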
CUDA-capable device(s) is/are busy or unavailable
I followed this document (https://learn.microsoft.com/en-us/azure/aks/gpu-cluster?tabs=add-azure-linux-gpu-node-pool) and created a node pool with the VM size Standard_NV36ads_A10_v5. I checked the GPU driver, and the toolkit was installed by Azure, not by Gpu…
Unable to delete AKS load balancer where VMSS is already deleted
Hi team, I cannot delete the "kubernetes" load balancer from my deleted AKS clusters, and I also cannot delete its backend pools. I don't have a support plan, so I can't file a ticket. Can someone help me delete these resources, please? I am still…
Karpenter on AKS: Why is my subnet giving "SubnetIsFull" errors when it has 1000+ available IP addresses?
Hello! I'm trying to run Karpenter on my AKS cluster to manage spot instances. I was able to get the service deployed to the cluster, but when I try to opt a pod in to Karpenter management with the proper tolerations, I get the following…
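One thing worth checking, since the question does not say which network plugin is in use, is how many subnet IPs each node reserves. For traditional Azure CNI (not Overlay), the AKS planning guidance sizes a subnet as `(nodes + 1) + (nodes + 1) * maxPods`, and Azure keeps 5 addresses of every subnet for itself. The sketch applies that arithmetic; the `max_pods` values are hypothetical inputs, not Karpenter defaults.

```python
def azure_cni_ips_required(nodes: int, max_pods: int) -> int:
    """Subnet IPs needed under traditional Azure CNI, per the AKS planning formula."""
    # One extra node is budgeted for upgrade surge; each node also reserves
    # an IP for every pod it may run.
    n = nodes + 1
    return n + n * max_pods


def usable_subnet_ips(prefix_len: int) -> int:
    """Azure reserves 5 addresses in every subnet."""
    return 2 ** (32 - prefix_len) - 5


def max_nodes_that_fit(prefix_len: int, max_pods: int) -> int:
    """Largest node count whose required IPs still fit in the subnet."""
    per_node = max_pods + 1  # the node's own IP plus its pod IPs
    return max(0, usable_subnet_ips(prefix_len) // per_node - 1)
```

For example, with a /22 subnet (1,019 usable addresses) and a hypothetical `max_pods` of 250, `max_nodes_that_fit(22, 250)` returns 3, so a subnet that shows 1,000+ free IPs could still fill up after a handful of nodes under this model.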
1 FilesystemIsReadOnly: Node condition ReadonlyFilesystem is now: True, reason: FilesystemIsReadOnly, message: "EXT4-fs (sda1): Remounting filesystem read-only" - This error in AKS cluster -
1 FilesystemIsReadOnly: Node condition ReadonlyFilesystem is now: True, reason: FilesystemIsReadOnly, message: "EXT4-fs (sda1): Remounting filesystem read-only" - This error in AKS cluster - doesn't autoscale of Azure Kubernetes cluster takes…
.NET 8 Docker image on AKS
Hi, we have a .NET-based web project that is containerized, published to ACR, and deployed to AKS. While upgrading the .NET version, because the Linux Docker image no longer runs on privileged port 80, we are exposing 8080…
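Since .NET 8, the official ASP.NET Core container images listen on port 8080 by default, which is configurable through `ASPNETCORE_HTTP_PORTS`. One way to keep clients on port 80 is to map it in the Service. The sketch builds the relevant manifest fragments as Python dicts; the name `web` and any image reference are placeholders.

```python
import json

# Default listening port of the .NET 8 ASP.NET Core container images.
APP_PORT = 8080


def service_manifest(name: str) -> dict:
    """ClusterIP Service that keeps port 80 for clients and targets 8080 in the pod."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": 80, "targetPort": APP_PORT, "protocol": "TCP"}],
        },
    }


def container_spec(name: str, image: str) -> dict:
    """Container entry for the Deployment's pod template."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": APP_PORT}],
        # Set explicitly so the port does not silently depend on the image default.
        "env": [{"name": "ASPNETCORE_HTTP_PORTS", "value": str(APP_PORT)}],
    }


if __name__ == "__main__":
    print(json.dumps(service_manifest("web"), indent=2))
```

Kubernetes accepts JSON as well as YAML manifests, so the printed output can be passed to `kubectl apply -f`.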
Socket.io Client Not Connecting to AKS-Deployed Socket.io Server (Only Receiving 40 Without sid)
I’m facing an issue where my Socket.io client fails to fully connect to a Socket.io server deployed on AKS. When running Socket.io locally, the client correctly receives the 40{"sid":"<socket_id>"} message, meaning the…
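In the Socket.IO v5 protocol, each frame is an Engine.IO packet whose first character is its type, and Socket.IO packets travel inside Engine.IO type `4` (message). So `40` means "message + CONNECT", and a successful CONNECT reply carries the session id as `40{"sid":...}`. The simplified decoder below makes that layering explicit for reading captured frames (for example, from browser dev tools). It is a sketch, not a full protocol implementation.

```python
import json

# Engine.IO v4 packet types (first character of every frame).
ENGINE_IO_TYPES = {"0": "open", "1": "close", "2": "ping", "3": "pong",
                   "4": "message", "5": "upgrade", "6": "noop"}
# Socket.IO v5 packet types (second character of an Engine.IO "message").
SOCKET_IO_TYPES = {"0": "CONNECT", "1": "DISCONNECT", "2": "EVENT", "3": "ACK",
                   "4": "CONNECT_ERROR", "5": "BINARY_EVENT", "6": "BINARY_ACK"}


def parse_frame(raw: str) -> dict:
    """Decode a text frame like '40{"sid":"abc"}' into its protocol layers."""
    packet = {"engine_io": ENGINE_IO_TYPES[raw[0]]}
    if packet["engine_io"] != "message" or len(raw) < 2:
        return packet
    packet["socket_io"] = SOCKET_IO_TYPES[raw[1]]
    rest = raw[2:]
    # Binary packets carry "<attachment count>-" before anything else.
    if packet["socket_io"] in ("BINARY_EVENT", "BINARY_ACK"):
        count, _, rest = rest.partition("-")
        packet["attachments"] = int(count)
    # A non-default namespace is written as "/name," before the payload.
    namespace = "/"
    if rest.startswith("/"):
        namespace, _, rest = rest.partition(",")
    packet["namespace"] = namespace
    # An optional numeric acknowledgement id precedes the JSON payload.
    digits = len(rest) - len(rest.lstrip("0123456789"))
    if digits:
        packet["ack_id"] = int(rest[:digits])
        rest = rest[digits:]
    packet["payload"] = json.loads(rest) if rest else None
    return packet
```

A bare `40` decodes to a CONNECT with no payload. Older Socket.IO protocol revisions sent CONNECT without a `sid`, so a client/server version mismatch is one thing worth ruling out; this is a hypothesis, not something the question confirms.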

Network troubleshooting on AKS
Hi, I have two similar clusters, one for development and the other for testing. Node size, web app versions, node pools: almost everything is the same, except that one is in South India and the other is in UK South. Of the two, the AKS cluster in UK South is lagging a lot. I…
OpenCost giving an error when downloading the price list
Issue Summary We are experiencing an issue with OpenCost in our Azure Kubernetes Service (AKS) cluster. After enabling OpenCost using the command az aks update --resource-group <resource-group> --name <cluster-name> --enable-cost-analysis,…

I want to delete the whole AKS cluster, but the VMSS is stuck in instance deletion, so the AKS deletion fails
I want to delete the whole AKS cluster, but the VMSS has been stuck in instance deletion for a whole day, so the AKS deletion failed. Please help ASAP. Thanks.
Windows Agent Pool not able to pull images
Getting the following error - Failed to pull image "..." [rpc error: code = Unknown desc = failed to pull and unpack image "": failed to extract layer sha256:10...: failed to safefile.openRelative failed in Win32: open…
Solved -- AKS cluster on a private network is not able to connect to a VM on a different private network
There is an AKS cluster with a private virtual network enabled, and a VM in a different private virtual network with a public IP assigned. The two private virtual networks are peered. I can access AKS commands from the portal 'run…
What is the best way to retrieve AKS costs using a REST API?
Is it possible to retrieve AKS costs via a REST API? I made a POST request to the Cost Management API endpoint management.azure.com/subscriptions/xxxxx/providers/Microsoft.CostManagement/query?api-version=2024-08-01 with payload…
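For reference, a Cost Management query body generally has `type`, `timeframe`, and `dataset` fields. The sketch below is an illustrative example, not the asker's elided payload: it sums month-to-date actual cost per day for one resource group, on the assumption that the cluster's node VMs, disks, and load balancers live in its node resource group (named `MC_<resource-group>_<cluster>_<region>` by default). The subscription ID, group name, and token are placeholders, and the aggregation column (`PreTaxCost`) may differ by billing account type.

```python
import json
import urllib.request

API_VERSION = "2024-08-01"  # version used in the question


def cost_query_url(subscription_id: str) -> str:
    return ("https://management.azure.com/subscriptions/"
            f"{subscription_id}/providers/Microsoft.CostManagement/query"
            f"?api-version={API_VERSION}")


def aks_cost_query(node_resource_group: str) -> dict:
    """Month-to-date actual cost per day, filtered to one resource group."""
    return {
        "type": "ActualCost",
        "timeframe": "MonthToDate",
        "dataset": {
            "granularity": "Daily",
            "aggregation": {"totalCost": {"name": "PreTaxCost", "function": "Sum"}},
            "filter": {
                "dimensions": {
                    "name": "ResourceGroupName",
                    "operator": "In",
                    "values": [node_resource_group],
                }
            },
        },
    }


def build_request(subscription_id: str, node_resource_group: str,
                  token: str) -> urllib.request.Request:
    """POST request ready for urllib.request.urlopen; token is an ARM bearer token."""
    body = json.dumps(aks_cost_query(node_resource_group)).encode("utf-8")
    return urllib.request.Request(
        cost_query_url(subscription_id),
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```

A token for testing can be obtained with `az account get-access-token`; passing the request to `urllib.request.urlopen` then returns the query result as JSON.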
Unable to install Flux Extension in AKS private cluster
We are unable to install the flux extension in our AKS private cluster. The installation of the extension runs for some time, then the cluster always ends up in "Failed" state with this error…
AKS Virtual Node quota "leak" with KEDA ScaledJob
We recently enabled the virtual node on our AKS cluster and use it to run various background jobs. These jobs are triggered by KEDA and specifically use a ScaledJob. When KEDA sees a message in a Service Bus queue, it correctly starts a ScaledJob…

Image no longer available in Microsoft Container Registry
Hello all, the image mcr.microsoft.com/azuredocs/azure-vote-front:v1 is no longer available, yet many Microsoft pages still refer to it. Could you put it back, or tell me where I can find it now? Normal BackOff 21s …
appgw.ingress.kubernetes.io/request-timeout does not work
I am using an Azure-managed AKS and Application Gateway setup. I am trying to increase the request timeout from "30" (which seems to be the default value) to "60". I have followed this document…
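One detail worth ruling out: Kubernetes annotation values must be strings, so the timeout (in seconds) has to be written as `"60"`, not as a bare number. The sketch builds such an Ingress as a Python dict. The resource names and the `azure-application-gateway` IngressClass (the one the AGIC add-on is assumed to create) should be adapted to the actual cluster.

```python
import json

TIMEOUT_ANNOTATION = "appgw.ingress.kubernetes.io/request-timeout"


def ingress_manifest(name: str, service: str, timeout_seconds: int = 60) -> dict:
    """Ingress for AGIC with a request timeout given in seconds."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # Annotation values must be strings: "60", not the number 60.
            "annotations": {TIMEOUT_ANNOTATION: str(timeout_seconds)},
        },
        "spec": {
            # Assumed IngressClass name for the AGIC add-on.
            "ingressClassName": "azure-application-gateway",
            "rules": [{
                "http": {
                    "paths": [{
                        "path": "/",
                        "pathType": "Prefix",
                        "backend": {"service": {"name": service, "port": {"number": 80}}},
                    }]
                }
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(ingress_manifest("web-ingress", "web"), indent=2))
```

After applying, `az network application-gateway http-settings list --gateway-name <gateway> --resource-group <group>` can show whether the backend HTTP settings actually picked up the new timeout.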
Kubernetes DNS Resolution Failure - Intermittent Connectivity Issue to 168.63.129.16 (Working Previously)
Hello, Our Kubernetes cluster is experiencing DNS resolution failures. CoreDNS logs show "i/o timeout" errors when attempting to reach 168.63.129.16:53. This issue began approximately two days ago. Prior to this, DNS resolution was functioning…
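To see whether 168.63.129.16 answers at all outside CoreDNS, a raw DNS query with a short timeout can be sent from a node or a debug pod. The sketch builds an RFC 1035 A-record query by hand with the standard library; `microsoft.com` is only a sample name.

```python
import random
import socket
import struct
from typing import Tuple


def build_dns_query(name: str, qtype: int = 1) -> Tuple[int, bytes]:
    """Return (query id, wire-format query) for name; qtype 1 is an A record."""
    query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS 1 (IN).
    return query_id, header + qname + struct.pack("!HH", qtype, 1)


def probe_dns(server: str = "168.63.129.16", name: str = "microsoft.com",
              timeout: float = 2.0) -> bool:
    """True if server answers a UDP query within timeout seconds."""
    query_id, packet = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return False  # comparable to CoreDNS's "i/o timeout"
    # A matching ID confirms the reply belongs to this query.
    return len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id
```

Calling `probe_dns()` a few dozen times in a loop on an affected node gives a rough failure rate that can be compared against a healthy node.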
Cluster can't schedule new pods.
The Kubernetes cluster cannot schedule new pods. All currently scheduled ones are active, but as soon as you try to restart a pod or create a new one, it's stuck in Pending. There are no recent events or conditions in the pod/deployment details. All nodes appear…
Some requests from our service to Azure Orchestration are getting "Connection reset by peer"
Hello, we're using the libcloud package in Python to set up the Azure driver of type DriverType.COMPUTE.AZURE_ARM. The service has been working fine for many years, but recently we've been getting this error from some requests: ('Connection…
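While the root cause is investigated, transient resets can be absorbed with retries and capped exponential backoff. This is a generic sketch, not a libcloud feature. Which exception type libcloud surfaces for a reset connection is an assumption here, so `retry_on` is a parameter that should be set to the types actually seen in the logs.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionResetError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

Usage would look like `call_with_retry(lambda: driver.list_nodes())`, where `driver` is the existing Azure ARM driver. Retrying is safest for read-only calls like this one; retrying operations that create or modify resources can duplicate work.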