Unexpected Server Outage

BHaist 25 Reputation points
2025-03-13T19:15:53.5766667+00:00

Hello,

Last night around 5:15 AM UTC (10:15 PM local time), our server stopped responding and remained unreachable until we deallocated and reallocated it. This resulted in 10 hours of downtime and a loss of data, which we want to avoid in the future. I'm hoping for advice on troubleshooting the root cause and on potential solutions.

I've attached a journalctl log (log.txt) from that time frame. It appears that some software updates were applied automatically, and shortly thereafter the server became unresponsive. Looking at the memory allocation, it's clear that we ran out of RAM and the server never recovered. I understand that there's a way for the server to automatically heal itself, but I haven't been able to find the option.

Additionally, the server now has its health state as "Unhealthy" but the troubleshooters can't find any issue.

Screenshot 2025-03-13 120504

Screenshot 2025-03-13 120327

Any recommendations?

Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.

1 answer

  1. VasimTamboli 5,120 Reputation points
    2025-03-14T00:04:16.74+00:00

    In a scenario like this, you can schedule a task to run every 5 minutes that executes a PowerShell script to check the server's memory utilization, alert you when RAM usage reaches 85%, and restart the server if it reaches 90%. Please find the suggested PowerShell script attached (Unexpected Server Outage.txt).
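The attached script is not reproduced in this thread. Because the question mentions journalctl, the VM is probably running Linux, so here is a rough Python sketch of the same threshold logic instead of PowerShell. It assumes a kernel that reports `MemAvailable` in `/proc/meminfo`. The function names (`parse_meminfo`, `used_percent`, `decide`) are illustrative and are not taken from the attachment. The sketch only reports a decision; wiring "alert" and "restart" to a real notification or reboot is left to you.

```python
from pathlib import Path

ALERT_PERCENT = 85.0    # alert threshold suggested in the answer
RESTART_PERCENT = 90.0  # restart threshold suggested in the answer


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def used_percent(meminfo: dict[str, int]) -> float:
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]  # assumption: kernel exposes this field
    return 100.0 * (total - available) / total


def decide(percent: float) -> str:
    """Map a usage percentage to 'ok', 'alert', or 'restart'."""
    if percent >= RESTART_PERCENT:
        return "restart"
    if percent >= ALERT_PERCENT:
        return "alert"
    return "ok"


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    if meminfo_path.exists():
        pct = used_percent(parse_meminfo(meminfo_path.read_text()))
        # Only prints the decision; hook up alerting or a reboot here.
        print(f"memory used: {pct:.1f}% -> {decide(pct)}")
```

To match the 5-minute schedule described above, a script like this could be run from a cron entry or a systemd timer. Note that rebooting at 90% treats the symptom; it does not explain what consumed the memory.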

    Please accept this as the answer if it helps.
