Hello,
Last night around 5:15am UTC (10:15pm local), our server stopped functioning and remained unreachable until we deallocated and reallocated it. This resulted in 10 hours of downtime and data loss, which we want to avoid in the future. I'm hoping for some advice on troubleshooting the root cause and on potential solutions.
I've attached a journalctl log (log.txt) from that time frame. It appears that some software updates were applied automatically, and shortly afterward the server became unresponsive. The memory usage shows that we ran out of RAM, and the server never recovered. I understand there's a way for the server to heal itself automatically, but I haven't been able to find the option.
Additionally, the server's health state now shows as "Unhealthy", but the troubleshooters can't find any issue.
Any recommendations?