This morning I woke up to my phone using mobile data and my home assistant automations not working. Initially I thought it the power was out, but I could turn on the lights just fine. I checked my UniFi app and saw that the server was not connected to the network at all. This meant that the cable got unplugged, the switch isn’t working, or the server isn’t working. It said the switch was connected and another device was connected to the switch so that narrows it down to just 2 cases. So I opened my server closet in the basement and immediately noticed something was wrong. I couldn’t figure out what was wrong but I just felt like something was wrong. Everything was plugged in, the network switch lights were blinking like normal, my raspberry pi was running just fine, even the server indicator lights were on. My main server is an old gaming PC so it has a glass side panel so I looked inside and I could see the fan spinning, but I could not hear it. Usually I have it set to full speed and I can hear full speed very well. I tried rebooting the server with the power button and the fans didn’t go to full speed. As a last resort, I brought down a keyboard and monitor. As soon as I plugged in the monitor, I saw that there was a prompt to set the time on the BIOS! Picture of the prompt In my opinion, this was the stupidest reason for an outage.

Further investigations

I dug a little deeper and discovered that the BIOS had been reset during a power outage right before all of this happened. So far I have consulted the motherboard manual and found absolutely nothing about this. After a bit of research, I think it could have been that the CMOS battery has died. This is a really simple fix but I don’t have the replacement battery right now. This means that I will have the same exact issue after the next power outage unless I replace the battery.

Preventing this in the future

From what I can see, I just need to replace the CMOS battery. But this computer has been running for over 4 years, so what is stopping this from happening again around 2028? The most effective solution is going to be preventing power outages in the first place. This can be done using a battery backup or a standby generator. Standby generators will last longer during a power outage but are typically more expensive and harder to setup than a simple battery backup.

  • tal
    link
    fedilink
    English
    arrow-up
    3
    ·
    edit-2
    4 months ago

    I once worked on a product that you really did not want to have not coming back up. I was on it several years after the original engineers had designed an early model. Said engineers had not tested what happened when the CMOS battery died and triggered a reset of BIOS settings, brought it back to the hardware platform’s default state. When it did, the thing entered a non-bootable state. You could, with a serial port, access the BIOS and fiddle the settings back for one good boot…but the CMOS battery was non-removable, soldered to the motherboard. Our manufacturing process had not involved changing the default BIOS settings, just what was stored in CMOS. Oops.

    IIRC our customer care guys just sent out new models for free to affected customers – the original hardware model wasn’t sold in large volume, and the cost of the actual hardware components wasn’t especially large relative to the cost of the product.

    I had one sitting around on my desk, as it was sometimes handy to have a physically-accessible device to do work on. I rolled down to Radio Shack – yes, this was a few years back – got a removable CMOS battery case, stripped the non-removable battery out, soldered the battery case to the motherboard, and had the only instance of the device out there that could take a fresh CMOS battery.