I recently built a new PC to use as a server. For months now, I have been getting unexplained crashes, sometimes after a few minutes, sometimes after a few days, where the PC just reboots without leaving any trace in the logs. There are only the normal occasional status messages and then, a few seconds later, the log of a normal boot process.

This is slowly driving me crazy because I just can't figure out the issue. I have tried multiple different Linux installs, swapped out the SSD and the PSU, and run a RAM test, but the behaviour still persists.

Today something was different. Instead of rebooting, it showed me this blue screen, this time finally with a log. But I still can't pin down the issue. Some quick internet searches turn up only very vague answers, blaming everything from software to hardware and from the PSU to the CPU.

Can any Linux wizard help me fix my problem? Link to the log

Update: I have now hit an even weirder issue. I booted up, installed cpupower as a comment suggested, and installed man to look up its documentation. Then the screen froze, and I had to force a reboot by holding the power button for 3 seconds. When it booted back up, my bash history had been reset to its state from a few days back (~/.bash_history has a modification time from 2 days ago), even though I have rebooted several times since then without any persistence problems like this. man was no longer installed either. Even weirder, cpupower was still installed. So it seems some data was saved while other files were discarded. I will now use a second SSD and try to reproduce this. I now suspect some kind of storage issue, even though the two SSDs in question have never caused issues in my laptop. This is scary; I have never seen a Linux install corrupted in such a weird way.

  • MrPistachios
    2 days ago

    I had issues with reboots on an old server, and it turned out to be the memory even though I didn't find anything in the memtests. Maybe pull one stick out and try; if it still happens, swap in the other stick and try again.

    • Molecular5869@feddit.org (OP)
      2 days ago

      Thanks. I have run memtests and they all passed, so I thought I had ruled out the RAM. Now I will try your suggestion. It will likely take me several days to reach any conclusion, because I need to change only one thing at a time and then hope the crashes stop, and I will only know for sure once the server has been running for, say, more than a week straight.