recently there has been this problem that has been getting more frequent, my computer just randomly freezes up/blackscreens and then fails to post when i do a hard restart. this doesn’t resolve itself until after i open it up and play musical chairs with the ram for a bit.

shit that i have tried:

  1. swapped the ram around to different slots. sometimes it works, sometimes it doesn’t
  2. cleaned out the case
  3. wd40’d the ram pins (helped with the posting but seems to have increased crash frequency, not enough data to tell for sure)

no idea where to begin with this one, can’t tell if it’s a motherboard or a ram issue or something else entirely. the sticks are of differing sizes and manufacture so that may also be an issue. would give specs but the thing just died on me in the middle of posting this and i can’t boot in just yet. motherboard is a supermicro x9 something server board.

  • AssholeDestroyer@lemmy.ml
    8 months ago

    Don’t put WD-40 on the pins. I’d start by pulling out the sticks and cleaning the pins off with a q-tip and iso alcohol. Probably a good idea to clean out the slots now too.

    Get Memtest64 and run it with both sticks. If it fails, try it with each one by itself. If a stick doesn’t pass the test you should be able to get a new one under warranty. Just start an RMA request and say it failed Memtest64.

    If it’s not your ram then it’s probably a poorly seated CPU. Remove the cooler, clean the paste off and carefully put the cooler back on without over tightening it, or tightening one side more than the other.

    • meth_dragon [none/use name]@hexbear.netOP
      8 months ago

      cleaning the pins off with a q-tip and iso alcohol

      i tried this at the beginning, things didn’t noticeably improve so i took it to a local shop and they gave me the wd40 treatment. will try again

      probably a poorly seated CPU

      inshallah please let this be it

        • Quasari@programming.dev
          8 months ago

          WD40 isn’t a lubricant, it’s for “Water Displacement.” While as a liquid it can be used as one, it is a poor one. Its whole purpose is to cover a metal part with a hydrophobic layer. It’s good at removing water from something like your sparkplugs. Maybe they thought water had gotten in and was causing issues with contact?

        • GrouchyGrouse [he/him]@hexbear.net
          8 months ago

          Seconding this. Get some 90% isopropyl and clean off all that WD-40. Let it fully dry/evaporate. The only thing you should spray on your computer parts is compressed air.

  • Maoo [none/use name]@hexbear.net
    8 months ago

    A few things to try out in addition to other folks’ good suggestions:

    • when it happens, after a hard shutdown, unplug the power cable, press the power button to discharge anything remaining, and then plug it back in and start. See if it consistently posts after you do this. This would indicate that a component is breaking itself but resets to a temporarily working state after a proper power cycle.

    • monitor temperatures. Log them to file if possible. Overheating components might explain why workarounds only work sometimes. Maybe some of them just let the components cool down enough.

    • just leave in one stick at a time and see how it goes. You can try to narrow down whether it’s a stick or a spot that’s broken by trying different slots with 1 stick and different sticks in the same spot.

    • Not posting can look like a few things. Is it possible it’s the video card / output breaking?
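For the temperature logging idea above, here is a rough sketch, not a definitive tool. It assumes a Linux box that exposes sensors under `/sys/class/thermal` (a server board may report through IPMI or lm-sensors instead, in which case this will find nothing). The function names and the CSV layout are made up for illustration.

```python
import csv
import os
import time
from datetime import datetime
from pathlib import Path

# Assumption: Linux exposes sensors under /sys/class/thermal; other systems differ.
THERMAL_ROOT = Path("/sys/class/thermal")


def read_temps(root=THERMAL_ROOT):
    """Return {zone_name: degrees_C} for every readable thermal zone under root."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            # The kernel reports millidegrees Celsius as an integer string.
            temps[zone.name] = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor missing or unreadable; skip it
    return temps


def log_temps(path, interval_s=5.0, samples=None, root=THERMAL_ROOT):
    """Append one timestamped CSV row per sensor, every interval_s seconds.

    samples=None loops until interrupted. Each batch is flushed and fsynced
    so the last readings before a freeze have a chance of reaching the disk.
    """
    count = 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while samples is None or count < samples:
            now = datetime.now().isoformat(timespec="seconds")
            for name, temp in read_temps(root).items():
                writer.writerow([now, name, f"{temp:.1f}"])
            f.flush()
            os.fsync(f.fileno())
            count += 1
            if samples is None or count < samples:
                time.sleep(interval_s)
```

Calling `log_temps("temps.csv")` (file name is just an example) and leaving it running means that after the next crash you can check whether the last rows show a climb.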

    • meth_dragon [none/use name]@hexbear.netOP
      8 months ago

      1. i’ve been doing this when testing each individual stick of ram, there is no real pattern, but some stick/slot combinations are more consistent than others.

      2. will try this when i get the thing to turn on.

      3. see 1

      4. how would i test/fix this? nvidia-smi was fine last i checked. would this have any correlation with the ram issues?

      • Maoo [none/use name]@hexbear.net
        8 months ago

        If you’ve tested each stick all by itself (no others plugged in) in a few different slots and all of them have this issue, that suggests that it’s not the sticks and possibly not the slots either. If it were one of those two options you’d expect to be able to find one stable single stick + slot option, as you’d think that only one thing would break at a time: one stick, or one slot (or a single pair of slots).

        For your graphics card, do you also have an integrated one in the CPU? If so, I’d remove your discrete card and see if it’s more stable. You’d need to switch your monitor cable to a different receptacle, of course. If that’s not an option, I’d come up with ways to “ping” your computer under the assumption that maybe it is posting and working but just not showing you anything. You could set up an ssh server or similar and auto-login and see whether you can still get in after one of these incidents and a hard reset.
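One way to do the “ping” check from a second machine, as a minimal sketch: it only tests whether a TCP port accepts connections, and assumes you already have an ssh server listening on the default port 22. The function name and the address in the usage note are placeholders.

```python
import socket


def port_is_open(host, port=22, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not reachable".
        return False
```

After a black screen and hard reset, running `port_is_open("192.168.1.50")` (swap in the machine’s real address) from another computer would tell you whether the OS came up even though nothing is on the display.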

        The inconsistency of the memory issue makes me think it isn’t memory (no single stick at a time is stable in any slot, right?). I’d start removing more components to see if any minimal set is stable.

  • trompete [he/him]@hexbear.net
    8 months ago

    This could be any component, including MB, CPU, GPU, power supply. This could be damage that temporarily fixes itself once the thing cools down again. You’ll want to remove as many components as possible, and swap out the rest with alternatives, or swap your components into another computer. Maybe you know someone you can visit to swap stuff out with?

    Also, have you tried running memtest86 on the RAM? There’s also other diagnostic software for other components.

    Just running a stress test like a benchmark might reliably trigger the problem, so you have a reproducible way of triggering the issue instead of just waiting for it to happen.
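As a very crude illustration of a repeatable load, here is a sketch that fills a large buffer with fixed byte patterns and counts bytes that read back wrong. It is not a substitute for a real memory tester: it runs in user space, only touches whatever memory the OS hands it, and cannot target a specific stick or slot. The function names, patterns, and default size are arbitrary choices.

```python
# Alternating and solid bit patterns, a common choice for simple fill tests.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def pattern_pass(size_bytes, pattern):
    """Fill a buffer with one byte value and return how many bytes read back different."""
    buf = bytearray([pattern]) * size_bytes
    return size_bytes - buf.count(pattern)


def stress(size_bytes=256 * 1024 * 1024, rounds=1):
    """Run every pattern over a fresh buffer `rounds` times; return total mismatches."""
    errors = 0
    for _ in range(rounds):
        for pattern in PATTERNS:
            errors += pattern_pass(size_bytes, pattern)
    return errors
```

A nonzero result from `stress()` would be a strong hint, but the more likely useful outcome is simply that the machine freezes sooner under load, which gives you something reproducible to test against.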

    • meth_dragon [none/use name]@hexbear.netOP
      8 months ago

      i was on 6 sticks, i think i have narrowed the candidates down to 3 stable sticks, 2 unstable, and 1 definitely busted

      problem is the stable sticks only work in certain slots and even then uptime is not great

      one of the unstable sticks is brand new, makes me think that it got destroyed by being in one of the bad slots

      a big problem is that i have 16 slots for ram and it’s a total pain in the ass to test all of them
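With that many slots it may help to write results down and tally them instead of going from memory. A small sketch, assuming you record each single-stick boot test as a (stick, slot, worked) entry; the labels are whatever you choose to call your sticks and slots.

```python
from collections import defaultdict


def summarize(results):
    """Tally single-stick test results.

    results: iterable of (stick, slot, ok) tuples.
    Returns (per_stick, per_slot), each mapping a label to (passes, total).
    """
    per_stick = defaultdict(lambda: [0, 0])
    per_slot = defaultdict(lambda: [0, 0])
    for stick, slot, ok in results:
        for table, key in ((per_stick, stick), (per_slot, slot)):
            table[key][0] += int(ok)  # count passes
            table[key][1] += 1        # count attempts
    return (
        {k: tuple(v) for k, v in per_stick.items()},
        {k: tuple(v) for k, v in per_slot.items()},
    )
```

If failures pile up on particular slot labels regardless of stick, that points at the board; if they follow particular sticks across slots, that points at the sticks.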

      • Feinsteins_Ghost [he/him]@hexbear.net
        8 months ago

        if sticks of ram are only working in certain slots it’s entirely possible the IC that controls the ram is shot.

        Recently had this happen on an old dual xeon setup, rendered half of my 192GB of ram unusable and was causing problems exactly like what you’re describing.

        Does the mobo show the sticks as inserted upon bootup?

  • LanyrdSkynrd [he/him]@hexbear.net
    8 months ago

    Are you sure it’s the ram? I had a bad motherboard that did basically the same thing. I was swapping pci cards around and it would eventually work. Turned out it was the flexing of the motherboard that got it working again.