
I *had* to get myself out of lurking mode to reply specifically to you; this issue seems widespread on 1st-gen Ryzen. I see your chipset is also close enough to mine (X370), and I felt a strong "déjà vu" reading your freezing symptoms.

I reused my now-old X370 Ryzen build to run TrueNAS Scale (based on Debian), and I get hard lockups like yours.

The tweaks in my personal notes seem to stabilize things a bit, but not completely. It's a mixture of BIOS settings and kernel boot parameters that helps partially. Things I tried or applied, with varying degrees of success:

- Disabling Cool&Quiet

- Disabling C-States

- Gear Down Mode: Disabled

- Power Down Mode: Disabled

- VDDSCR_SOC: Offset by +0.00625v (seemed to stabilize things on Windows)

- Someone in the kernel bug report mentioned the need to power off (as opposed to just resetting) so that all the BIOS settings are applied correctly (I haven't tried this myself yet)

See these links for more info:

- https://bugzilla.kernel.org/show_bug.cgi?id=196683 (a very long bug report thread; people posted lots of things they tried to stabilize their builds, along with kernel parameter ideas)

- https://gist.github.com/diracs-delta/876d74d030f80dc899fc58a...

- https://web.archive.org/web/20201020144021/https://www.truen... (linked from Archive.org because TrueNAS WAS specifically mentioning Ryzen stability in the first paragraphs of this page)

Good luck; and if you ever find out how to get rid of those freezes completely, let me know :)

(edit: formatting)




Thank you for emerging from lurk mode for me :) I will try those things.


I found some of the settings but not all of them. Here are the ones I found and have now changed:

- Global C-state Control: Auto -> Disabled

- AMD Cool’n’Quiet: Auto -> Disabled

- CPU NB/SoC Voltage: Auto -> Offset Mode; CPU NB/SoC Offset Mode Mark: +; CPU NB/SoC Offset Voltage: Auto

(The offset value can only be auto with my machine, not a custom value it seems.)

The only ones I couldn’t find were Gear Down Mode and Power Down Mode.

Clicked save and exit in the UEFI.

Then I powered down the machine and even flipped the on-off button of the PSU to off and let it stay off for ~20 seconds for good measure. Then turned the PSU back on and then powered the machine back on.

Currently reading the bug report thread and will try some of those things as well.


I've now read a bit of those links, as well as a bit of the following:

- https://utcc.utoronto.ca/~cks/space/blog/linux/KernelRcuNocb...

- https://access.redhat.com/documentation/en-us/red_hat_enterp...

- https://help.ubuntu.com/community/Grub2/Setup

And I've changed the following line in my /etc/default/grub from:

    GRUB_CMDLINE_LINUX=""

to

    GRUB_CMDLINE_LINUX="rcu_nocbs=0-15 processor.max_cstate=5"

since my CPU has 16 threads (logical CPUs 0 through 15). Then I saved the file and ran

    sudo update-grub

Now I'm about to reboot the computer and then hopefully it will be more stable from now on :)
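For anyone with a different thread count, the range passed to rcu_nocbs can be derived instead of typed by hand. This is only a rough sketch: `rcu_nocbs_range` is a made-up helper name, and it assumes the goal is to cover every logical CPU, numbered from 0, as in the 16-thread case above.

```shell
#!/bin/sh
# rcu_nocbs_range is a hypothetical helper, not part of any existing tool.
# $1: number of logical CPUs (threads); assumed to be a positive integer.
# Prints a CPU list covering CPUs 0 through N-1.
rcu_nocbs_range() {
    threads=$1
    if [ "$threads" -le 1 ]; then
        echo "0"
    else
        echo "0-$((threads - 1))"
    fi
}

# nproc --all (GNU coreutils) prints the number of installed logical CPUs.
echo "rcu_nocbs=$(rcu_nocbs_range "$(nproc --all)")"
```

On a 16-thread CPU this should print the same rcu_nocbs=0-15 value used in the GRUB line above.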

Thanks again for the help bilange.


Having now turned the computer back on, I've also confirmed that these flags are indeed being passed to the kernel at boot, as seen in the output of

    cat /proc/cmdline

which shows the following:

    BOOT_IMAGE=/boot/vmlinuz-5.11.0-38-generic root=UUID=4dcba509-efff-4ccc-a099-f919240c767c ro rcu_nocbs=0-15 processor.max_cstate=5 quiet splash vt.handoff=7

The "rcu_nocbs=0-15 processor.max_cstate=5" that we added to the GRUB2 config shows up right there in the kernel command line.
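If you'd rather not eyeball that line after every reboot, the same check can be scripted. A minimal sketch, assuming the two parameters above are the ones you care about; `has_kernel_param` is a hypothetical helper name, and it only matches whole space-separated parameters.

```shell
#!/bin/sh
# has_kernel_param is a hypothetical helper, not an existing command.
# $1: a kernel command line string, $2: the exact parameter to look for.
# Succeeds only if $2 appears as a whole space-separated word in $1.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only run the live check where /proc/cmdline is readable (i.e. on Linux).
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    for param in "rcu_nocbs=0-15" "processor.max_cstate=5"; do
        if has_kernel_param "$cmdline" "$param"; then
            echo "present: $param"
        else
            echo "MISSING: $param"
        fi
    done
fi
```

Matching whole words means a partial string such as max_cstate=5 on its own does not count as a hit for processor.max_cstate=5.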



