
In practice, models haven't been deterministic at temperature 0, although nobody knows exactly why. It's either hardware or software bugs.



We know exactly why: floating-point operations aren't associative, but the GPU scheduler treats them as if they were, and the order in which it runs them isn't deterministic. Forcing the model to run in a strict, fixed order hurts performance, so they don't do that.
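
To make the non-associativity point concrete, here is a minimal CPU-only sketch in plain Python. It is not GPU code and does not reproduce any particular scheduler; the helper names (`sum_in_order`, `distinct_sums`) and the choice of random values are made up for illustration. It only shows that regrouping or reordering the same additions can change the floating-point result, which is the effect a non-deterministic reduction order would expose.

```python
import random

# Same three numbers, two groupings: floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

def sum_in_order(values, order):
    """Add the values one by one, visiting them in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

def distinct_sums(n=10_000, trials=20, seed=0):
    """Sum the same values in several shuffled orders and collect the results.

    The values span many orders of magnitude (an assumption chosen to make
    rounding differences easy to trigger), so different orders tend to round
    differently.
    """
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8) for _ in range(n)]
    results = set()
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        results.add(sum_in_order(values, order))
    return results

if __name__ == "__main__":
    sums = distinct_sums()
    print(f"{len(sums)} distinct results from summing the same values in shuffled orders")
```

If a reduction inside a model produces slightly different logits from run to run like this, two near-tied tokens can swap places, so even greedy decoding at temperature 0 can diverge.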


Cool, thanks a lot for the explanation. Makes sense.



