Hacker News

Models are predictable at temperature 0. They might have tested the output beforehand.
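(For context, "temperature 0" usually means the sampler degenerates into a pure argmax over the logits, so in principle the same prompt always yields the same token. A minimal sketch of that idea; the `sample` function and its signature are illustrative, not any particular library's API:)

```python
import math
import random

def sample(logits, temperature):
    # Temperature 0: degenerate case, greedy decoding (pure argmax).
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise: softmax over temperature-scaled logits, then sample.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]
```

With temperature 0 there is no randomness at all, which is why deterministic output is the naive expectation.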


In practice, models haven't been deterministic even at temperature 0, although nobody knows exactly why. It could be either hardware or software bugs.


We know exactly why: floating-point operations aren't associative, but the GPU scheduler assumes they are, and the scheduler itself isn't deterministic. Running the model with a strict, fixed order of operations hurts performance, so they don't do that.
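(The non-associativity is easy to demonstrate: with IEEE 754 doubles, the grouping of an addition can change the result, so a GPU reduction whose summation order varies run to run can produce different logits. A minimal example:)

```python
a, b, c = 1e16, -1e16, 1.0

# (a + b) cancels exactly to 0.0, so adding c gives 1.0.
left = (a + b) + c

# (b + c) rounds back to -1e16, because 1.0 is below the
# spacing between representable doubles near 1e16 (~2.0),
# so the final sum is 0.0.
right = a + (b + c)

print(left)   # 1.0
print(right)  # 0.0
```

If two different token logits end up this close, a reordered reduction can flip which one the argmax picks, and the "deterministic" temperature-0 output diverges from there.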


Cool, thanks a lot for the explanation. Makes sense.



