
You can do inference pretty easily with 8-bit fixed-point weights. Now try doing the same during training.

Training and inference are only similar at a high level, not in practice.
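
For concreteness, here is a minimal NumPy sketch of why the inference half is the easy part (illustrative only: made-up shapes, a simple per-tensor scale, not any particular library's kernel). You quantize the trained fp32 weights once, do the matmul in integers, and rescale the result at the end:

    # Hypothetical per-tensor int8 quantization for inference.
    import numpy as np

    def quantize_int8(w):
        scale = np.abs(w).max() / 127.0                      # per-tensor scale
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 32)).astype(np.float32)         # trained weights
    x = rng.normal(size=(1, 64)).astype(np.float32)          # activation

    qw, sw = quantize_int8(w)
    qx, sx = quantize_int8(x)

    # Integer matmul with int32 accumulation, one dequantize step at the end.
    y_int8 = (qx.astype(np.int32) @ qw.astype(np.int32)) * (sx * sw)
    y_fp32 = x @ w
    print(np.max(np.abs(y_int8 - y_fp32)))                   # small error

The weights are fixed, so you only pay the quantization error once and there is no feedback loop that can amplify it.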




... because the gradient being followed may have a smaller magnitude than the lower-precision format can represent.
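
A toy illustration of that point (NumPy, assuming an 8-bit grid spacing of 1/127; not a real training loop): a typical learning-rate-times-gradient update is smaller than one quantization step, so it rounds away and the weight never moves.

    import numpy as np

    step = 1.0 / 127.0                     # smallest representable weight change (assumed scale)
    w_q = np.round(0.3 / step) * step      # a weight snapped to the 8-bit grid

    lr, grad = 1e-3, 0.05
    update = lr * grad                     # 5e-5, far below one grid step (~7.9e-3)

    w_new = np.round((w_q - update) / step) * step
    print(w_new == w_q)                    # True: the gradient signal was lost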


You also need a few other operations for training, such as transpose, which may or may not be fast in a particular implementation.
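
For example, backprop through y = x @ W reads the weights transposed, so a memory layout tuned for the forward matmul may not serve the backward pass well. A NumPy sketch with arbitrary shapes:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 64))     # batch of activations
    W = rng.normal(size=(64, 32))    # weights
    dy = rng.normal(size=(8, 32))    # gradient arriving from the next layer

    dx = dy @ W.T                    # gradient w.r.t. the input needs W transposed
    dW = x.T @ dy                    # gradient w.r.t. the weights needs x transposed
    print(dx.shape, dW.shape)        # (8, 64) (64, 32)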

(ETA: In case it's not obvious, I'm agreeing with david-gpu's comment, and adding more reasons that training currently differs from inference.)



