
The activations are in (-1, 1), so they're also representable by {-1, 0, 1}.
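For concreteness, a minimal sketch of the rounding that claim implies: nearest-neighbor mapping of values in (-1, 1) onto {-1, 0, 1}, which puts the cutoffs at ±0.5 (names here are illustrative, not from any paper):

    import numpy as np

    def ternarize(x):
        """Map values in (-1, 1) to the nearest of {-1, 0, 1} (cutoffs at +/-0.5)."""
        out = np.zeros_like(x)
        out[x > 0.5] = 1.0
        out[x < -0.5] = -1.0
        return out

    print(ternarize(np.array([-0.9, -0.3, 0.1, 0.7])))  # [-1.  0.  0.  1.]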



This is wrong. The paper describes the activations as int8 during inference.
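For reference, absmax int8 quantization looks roughly like this; a minimal sketch with per-tensor scaling for brevity (the paper does this per token, if I recall correctly):

    import numpy as np

    def absmax_quantize_int8(x):
        """Scale by the absolute maximum so the value range maps into [-127, 127]."""
        scale = 127.0 / max(np.abs(x).max(), 1e-8)  # epsilon guards an all-zero input
        q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
        return q, scale

    q, scale = absmax_quantize_int8(np.array([-2.5, 0.1, 1.3]))
    print(q, q / scale)  # int8 values and their dequantized approximation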

That being said, pre-LLM-era deep learning already had low-bit quantization down to 1w2f [0] working back in 2016 [1]. So it's certainly possible it would work for LLMs too.

[0] 1-bit weights, 2-bit activations; though in practice people deployed 2w4f instead.
[1] https://arxiv.org/abs/1606.06160
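As a rough sketch of the 1w2f scheme from [1] (forward pass only; the actual recipe also normalizes weights with tanh before binarizing and trains with a straight-through estimator):

    import numpy as np

    def quantize_k(x, k):
        """Uniformly quantize values in [0, 1] to 2^k levels (DoReFa-style)."""
        n = 2**k - 1
        return np.round(x * n) / n

    def binarize_weights(w):
        """1-bit weights: sign(w), scaled by the mean absolute value."""
        return np.mean(np.abs(w)) * np.sign(w)

    def quantize_activations_2bit(a):
        """2-bit activations: clip to [0, 1], then snap to 4 levels."""
        return quantize_k(np.clip(a, 0.0, 1.0), k=2)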



