More ELI5 than the other comments. Considering the softmax network: During quant...

sampo · on July 25, 2023

> During quantization we find that values in the network vary from 0->5000, but 95% of values are <100. Quantizing this to 8bits would mean that our values would be in increments of about 20.

Instead of using an 8bit integer with even step size quantification, wouldn't they still use an 8bit float?

zamalek · on July 25, 2023

Possibly, it depends on the distribution of the vales. It would also make my examples far less straightforward :)

Either way you would still only have 256 discrete values.

TimPC · on July 25, 2023

No one quantizes blindly without accounting for data. If 95% of your values are in 0-100 you’ll probably do something like have 20 values for 0-100 and the remaining 12 for 101-5000. You don’t have to apply a uniform distribution and shouldn’t when your data is that concentrated.

zamalek · on July 25, 2023

Third paragraph.

samwillis · on July 25, 2023

If I'm following correctly, does this mean that with this change along with a model being quantized, we could see models that are 5% the size (on file system) and memory usage but almost identical in output?

zamalek · on July 25, 2023

The vales are selected were arbitrary. The size reduction will be 32bits/8bits - so it will be 4 times smaller.