> For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements.
Did they go over the entire text with a thesaurus? I've never seen "palletization" used as a synonym for "quantization" before, and I've read quite a few papers on LLM quantization
Huh, generally whenever I saw the lookup-table approach in the literature it was also just referred to as quantization; I guess they wanted to disambiguate the two methods
Though I'm not sure how warranted that really is: in both cases it's pretty much the same idea of reducing precision, just with different implementations
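To make the "same idea, different implementations" point concrete, here's a minimal numpy sketch contrasting the two at 2 bits per weight. This is illustrative only, not Apple's actual pipeline: uniform quantization stores a scale/offset and evenly spaced codes, while palettization stores a small learned lookup table (here fit with a few Lloyd/k-means iterations) plus per-weight indices into it.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)  # stand-in for layer weights

levels = 4  # 2 bits -> 4 representable values

# --- Uniform ("linear") quantization: evenly spaced levels over [min, max] ---
scale = (w.max() - w.min()) / (levels - 1)
codes = np.round((w - w.min()) / scale).astype(np.uint8)  # 2-bit codes
w_linear = codes * scale + w.min()                        # dequantized

# --- Palettization: lookup table fit to the data + per-weight indices ---
# Initialize centroids at quantiles, then refine with a few Lloyd iterations.
centroids = np.quantile(w, [0.125, 0.375, 0.625, 0.875]).astype(np.float32)
for _ in range(20):
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(levels):
        if np.any(idx == k):
            centroids[k] = w[idx == k].mean()
idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1).astype(np.uint8)
w_palette = centroids[idx]  # dequantize = table lookup

print("uniform MSE:", float(np.mean((w - w_linear) ** 2)))
print("palette MSE:", float(np.mean((w - w_palette) ** 2)))
```

Both store the same 2 bits per weight; the palette version typically reconstructs Gaussian-ish weights with lower error because its levels adapt to the distribution instead of being pinned to evenly spaced points between the min and max.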
I also found it confusing the first time I saw it. I believe it is sometimes used because the techniques for DL are very similar (in some cases identical) to algorithms that were developed for color palette quantization (in some places shortened to "palettization"). [1] At this point my understanding is that this term is used to be more specific about the type of quantization being performed.
I enjoy the plausible irony that they used the very same model they're describing to proofread the article, and it didn't catch palettize (like a color palette) vs. palletize (like a shipping pallet).