> For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements.
Did they go over the entire text with a thesaurus? I've never seen "palletization" used as a synonym for "quantization" before, and I've read quite a few papers on LLM quantization
Huh, generally whenever I saw the lookup-table approach in the literature it was also just referred to as quantization; I guess they wanted to disambiguate the two methods
Though I'm not sure how warranted that really is: in both cases it's pretty much the same idea of reducing precision, just with different implementations
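To make the "same idea, different implementations" point concrete, here's a minimal numpy sketch contrasting the two at 2 bits per weight. This is illustrative only, not Apple's actual pipeline: uniform quantization stores a scale/offset and evenly spaced codes, while palettization stores a small learned lookup table (here fit with a few Lloyd/k-means iterations) plus per-weight indices into it.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)  # stand-in for layer weights

levels = 4  # 2 bits -> 4 representable values

# --- Uniform ("linear") quantization: evenly spaced levels over [min, max] ---
scale = (w.max() - w.min()) / (levels - 1)
codes = np.round((w - w.min()) / scale).astype(np.uint8)  # 2-bit codes
w_linear = codes * scale + w.min()                        # dequantized

# --- Palettization: lookup table fit to the data + per-weight indices ---
# Initialize centroids at quantiles, then refine with a few Lloyd iterations.
centroids = np.quantile(w, [0.125, 0.375, 0.625, 0.875]).astype(np.float32)
for _ in range(20):
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(levels):
        if np.any(idx == k):
            centroids[k] = w[idx == k].mean()
idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1).astype(np.uint8)
w_palette = centroids[idx]  # dequantize = table lookup

print("uniform MSE:", float(np.mean((w - w_linear) ** 2)))
print("palette MSE:", float(np.mean((w - w_palette) ** 2)))
```

Both store the same 2 bits per weight; the palette version typically reconstructs Gaussian-ish weights with lower error because its levels adapt to the distribution instead of being pinned to evenly spaced points between the min and max.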
I also found it confusing the first time I saw it. I believe it is sometimes used because the techniques for DL are very similar (in some cases identical) to algorithms that were developed for color palette quantization (in some places shortened to "palettization"). [1] At this point my understanding is that this term is used to be more specific about the type of quantization being performed.
I enjoy the plausible irony that they used the very same model they're describing to proofread the article, and it didn't catch palettize (like a color palette) vs. palletize (like a shipping pallet).