
There are now encodings even more efficient than VLQ, but they blur the line between encodings and compression algorithms. Most propose trading the efficiency of encoding 7-bit chars for the ability to squeeze a few thousand common Chinese characters into 16 bits.

The idea I heard was to always encode in 4-byte blocks and use some form of delta encoding. Some variations allow character-position search in better than O(log N). And given that you can feed 32-bit-wide data into NEON/SSE, and the blocks are always 32-bit aligned, you can get that working faster than UTF-8.




Do you have a link to this 32-bit-aligned proposal?



