I was impressed by Microsoft’s AICI, where the idea is that a WASM program can constrain the choice of next tokens. Relatedly, their Guidance[1] framework can use CFGs and programs during local inference to constrain output, and even speed it up with context-aware token filling. I hope this implies API-based LLMs may be moving in a similar direction.
[1] https://github.com/guidance-ai/guidance
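To make the "context-aware token filling" point concrete, here's a toy sketch (not the actual Guidance or AICI API; the grammar, vocabulary, and `mock_logits` stub are all made up for illustration). The trick: when the grammar allows exactly one next token, you append it for free instead of calling the model, so the model is only invoked at genuine branch points.

```python
# Toy sketch of grammar-constrained decoding with token filling.
# The "model" is a stub returning fixed scores; everything here is
# illustrative, not the real Guidance/AICI interface.

VOCAB = ["{", '"answer"', ":", '"yes"', '"no"', "}"]

# Grammar expressed as the set of allowed token sequences:
# {"answer": "yes"} or {"answer": "no"}
SEQUENCES = [
    ["{", '"answer"', ":", '"yes"', "}"],
    ["{", '"answer"', ":", '"no"', "}"],
]

def allowed_next(prefix):
    """Tokens the grammar permits after the tokens generated so far."""
    out = set()
    for seq in SEQUENCES:
        if seq[:len(prefix)] == prefix and len(seq) > len(prefix):
            out.add(seq[len(prefix)])
    return out

def mock_logits(prefix):
    # Stand-in for a real LLM forward pass; slightly prefers "no".
    return {t: (2.0 if t == '"no"' else 1.0) for t in VOCAB}

def generate():
    prefix, model_calls = [], 0
    while True:
        options = allowed_next(prefix)
        if not options:
            break                      # grammar complete
        if len(options) == 1:          # forced token: fill without the model
            prefix.append(options.pop())
            continue
        model_calls += 1               # model only runs at real choice points
        scores = mock_logits(prefix)
        prefix.append(max(options, key=scores.get))
    return prefix, model_calls

tokens, calls = generate()
print("".join(tokens), "| model calls:", calls)
```

Here five of the six tokens are forced by the grammar, so only one model call is needed for the whole completion, which is where the speedup comes from.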