But the problem is that the tokens are subwords, which means that if you simply disallowed every token containing an "e", you'd make it hard to complete a word given a prefix.
For example, the generation may start like this: "This is a way to solv-", or "This is th-"
If I understand it correctly, that's a valid concern, but the way structured generation libraries like Outlines[1] work is that they can keep multiple candidate continuations alive during inference (beam search).
One beam could be "This is a way to solv-". With no obvious "good" next token.
Another beam could be "This way is solv-". With "ing" as the obvious next token.
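To make that concrete, here's a minimal, self-contained sketch of the idea (this is not Outlines' actual API; the vocabulary and scoring function are invented for illustration): a beam search that masks every token containing the banned character. A beam ending in "solv" scores poorly for one step, but wins once "ing " becomes available.

```python
from typing import Callable


def beam_search(
    score_fn: Callable[[str, str], float],  # log-prob-like score of token given prefix
    vocab: list[str],
    banned_char: str,
    beam_width: int = 2,
    steps: int = 5,
) -> str:
    """Beam search with a hard constraint: tokens containing banned_char are masked."""
    beams = [("", 0.0)]  # (text so far, cumulative score)
    for _ in range(steps):
        candidates = []
        for text, score in beams:
            for tok in vocab:
                if banned_char in tok:
                    continue  # hard constraint: never propose this token
                candidates.append((text + tok, score + score_fn(text, tok)))
        # Keep the top beams; a beam like "...solv" can survive here even if
        # it has no good next token yet, and recover on a later step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]


# Invented toy scores: a few hand-picked continuation bonuses, uniform otherwise.
BONUS = {
    ("", "This "): 2.0,
    ("This ", "way "): 2.0,
    ("This way ", "is "): 2.0,
    ("This way is ", "solv"): 1.0,   # looks weaker at first...
    ("This way is ", "hard "): 1.5,  # ...than this greedy choice
    ("This way is solv", "ing "): 2.0,  # ...but pays off one step later
}


def toy_score(text: str, tok: str) -> float:
    return BONUS.get((text, tok), -1.0)


VOCAB = ["This ", "way ", "is ", "solv", "ing ", "hard ", "es"]  # "es" gets masked
out = beam_search(toy_score, VOCAB, banned_char="e", beam_width=2, steps=5)
print(out)  # → "This way is solving "
```

With a beam width of 1 (greedy decoding) the search would commit to "hard " and never reach "solving"; keeping a second beam alive is what lets the "solv" prefix recover.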
Yes, that would probably work quite well, given enough training data. However, I interpreted the question/claim as being about a task that LLMs excel at, i.e. that writing text while avoiding a certain character is something a general-purpose LLM can do.