There are a lot of things that could be done to improve this:

1) It could use the JSONformer idea [0], where we have a model of the target language that determines which next tokens are valid. We only ask the LLM to supply a token when that model leaves a choice, and when considering possible next tokens we immediately discard any that are invalid under it. This could go beyond mere syntax to considering the APIs etc. that actually exist: if the LLM has already generated the tokens "import java.util.", then it could only generate a completion that is a public class (or subpackage) of "java.util.". Maybe something like language servers could help here.

2) Automatically compile and test every output it generates before showing it to the user. If the compile or test fails, give it a chance to fix its mistake. If it gets stuck in a loop, or isn't getting anywhere after several attempts, fall back to the next most likely output and repeat. If after a while we still aren't getting anywhere, it can show the user its attempts (in case they give the user any ideas).
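
To make both ideas a bit more concrete, here is a rough Python sketch. Everything in it is made up for illustration: JAVA_UTIL_MEMBERS stands in for whatever a language server would report, the candidate lists stand in for the LLM's ranked outputs, and repair is a hypothetical callback for "ask the LLM to fix its mistake". It checks Python candidates rather than Java just to stay self-contained.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical symbol table; a real tool might query a language server instead.
JAVA_UTIL_MEMBERS = {"ArrayList", "HashMap", "List", "Map", "concurrent", "function"}


def allowed_tokens(generated: str, candidates: list[str]) -> list[str]:
    """Idea 1: drop candidate tokens that cannot lead to a known java.util member."""
    prefix = "import java.util."
    if not generated.startswith(prefix):
        return candidates  # this toy only models the one import case
    partial = generated[len(prefix):]
    return [
        tok for tok in candidates
        if any(member.startswith(partial + tok) for member in JAVA_UTIL_MEMBERS)
    ]


def passes_checks(source: str, test: str) -> bool:
    """Compile the candidate, then run it together with its test in a fresh interpreter."""
    try:
        compile(source, "<candidate>", "exec")  # cheap syntax check first
    except SyntaxError:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source + "\n" + test + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)], capture_output=True, timeout=10
            )
        except subprocess.TimeoutExpired:
            return False
    return result.returncode == 0


def first_working(candidates, test, repair=None, max_repairs=2):
    """Idea 2: try outputs from most to least likely, allowing a few repairs each.

    Returns (winner_or_None, failed_attempts) so the failures can still be
    shown to the user if nothing passes.
    """
    failed = []
    for source in candidates:
        attempt = source
        for _ in range(max_repairs + 1):
            if passes_checks(attempt, test):
                return attempt, failed
            failed.append(attempt)
            if repair is None:
                break
            fixed = repair(attempt)
            if fixed in failed:  # stuck in a loop, fall back to the next candidate
                break
            attempt = fixed
    return None, failed
```

With allowed_tokens, a caller would only need to consult the LLM when more than one token survives the filter. With first_working, a None winner means every attempt failed and the second return value is the list of attempts to show the user.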

[0] https://github.com/1rgs/jsonformer