It's too bad that text generation is still so computationally expensive. As far as I know, the last round of text generation models still needed late-model GPUs to generate with any kind of efficiency. I think it'll be a while before that printf() you described is feasible.