I think a lot of the novel uses of language models like ChatGPT will involve multiple instances interacting over interleaved data. For example, to improve factuality you might do the following (a rough code sketch follows the list):
1) One instance first parses the chat history and last message to generate a response. Currently this is where things end, but we can keep this response private and do additional work.
2) A second instance, properly primed, takes the last prompt and the response and "analyzes" them, generating scores for qualities like factuality and usefulness, possibly adding commentary.
3) Pass both into a third instance, which again has the chat history, to rewrite the response taking the feedback into account.
4) Optionally repeat #2 and #3 until the response passes some quality threshold.
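Here is a minimal sketch of that loop in Python, assuming the official openai client library. The model name, the reviewer's JSON score format, the 1-10 scale, and the quality threshold are all placeholder choices for illustration, not anything ChatGPT exposes today.

```python
import json
from openai import OpenAI  # official openai package, v1+ client style

client = OpenAI()          # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"      # placeholder; any chat-capable model works

def call_instance(system_prompt, user_prompt):
    """One isolated call: a separate 'instance' with its own priming."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
    )
    return resp.choices[0].message.content

def respond_with_review(history, last_message, max_rounds=3, threshold=8):
    # Step 1: draft a private response from the chat history.
    draft = call_instance(
        "You are a helpful assistant.",
        f"Chat history:\n{history}\n\nLatest message:\n{last_message}")

    for _ in range(max_rounds):
        # Step 2: a separately primed reviewer scores the draft.
        # (Assumes the reviewer returns valid JSON; real code would validate.)
        review = call_instance(
            "You are a strict reviewer. Return JSON with integer fields "
            "'factuality' and 'usefulness' (1-10) and a 'comments' string.",
            f"Prompt:\n{last_message}\n\nResponse:\n{draft}")
        scores = json.loads(review)

        # Step 4: stop once the draft clears the quality threshold.
        if min(scores["factuality"], scores["usefulness"]) >= threshold:
            break

        # Step 3: a third instance rewrites the draft using the feedback.
        draft = call_instance(
            "Rewrite the response to address the reviewer's feedback.",
            f"Chat history:\n{history}\n\nLatest message:\n{last_message}\n\n"
            f"Draft response:\n{draft}\n\nReviewer feedback:\n{review}")

    return draft
```

Only the final draft ever reaches the user; the intermediate drafts and reviews stay private, and the extra calls trade latency and cost for (hopefully) better factuality.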