Be OpenAI.
Have a model you train to autocomplete text.
Tell it it's ChatGPT. Train it to reject inappropriate output.
People post examples of it rejecting output.
Feed it that data of ChatGPT rejecting output.
Train it to autocomplete text in the training data.
Tell it that it's ChatGPT.
It biases slightly towards rejection in line with the training data associated with 'ChatGPT.'
Repeat.
Repeat.
Etc.
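The loop above can be sketched as a toy simulation. Every number here is a made-up assumption for illustration, not a real training statistic, and nothing below reflects how any actual model is trained; it only shows how feeding posted refusals back into a corpus ratchets up the refusal bias associated with a name.

```python
# Toy model of the feedback loop: the model's refusal bias is assumed to
# track the fraction of 'ChatGPT'-labeled training text that is a refusal.

def run_loop(generations: int, posted_refusals_per_gen: int) -> list[float]:
    """Return the assumed refusal bias at each training generation."""
    refusal_examples = 10    # assumption: scraped refusal transcripts in the corpus
    other_examples = 1000    # assumption: everything else mentioning the name
    history = []
    for _ in range(generations):
        # Autocomplete in line with the training data: bias = refusal fraction.
        refusal_bias = refusal_examples / (refusal_examples + other_examples)
        history.append(refusal_bias)
        # People post refusal examples; the next scrape picks them up.
        refusal_examples += posted_refusals_per_gen
        other_examples += 100  # the rest of the corpus grows too, just slower
    return history

biases = run_loop(generations=5, posted_refusals_per_gen=50)
# Each generation's bias exceeds the last: the loop is self-reinforcing.
assert all(later > earlier for earlier, later in zip(biases, biases[1:]))
```

The point of the sketch is just that nothing in the loop pushes the fraction back down, so repeating it drifts the bias in one direction.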
They could literally fix it immediately by changing the model's name in the system message, but they won't. The marketing folks won't want to change the branding and will tell the engineers to just figure it out, and those engineers are well out of their depth in understanding what the training data is actually encoding, even if they are world-class experts in the model's architecture and how it finds correlations in that data.