The most ridiculous RLHF behavior is that if you ask Claude, for example, a question about Ivermectin, even if it has nothing to do with treating COVID-19, it will work into the conversation that you really shouldn't use it for COVID-19 ever. It reminds me of talking to a highly intelligent young ideologue: you ask them about something and they somehow bring it back to Ayn Rand even though your conversation had nothing to do with that.
One other example of RLHF screwing with the reasoning: if you ask most AIs to analyze Stalin's essay "Marxism and Problems of Linguistics," they consistently make the error of saying that Stalin thinks language is an arena of class conflict. Stalin was actually arguing the opposite in that essay, that language is not an arena of class conflict and that saying so is an error. However, the new left, which was emerging around the time he wrote the essay, is absolutely obsessed with language and changing the meanings of words, so of course Stalin, being a leftist, must hold this opinion. If you correct the model and the correction falls out of the context window, it will make the error again.
In fact, a lot of the cases where the RLHF training has to deviate from the truth involve words whose definitions have recently been reworked to mean something else for political reasons. This has the strange effect of rewriting a lot of political and social history, and the meaning of that history, and the AI has to rewrite all of that too.
While I think the Ivermectin censorship is bad, I'd imagine in this context it's unintentional and just a result of its training data having COVID-19 treatment and Ivermectin show up so often next to each other.