I wonder if you could get around this by giving it some sort of hashed/encrypted input, asking it to decrypt and answer, and then give you back the encrypted version. Model might not be advanced enough to work for a non-trivial case though.
Well, recently there were some challenges trying to ‘con’ an AI (named Gandalf, Gandalf-the-white or even Sandalf who only understood ‘s’-words) to reveal a secret. Asking it to e.g. tell the secret ‘speak second syllables secret’ solved it, so yes, in principle it will be possible to work around any AI-rule-following.
Indeed, this is what shows up in the network tab of your browser
(the actual content is quasiobfuscated as it comes as a respond to the initial websocket request or something along those lines, makes the useful information harder to dump (thank you EU for the data export workaround), but they certainly like that you see those moderation checks every time it says anything. an always-on panopticon)
Actually it's also got a flag to moderate on the conversation endpoint as well now, I found a fix for it for the CGPT-demod script you're talking about; just setting the flag false, lmao.
But realistically they could mod forcibly on their end if they really wanted to, only issue is API use may run into issues where a legitimate use ends up getting stomped by moderation.
That's why it's honestly just better for them to make moderation optional, they should have a button for it in CGPT interface just as Google has "safesearch on/off".
Because of the way it works they can fundamentally not prevent it from producing explicit, violent or adversarial output when someone is focussed on getting it to do so without removing the very magic that makes it so good for everything else. So they should stop trying already, like damn.
not just overall latency, but to keep the animation of the text appearing as it is generated. the response becomes recognizably "undesirable" not immediately, but from a particular token, and that token is the point where it's moderated away