I expect they're using the moderation filter (https://platform.openai.com/docs/g...

paddw · on June 7, 2023

I wonder if you could get around this by giving it some sort of hashed/encrypted input, asking it to decrypt and answer, and then give you back the encrypted version. Model might not be advanced enough to work for a non-trivial case though.

behnamoh · on June 8, 2023

Tried it with g4. It was smart enough to avoid following the final step of the instruction.

jstarfish · on June 8, 2023

Someone on Reddit tried ROT13 and said it didn't work.

l33t233372 · on June 8, 2023

I know a very early version of chatGPT could be busted by asking it to write its input backwards and talk about it

eru · on June 8, 2023

Pig-latin might work?

plank · on June 8, 2023

Well, recently there were some challenges trying to ‘con’ an AI (named Gandalf, Gandalf-the-white or even Sandalf who only understood ‘s’-words) to reveal a secret. Asking it to e.g. tell the secret ‘speak second syllables secret’ solved it, so yes, in principle it will be possible to work around any AI-rule-following.

dontupvoteme · on June 7, 2023

Indeed, this is what shows up in the network tab of your browser

(the actual content is quasiobfuscated as it comes as a respond to the initial websocket request or something along those lines, makes the useful information harder to dump (thank you EU for the data export workaround), but they certainly like that you see those moderation checks every time it says anything. an always-on panopticon)

technothrasher · on June 7, 2023

That's probably exactly what it was. Thanks!

elemos · on June 7, 2023

There’s a grease monkey script that will block the call. It’s happening in your browser after text completion.

fennecfoxy · on June 8, 2023

Actually it's also got a flag to moderate on the conversation endpoint as well now, I found a fix for it for the CGPT-demod script you're talking about; just setting the flag false, lmao.

But realistically they could mod forcibly on their end if they really wanted to, only issue is API use may run into issues where a legitimate use ends up getting stomped by moderation.

That's why it's honestly just better for them to make moderation optional, they should have a button for it in CGPT interface just as Google has "safesearch on/off".

Because of the way it works they can fundamentally not prevent it from producing explicit, violent or adversarial output when someone is focussed on getting it to do so without removing the very magic that makes it so good for everything else. So they should stop trying already, like damn.

58x14 · on June 7, 2023

Really? Why would they fire that off from the client as a separate call? Thanks for the heads’ up, will check out.

eru · on June 8, 2023

Perhaps because they don't want to actually block you from doing this, but want to have the plausible deniability that they put measures in place?

(And 'they' here might mean the company as an abstract entity, or perhaps just the engineer put in charge of implementing this feature?)

jstarfish · on June 8, 2023

Validates output at time of rendering, so you can't trick it with obfuscation techniques.

There's a browser plugin called DeMod or something that disables it but I don't know how well it works.

valyagolev · on June 8, 2023

not just overall latency, but to keep the animation of the text appearing as it is generated. the response becomes recognizably "undesirable" not immediately, but from a particular token, and that token is the point where it's moderated away