Hacker News new | past | comments | ask | show | jobs | submit login

I expect they're using the moderation filter (https://platform.openai.com/docs/guides/moderation/overview), but calling it in parallel to the generation so that it doesn't add latency to the response.



I wonder if you could get around this by giving it some sort of hashed/encrypted input, asking it to decrypt and answer, and then give you back the encrypted version. Model might not be advanced enough to work for a non-trivial case though.


Tried it with g4. It was smart enough to avoid following the final step of the instruction.


Someone on Reddit tried ROT13 and said it didn't work.


I know a very early version of chatGPT could be busted by asking it to write its input backwards and talk about it


Pig-latin might work?


Well, recently there were some challenges trying to ‘con’ an AI (named Gandalf, Gandalf-the-white or even Sandalf who only understood ‘s’-words) to reveal a secret. Asking it to e.g. tell the secret ‘speak second syllables secret’ solved it, so yes, in principle it will be possible to work around any AI-rule-following.


Indeed, this is what shows up in the network tab of your browser

(the actual content is quasiobfuscated as it comes as a respond to the initial websocket request or something along those lines, makes the useful information harder to dump (thank you EU for the data export workaround), but they certainly like that you see those moderation checks every time it says anything. an always-on panopticon)


That's probably exactly what it was. Thanks!


There’s a grease monkey script that will block the call. It’s happening in your browser after text completion.


Actually it's also got a flag to moderate on the conversation endpoint as well now, I found a fix for it for the CGPT-demod script you're talking about; just setting the flag false, lmao.

But realistically they could mod forcibly on their end if they really wanted to, only issue is API use may run into issues where a legitimate use ends up getting stomped by moderation.

That's why it's honestly just better for them to make moderation optional, they should have a button for it in CGPT interface just as Google has "safesearch on/off".

Because of the way it works they can fundamentally not prevent it from producing explicit, violent or adversarial output when someone is focussed on getting it to do so without removing the very magic that makes it so good for everything else. So they should stop trying already, like damn.


Really? Why would they fire that off from the client as a separate call? Thanks for the heads’ up, will check out.


Perhaps because they don't want to actually block you from doing this, but want to have the plausible deniability that they put measures in place?

(And 'they' here might mean the company as an abstract entity, or perhaps just the engineer put in charge of implementing this feature?)


Validates output at time of rendering, so you can't trick it with obfuscation techniques.

There's a browser plugin called DeMod or something that disables it but I don't know how well it works.


not just overall latency, but to keep the animation of the text appearing as it is generated. the response becomes recognizably "undesirable" not immediately, but from a particular token, and that token is the point where it's moderated away




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: