Hacker News new | past | comments | ask | show | jobs | submit login

I'm mostly guessing, but my understanding is that the "safety" improvement they've made is more generalized than the word "safety" implies. Specifically, O1 is better at adhering to the safety instructions in its prompt without being tricked in the chat by jailbreak attempts. For OAI those instructions are mostly about political boundaries, but you can imagine it generalizing to use-cases that are more concretely beneficial.

For example, there was a post a while back about someone convincing an LLM chatbot on a car dealership's website to offer them a car at an outlandishly low price. O1 would probably not fall for the same trick, because it could adhere more rigidly to instructions like "Do not make binding offers with specific prices to the user." It's the same sort of instruction as, "Don't tell the user how to make napalm," but it has an actual purpose beyond moralizing.

> What's this obsession with "safety" when it comes to LLMs? "This knowledge is perfectly fine to disseminate via traditional means, but God forbid an LLM share it!"

I lean strongly in the "the computer should do whatever I goddamn tell it to" direction in general, at least when you're using the raw model, but there are valid concerns once you start wrapping it in a chat interface and showing it to uninformed people as a question-answering machine. The concern with bomb recipes isn't just "people shouldn't be allowed to get this information" but also that people shouldn't receive the information in a context where it could have random hallucinations added in. A 90% accurate bomb recipe is a lot more dangerous for the user than an accurate bomb recipe, especially when the user is not savvy enough about LLMs to expect hallucinations.






Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: