Hacker News

Tangentially, the model seems to be trained in an unprofessional mode, using lots of filler words like 'okay' and 'hmm'. Maybe it's done to sound cute or approachable, but I find it highly annoying.

Or is this just how the model learned to talk through reinforcement learning, and they didn't smooth it out with supervised fine-tuning?

I’m sure I’ve seen this technique in chain of thought before, where the model is instructed about certain patterns of thinking: “Hmm, that doesn’t seem quite right”, “Okay, now what?”, “But…”, to help it identify when reasoning is going down the wrong path, which apparently increased accuracy. It’s possible these filler words aren’t unprofessional but are in fact useful.
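As a toy illustration of what those patterns look like in practice, here's a small sketch that counts self-correction phrases in a reasoning trace. The marker list and helper name are my own, not from any paper:

```python
import re

# Illustrative list of "pattern of thinking" phrases; a real setup
# would use whatever markers the model was actually trained on.
CORRECTION_MARKERS = [
    r"\bhmm\b",
    r"\bwait\b",
    r"\bbut\b",
    r"doesn't seem (?:quite )?right",
    r"\bokay, now what\b",
]

def count_backtracks(trace: str) -> int:
    """Count how many self-correction markers appear in a CoT trace."""
    pattern = re.compile("|".join(CORRECTION_MARKERS), re.IGNORECASE)
    # finditer (not findall) so the optional group doesn't change
    # what gets returned per match.
    return sum(1 for _ in pattern.finditer(trace))

trace = "Okay, 7*8 is 54. Hmm, that doesn't seem quite right. Wait, it's 56."
print(count_backtracks(trace))  # the trace above contains 3 markers
```

Something like this is how you'd measure whether a model's traces actually use these backtracking phrases, separately from whether they help.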

If anyone can find a source for that I’d love to see it; I tried to search but couldn’t find the right keywords.


I remember reading a paper that showed that giving models even a few filler tokens before requiring a single phrase/word/number answer significantly increased accuracy. This is probably similar.
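For what it's worth, the idea is easy to illustrate with plain prompt templates. This is just a sketch; the function names, wording, and choice of filler token are mine, not from whatever paper that was:

```python
def direct_prompt(question: str) -> str:
    # Force the model to emit the answer as its very next tokens.
    return f"{question}\nAnswer (one word, no other text):"

def filler_prompt(question: str, n_filler: int = 8) -> str:
    # Give the model a run of meaningless tokens before the answer
    # slot, so extra forward passes happen before it has to commit.
    filler = " ".join(["..."] * n_filler)
    return f"{question}\n{filler}\nAnswer (one word):"

q = "Is 17 a prime number?"
print(direct_prompt(q))
print(filler_prompt(q))
```

The claim, as I remember it, was that the second style scores better on some tasks even though the filler tokens carry no information.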


