
I've led myself to believe that long responses actually benefit response quality, since processing and producing tokens is the only time an LLM gets to "think".

In particular, requesting an analysis of the problem before jumping to conclusions can be more effective than asking for the final answer directly.

However, this analysis phase, or something like it, could be done hidden in the background, though I don't think anyone is doing that yet. From the user's point of view that would just be waiting, and from the API's point of view those tokens would still cost money. You might as well entertain the user with the text the model produces in the meanwhile.
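
A minimal sketch of what that hidden-analysis flow could look like. The call_llm helper is a hypothetical stand-in for whatever chat-completion API you use, and the prompt wording is mine:

    def call_llm(prompt: str) -> str:
        """Stub standing in for any chat-completion API call."""
        raise NotImplementedError("wire up your provider here")

    def answer_with_hidden_analysis(question: str) -> str:
        # Pass 1: let the model "think" in tokens; this text is never shown.
        analysis = call_llm(
            "Analyse the following problem step by step. "
            "Do not state a final answer yet.\n\n" + question)
        # Pass 2: condition the visible answer on the hidden analysis.
        return call_llm(
            "Using the analysis below, give only the final answer.\n\n"
            "Question: " + question + "\n\nAnalysis: " + analysis)

The user only ever sees the second response, but you still pay for every token of the first.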

My understanding is this used to be the case[1] but isn't really true any longer, due to things like the STaR method for model training[2]. Empirically (circa GPT-3) it absolutely used to be the case that, for a complex question, prompting with "Explain all your reasoning step by step and then give the answer at the end" would get you a better answer than "Just give me the answer and nothing else", or than asking for the answer first. Then, circa GPT-4, answers started getting much longer even when you asked the model to be concise.
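
For concreteness, the two prompt styles being contrasted look roughly like this (the quoted wording is from above; the scaffolding is mine):

    question = "A multi-step word problem goes here..."

    # Chain-of-thought style: the answer arrives only after reasoning tokens.
    cot_prompt = (question + "\n\nExplain all your reasoning step by step "
                  "and then give the answer at the end.")

    # Direct style: the answer must be emitted with no intermediate
    # "thinking" tokens at all.
    direct_prompt = question + "\n\nJust give me the answer and nothing else."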

That doesn't seem to be the case any more, and there has been speculation that this is down to the STaR method being used to train newer models. I say speculation because I don't believe anyone has come out and said they are using STaR for training. OpenAI referred to Q* somewhere but wouldn't be drawn on whether that * is this STaR, and although Google was involved in publishing the STaR paper, they haven't said Gemini uses it (I don't think).
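
For anyone who hasn't read [2]: one iteration of the STaR loop boils down to roughly the following. This is my paraphrase of the paper's procedure; the generate and finetune helpers are hypothetical stand-ins:

    # One outer iteration of STaR (Self-Taught Reasoner), per [2].
    def star_iteration(base_model, model, problems, fewshot_rationales):
        kept = []
        for question, gold_answer in problems:
            # Sample a rationale and answer via few-shot CoT prompting.
            rationale, answer = model.generate(fewshot_rationales, question)
            if answer != gold_answer:
                # "Rationalization": retry with the correct answer as a hint,
                # so failed problems can still yield a usable rationale.
                rationale, answer = model.generate(
                    fewshot_rationales, question, hint=gold_answer)
            if answer == gold_answer:
                kept.append((question, rationale, gold_answer))
        # Fine-tune from the original pretrained model on all rationales
        # that led to correct answers, then repeat with the new model.
        return finetune(base_model, kept)

The upshot is that models trained this way have reasoning baked into their sampled outputs, so explicitly prompting for step-by-step reasoning buys you less than it used to.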

[1] https://arxiv.org/abs/2201.11903

[2] https://arxiv.org/pdf/2203.14465
