
Interesting, Claude 3 Opus has been better than GPT4 for me. Mostly in that I find it does a better (and, more importantly, more thorough) job of explaining things to me. For coding tasks (I'm not asking it to write code, but instead to explain topics/code/etc to me), I've found it tends to give much more nuanced answers. When I give it a long text to converse about, I find Claude Opus has a much deeper understanding of the content it's given: GPT4 tends to just summarize the text at hand, whereas Claude is able to extrapolate from it better.



How much of this is just that one model responds better to the way you write prompts?

Much like you working with Bob and opining that Bob is great, and me saying that I find Jack easier to work with.


It's not just a style thing: Claude gets confused by poorly structured prompts. ChatGPT is a champ at understanding low-information prompts, but with well-written prompts Claude produces consistently better output.


It's because "coding tasks" covers a huge array of different tasks.

We are basically not precise enough with our language to have any meaningful conversation on this subject.

Just misunderstandings and nonsense chatter for entertainment.


For the RAG example, I don't think it's the prompt so much. Or if it is, I've yet to find a way to get GPT4 to ever extrapolate well beyond the original source text. In other words, I think GPT4 was likely trained to ground its outputs in the provided input.
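To make the grounding behavior concrete, here is a minimal sketch of the kind of retrieval-augmented prompt being discussed. The template wording, function name, and example passage are all illustrative assumptions, not any particular product's API; the point is just that the instruction explicitly ties the answer to the retrieved text, which is the grounding GPT4 seems tuned for.

```python
def build_rag_prompt(passage: str, question: str) -> str:
    """Assemble a retrieval-augmented prompt that asks the model to
    ground its answer in the provided passage only."""
    return (
        "Use ONLY the passage below to answer the question. "
        "If the passage does not contain the answer, say so.\n\n"
        f"Passage:\n{passage}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical usage with a retrieved snippet:
prompt = build_rag_prompt(
    passage="Claude 3 Opus was released by Anthropic in March 2024.",
    question="Who released Claude 3 Opus?",
)
```

A model trained to ground on such input will stick to the passage; the "extrapolation" difference described above shows up when the question requires reasoning beyond what the passage literally states.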

But yeah, you’re right, it’s hard to know for sure. And of course all of these tests are just “vibes”.

Another example of where Claude seems better than GPT4 is code generation. In particular, GPT4 has a tendency to get "lazy" and do a lot of "… the rest of the implementation here", whereas I've found Claude is fine writing longer code responses.

I know the parent comment suggests it likes to make up packages that don't exist, but I can't speak to that. I usually ask LLMs to generate self-contained functions/classes. I can also say that, anecdotally, I've seen other people online comment that they think Claude "works harder" (as in, writes longer code blocks). Take that for what it's worth.

But overall you’re right, if you get used to the way one LLM works well for you, it can often be frustrating when a different LLM responds differently.


I should mention that I use a custom prompt with GPT4 for coding: it tells the model to write concise, elegant code, use Google's coding style, and explain the solution when solving complex problems. It sometimes ignores the request about style, but the code it produces is pretty great. I rarely get any laziness, and when I do I just tell it to fill things in and it does.
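For anyone curious what a setup like that looks like in practice, here is a minimal sketch of a chat-style message list with a system instruction along those lines. The exact wording is illustrative (not the commenter's actual custom prompt), and the `role`/`content` dict shape is the common chat-message convention, not tied to a specific SDK.

```python
# Illustrative system instruction: concise code, Google style,
# explanations for complex problems (wording is an assumption).
SYSTEM_INSTRUCTION = (
    "Write concise, elegant code that follows Google's coding style. "
    "When solving a complex problem, explain the solution before the code."
)

def make_messages(user_request: str) -> list[dict]:
    """Build a chat message list with the custom system prompt prepended."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": user_request},
    ]

messages = make_messages("Implement binary search in Python.")
```

The nice property of putting the style constraints in the system message is that they apply to every turn of the conversation, so you don't have to repeat "be concise, no placeholders" in each request.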


The first job of an AI company is finding model/user fit.



