Hacker News

Same here: I’m subscribed to all three top dogs in the LLM space, and routinely issue the same prompts to all three. It’s very one-sided in favor of GPT4, which is stunning since it’s now a year old, although of course it has received a couple of updates in that time. Also, at least with my usage patterns, hallucinations are rare. In comparison, Claude will quite readily hallucinate plausible-looking APIs that don’t exist when writing code, etc. GPT4 is also more stubborn / less agreeable when it knows it’s right. Very little of this is captured in metrics, so you can only see it from personal experience.



Interesting, Claude 3 Opus has been better than GPT4 for me. Mostly in that I find it does a better (and more importantly, more thorough) job of explaining things to me. For coding tasks (I'm not asking it to write code, but instead to explain topics/code/etc to me) I've found it tends to give much more nuanced answers. When I give it long text to converse about, I find Claude Opus tends to have a much deeper understanding of the content it's given: GPT4 tends to just summarize the text at hand, whereas Claude is better able to extrapolate from it.


How much of this is just that one model responds better to the way you write prompts?

Much like you working with Bob and opining that Bob is great, and me saying that I find Jack easier to work with.


It's not a style thing: Claude gets confused by poorly structured prompts. ChatGPT is a champ at understanding low-information prompts, but with well-written prompts Claude produces consistently better output.


It is because "coding tasks" covers a huge array of different tasks.

We are basically not precise enough with our language to have any meaningful conversation on this subject.

Just misunderstandings and nonsense chatter for entertainment.


For the RAG example, I don’t think it’s the prompt so much. Or if it is, I’ve yet to find a way to get GPT4 to ever extrapolate well beyond the original source text. In other words, I think GPT4 was likely trained to ground the outputs on a provided input.

But yeah, you’re right, it’s hard to know for sure. And of course all of these tests are just “vibes”.

Another example of where Claude seems better than GPT4 is code generation. In particular, GPT4 has a tendency to get “lazy” and do a lot of “… the rest of the implementation here”, whereas I’ve found Claude is fine writing longer code responses.

I know the parent comment suggests it likes to make up packages that don’t exist, but I can’t speak to that. I usually like to ask LLMs to generate self-contained functions/classes. I can also say that anecdotally I’ve seen other people online comment that they think Claude “works harder” (as in, writes longer code blocks). Take that for what it’s worth.

But overall you’re right, if you get used to the way one LLM works well for you, it can often be frustrating when a different LLM responds differently.


I should mention that I do use a custom prompt with GPT4 for coding, which tells it to write concise and elegant code, use Google’s coding style, and, when solving complex problems, explain the solution. It sometimes ignores the request about style, but the code it produces is pretty great. Rarely do I get any laziness or anything like that, and when I do, I just tell it to fill things in and it does.


The first job of an AI company is finding model/user fit.


This was with Claude Opus, vs. one of the lesser variants? I really like Opus for English copy generation.


Opus, yes, the $20/mo version. I usually don’t generate copy. My use cases are code (both “serious” code and the nice-to-have code I wouldn’t bother writing otherwise), learning how to do stuff in unfamiliar domains, and just learning unfamiliar things in general. It works well as a very patient teacher, especially if you already have some degree of familiarity with the problem domain. I do have to check it against primary sources, which is how I know the percentage of hallucinations is very low. For code, however, I don’t even have to do that, since as a professional software engineer I am the “primary source”.


GPT4 is better at responding to malformed, uninformative or poorly structured prompts. If you don't structure large prompts intelligently Claude can get confused about what you're asking for. That being said, with well formed prompts, Claude Opus tends to produce better output than GPT4. Claude is also more flexible and will provide longer answers, while ChatGPT/GPT4 tend to always sort of sound like themselves and produce short "stereotypical" answers.


> ChatGPT/GPT4 tend to always sort of sound like themselves

Yes I've found Claude to be capable of writing closer to the instructions in the prompt, whereas ChatGPT feels obligated to do the classic LLM end to each sentence, "comma, gerund, platitude", allowing us to easily recognize the text as a GPT output (see what I did there?)


> It’s very one sided in favor of GPT4

My experience has been the opposite. I subscribe to multiple services as well and copy/paste the same question to all. For my software dev related questions, Claude Opus is so far ahead that I’m starting to think it’s no longer necessary to use GPT4.

For code samples I request, GPT4-produced code often fails to even compile. That almost never happens with Claude.
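The workflow described in this subthread — copy/pasting the same question to every model and comparing the replies by hand — can be sketched as a small fan-out loop. Everything below is illustrative: the model names and client callables are placeholders, and a real version would wrap each vendor's SDK and need API keys.

```python
# Sketch of the "same prompt to every model" comparison workflow.
# The client callables are stand-ins so the sketch runs offline; in
# practice each would call a real provider SDK (names are hypothetical).

def fan_out(prompt, clients):
    """Send one prompt to every model and collect the replies by name."""
    return {name: ask(prompt) for name, ask in clients.items()}

# Stub clients that just echo the prompt, standing in for real API calls.
clients = {
    "gpt-4": lambda p: f"[gpt-4 reply to: {p}]",
    "claude-3-opus": lambda p: f"[opus reply to: {p}]",
    "gemini": lambda p: f"[gemini reply to: {p}]",
}

replies = fan_out("Write a function that parses ISO 8601 dates.", clients)
for model, text in replies.items():
    print(model, "->", text)
```

The point of the dict-of-callables shape is that adding or dropping a provider is a one-line change, which matches how people in this thread rotate subscriptions.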


Totally agree. I do the same and subscribe to all three, at least whenever a new version comes out.

My new litmus test is “give me 10 quirky bars within 200 miles of Austin.”

This is incredibly difficult for all of them: GPT4 is kind of close, Claude just made shit up, Gemini shat itself.


Have you tried Poe.com? You can access all the major LLMs with one subscription.





