
What use cases are you using it for?

I mostly use it for generating tests, writing documentation, refactoring, code snippets, etc. I use it daily for work along with copilot/x.

In my experience GPT3.5turbo is... rather dumb in comparison. It writes a comment explaining what a method is going to do and what arguments it will take - then misses arguments altogether. It feels like it has poor memory (and we're talking relatively short code snippets, nothing remotely near its context length).

And I don't mean small mistakes - I mean it will say it will do something with several steps, then just miss entire steps.

GPT3.5turbo is reliably unreliable for me, requiring large changes and constant "rerolls".

GPT3.5turbo also has difficulty following the "style/template" from both the prompt and its own response. It'll be consistent, then just - change. An example being how it uses bullet points in documentation.

Codex is generally better - but noticeably worse than GPT4 - it's decent as a "smart autocomplete" though. Not crazy useful for documentation.

Meanwhile GPT4 generally nails the results, occasionally needing a few tweaks, generally only with long/complex code/prompts.

tl;dr - In my experience, for code, GPT3.5turbo isn't even worth the time it takes to get a good result/fix the result. Codex can do some decent things. I just use GPT4 for anything more than autocomplete - it's so much more consistent.




If you're manually interacting with the model, GPT 4 is almost always going to be better.

Where 3.5 excels is with programmatic access. You can ask it for 2x as much text split across calls so the end result is well formed, and still get a reply that's cheaper and faster than 4 (for example, ask 3.5 for a response, then ask it to format that response).
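The two-pass pattern could be sketched something like this - a minimal, illustrative example where the helper names are made up and the message dicts follow the usual chat-completion shape (the actual API call is left as a stand-in):

```python
# Sketch of the two-pass pattern: get the content from the cheaper model
# first, then feed its draft back and ask only for reformatting.
# Helper names are illustrative; a real chat-completion call would
# consume these message lists.

def draft_messages(question):
    # First pass: ask for the raw answer with no formatting constraints.
    return [{"role": "user", "content": question}]

def format_messages(question, draft, template="markdown bullet points"):
    # Second pass: include the model's own draft as an assistant turn,
    # then ask it to reformat without changing the content.
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": draft},
        {"role": "user", "content": f"Reformat your answer as {template}. "
                                    "Do not change the content."},
    ]
```

Since the second call only rearranges text it already produced, the cheaper model has less room to drop steps, and the two calls together still cost less than one GPT4 call.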




