
GPT-4 pre-nerf was terrible at reviewing non-trivial or non-textbook code. I decided to test it for a few weeks by feeding it things I had caught in review or as bugs, to see if it would spot them. It was at roughly 0% on the first try (it would always talk about something irrelevant), and after leading it with follow-up questions it would figure out the problem half the time; the other half I'd just give up on leading it.

These were tricky problems with a small scope; I picked them so I could easily provide them to GPT for review.

So I doubt a larger context window will do much.




It's hard to tell why you ran into that without seeing how you prompted, but I can offer a few pointers. Use the OpenAI Playground instead of the chat interface; it lets you specify the system prompt and edit the conversation before each submission. The system prompt is good for providing general context, tools, and options, but you absolutely must provide a few example interactions in the conversation. Even just two prompt/response pairs will strongly influence the rest of the conversation. You can use that to shape the responses however you like, and it focuses the model on the task at hand. If you get a bad response, delete or edit it; bad examples beget more bad responses.
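
For reference, a minimal sketch of what that few-shot setup looks like when calling the API directly (the Playground builds the same message list under the hood), assuming the openai Python client; the model name, the two review examples, and the review_snippet variable are placeholders for illustration, not anything from the thread:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    review_snippet = "..."  # placeholder: the code you actually want reviewed

    messages = [
        # System prompt: general context for the task
        {"role": "system",
         "content": "You are a senior engineer reviewing small code snippets. "
                    "Point out the most likely bug, concisely."},
        # Example pair 1 (hypothetical) - shapes the style of later answers
        {"role": "user", "content": "def avg(xs): return sum(xs) / len(xs)"},
        {"role": "assistant", "content": "Divides by zero when xs is empty; guard len(xs) == 0."},
        # Example pair 2 (hypothetical)
        {"role": "user", "content": "for i in range(len(items)): del items[i]"},
        {"role": "assistant", "content": "Deleting while indexing skips elements; iterate over a copy."},
        # The real snippet goes last
        {"role": "user", "content": review_snippet},
    ]

    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    print(resp.choices[0].message.content)

The same editing trick applies here: if an assistant turn comes back bad, drop or rewrite it in the message list before the next call instead of letting it anchor the rest of the conversation.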





