Is there an agent framework that lives up to the hype?
Where you specify a top-level objective, it plans out those objectives, it selects a completion metric so that it knows when to finish, and iterates/reiterates over the output until completion?
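To make the question concrete, here is a minimal sketch of the loop being described: plan from an objective, check a completion metric, and iterate until the metric passes or a budget runs out. Everything in it is an assumption for illustration; `run_agent`, `plan`, `execute`, and `is_done` are hypothetical names, and the toy functions stand in for what would be model calls in a real framework.

```python
from typing import Callable, List


def run_agent(
    objective: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str, str], str],
    is_done: Callable[[str], bool],
    max_iters: int = 5,
) -> str:
    """Plan once, then rework the output until is_done passes or the budget is spent."""
    steps = plan(objective)          # break the objective into steps
    output = ""
    for _ in range(max_iters):       # hard cap so the loop cannot spin forever
        for step in steps:
            output = execute(step, output)
        if is_done(output):          # the completion metric decides when to stop
            break
    return output


# Toy stand-ins (hypothetical): a real system would call a model here.
def toy_plan(objective: str) -> List[str]:
    return ["draft", "expand"]


def toy_execute(step: str, output: str) -> str:
    return output + step[0]          # append the first letter of the step name


def toy_is_done(output: str) -> bool:
    return len(output) >= 4


result = run_agent("write four characters", toy_plan, toy_execute, toy_is_done)
print(result)  # prints "dede"
```

The `max_iters` cap is the part that matters in practice: without it, a metric the agent can never satisfy turns into an unbounded loop.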
> Where you specify a top-level objective, it plans out those objectives, it selects a completion metric so that it knows when to finish, and iterates/reiterates over the output until completion?
I built Plandex[1], which works roughly like this. The goal (so far) is not to take you from an initial prompt to a 100% working solution in one go, but to provide tools that help you iterate your way to a 90-95% solution. You can then fill in the gaps yourself.
I think the idea of a fully autonomous AI engineer is currently mostly hype. Making that the target is good for marketing, but in practice it leads to lots of useless tire-spinning and wasted tokens. It's not a good idea, for example, to have the LLM try to debug its own output by default. It might, on a case-by-case basis, be a good idea to feed an error back to the LLM, but just as often it will be faster for the developer to do the debugging themselves.
Thanks for the feedback. The cloud option is offered as a way to get started as quickly as possible, but self-hosting is straightforward too: https://docs.plandex.ai/hosting/self-hosting
To date, ChatGPT Code Interpreter is still the most impressive implementation of this pattern that I've seen.
Give it a task and it writes code, runs the code, gets errors, fixes bugs, and tries again, generally until it succeeds.
That's over a year old at this point, and it's not clear to me if it counts as an "agent" by many people's definitions (which are often frustratingly vague).
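The write/run/fix cycle described above can be sketched roughly as follows. This is not how Code Interpreter is actually implemented; it is a small illustration under stated assumptions. `solve`, `run_python`, and `propose` are hypothetical names, and `toy_propose` is a hard-coded stand-in for a model that would see the task and the last error.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_python(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run code in a fresh interpreter; return (succeeded, stdout or error text)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def solve(
    task: str,
    propose: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for code, run it, and feed any error back until it runs or attempts run out."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = propose(task, error)   # hypothetical model call: task plus last error
        ok, out = run_python(code)
        if ok:
            return out, attempt
        error = out                   # the traceback becomes input for the next try
    return None, max_attempts


# Toy stand-in (hypothetical): the first attempt is buggy, the retry is fixed.
def toy_propose(task: str, error: Optional[str]) -> str:
    return "print(1/0)" if error is None else "print(42)"


output, attempts = solve("print the answer", toy_propose)
```

With the toy proposer, the first attempt fails with a `ZeroDivisionError` and the second succeeds, so `attempts` is 2. A successful exit code only shows that the code ran, not that it did the right thing.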
Well, you have Cognition AI, whose Devin recently turned it into a unicorn startup (partnerships with Microsoft and so on), but true, I can't think of an agent that actually lives up to the hype (I heard Devin wasn't great).