Is there an agent framework that lives up to the hype? Where you specify a top-l...

danenania · 2024-08-06T21:00:53 1722978053

> Where you specify a top-level objective, it plans out those objectives, it selects a completion metric so that it knows when to finish, and iterates/reiterates over the output until completion?

I built Plandex[1], which works roughly like this. The goal (so far) is not to take you from an initial prompt to a 100% working solution in one go, but to provide tools that help you iterate your way to a 90-95% solution. You can then fill in the gaps yourself.

I think the idea of a fully autonomous AI engineer is currently mostly hype. Making that the target is good for marketing, but in practice it leads to lots of useless tire-spinning and wasted tokens. It's not a good idea, for example, to have the LLM try to debug its own output by default. It might, on a case-by-case basis, be a good idea to feed an error back to the LLM, but just as often it will be faster for the developer to do the debugging themselves.

1 - https://plandex.ai

aantix · 2024-08-07T00:11:15 1722989475

This looked very promising.

Although, it's now prompting me to make an account when I issue `plandex new`?

None of the video demos show this requirement.

I think the demos should show this requirement. Or the Quickstart docs should directly link to the self-hosted instructions.

"? Hey there! It looks like this is your first time using Plandex on this computer.

What would you like to do?

> Start an anonymous trial on Plandex Cloud (no email required)

  Sign in, accept an invite, or create an account"

danenania · 2024-08-07T00:53:28 1722992008

Thanks for the feedback. The cloud option is offered as a way to get started as quickly as possible, but self-hosting is straightforward too: https://docs.plandex.ai/hosting/self-hosting

aantix · 2024-08-07T01:13:48 1722993228

Plandex and GitHub's Copilot Workspace are very similar. You could draw inspiration from each other.

I tweeted about Copilot Workspace the other day.

https://twitter.com/aantix/status/1819794837375263228

simonw · 2024-08-06T20:38:10 1722976690

To this day, ChatGPT Code Interpreter is still the most impressive implementation of this pattern that I've seen.

Give it a task and it writes code, runs the code, gets errors, fixes bugs, and tries again, generally until it succeeds.
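For anyone who wants to see the shape of that loop, here is a minimal sketch in Python. It is not how Code Interpreter is actually implemented (that isn't public as far as I know); it only illustrates the write/run/fix cycle. The names `run_python`, `write_run_fix`, and the `generate` callback are all hypothetical, and `fake_model` is a stand-in for a real LLM call.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Optional, Tuple

# Hypothetical callback shape: (task, previous_code, last_error) -> new code.
# A real version would call an LLM here; that part is deliberately left out.
Generate = Callable[[str, str, Optional[str]], str]


def run_python(source: str, timeout: float = 10.0) -> Optional[str]:
    """Run source in a fresh interpreter. Return None on success, else error text."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout} seconds"
    finally:
        os.unlink(path)  # always clean up the temporary script
    if proc.returncode == 0:
        return None
    return proc.stderr or f"exited with status {proc.returncode}"


def write_run_fix(task: str, generate: Generate, max_attempts: int = 5) -> Tuple[str, int]:
    """Generate code, run it, and feed errors back until it runs cleanly."""
    code = ""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, code, error)
        error = run_python(code)
        if error is None:
            return code, attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")


def fake_model(task: str, previous_code: str, error: Optional[str]) -> str:
    """Stand-in for an LLM: buggy first draft, fixed once it sees an error."""
    if error is None:
        return "print(1 / 0)"
    return "print('ok')"


if __name__ == "__main__":
    final_code, attempts = write_run_fix("demo task", fake_model)
    print(f"succeeded after {attempts} attempt(s)")
```

The `max_attempts` cap is the important design choice: without a budget, a loop like this can burn tokens indefinitely on an error the model can't fix. Note that "exits with status 0" is a very weak completion check; a real setup would more likely run tests.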

That's over a year old at this point, and it's not clear to me if it counts as an "agent" by many people's definitions (which are often frustratingly vague).

alsima · 2024-08-06T20:40:19 1722976819

Well, you have Cognition AI, the company behind Devin, which recently became a unicorn startup (partnerships with Microsoft and stuff), but true, I can't think of an agent that actually lives up to the hype (I heard Devin wasn't great).