
Is there an agent framework that lives up to the hype?

Where you specify a top-level objective, it plans out those objectives, it selects a completion metric so that it knows when to finish, and iterates/reiterates over the output until completion?
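For concreteness, the loop being asked about could be sketched roughly as follows. This is only an illustration under assumptions: `run_agent`, `plan`, `step`, and `is_done` are hypothetical names, the toy functions stand in for model calls, and no existing framework is claimed to work this way.

```python
from typing import Callable, List


def run_agent(
    objective: str,
    plan: Callable[[str], List[str]],   # hypothetical planner: objective -> tasks
    step: Callable[[str, str], str],    # hypothetical worker: (task, output) -> new output
    is_done: Callable[[str], bool],     # the "completion metric"
    max_iterations: int = 10,           # hard budget so the loop always terminates
) -> str:
    """Plan once, then refine the output until is_done accepts it or the budget runs out."""
    tasks = plan(objective)
    output = ""
    if not tasks:
        return output
    for i in range(max_iterations):
        task = tasks[i % len(tasks)]    # cycle through the plan if it needs another pass
        output = step(task, output)
        if is_done(output):
            break
    return output


# Toy stand-ins for the model calls (assumptions, for illustration only).
def toy_plan(objective: str) -> List[str]:
    return [f"write word {word}" for word in objective.split()]


def toy_step(task: str, output: str) -> str:
    return (output + " " + task.split()[-1]).strip()


def toy_done(output: str) -> bool:
    return len(output.split()) >= 3


result = run_agent("ship the feature", toy_plan, toy_step, toy_done)
capped = run_agent("ship the feature", toy_plan, toy_step, lambda _: False, max_iterations=2)
```

The explicit `max_iterations` budget is the part that matters in practice: without it, a completion metric that never passes means the loop never stops.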




> Where you specify a top-level objective, it plans out those objectives, it selects a completion metric so that it knows when to finish, and iterates/reiterates over the output until completion?

I built Plandex[1], which works roughly like this. The goal (so far) is not to take you from an initial prompt to a 100% working solution in one go, but to provide tools that help you iterate your way to a 90-95% solution. You can then fill in the gaps yourself.

I think the idea of a fully autonomous AI engineer is currently mostly hype. Making that the target is good for marketing, but in practice it leads to lots of useless tire-spinning and wasted tokens. It's not a good idea, for example, to have the LLM try to debug its own output by default. It might, on a case-by-case basis, be a good idea to feed an error back to the LLM, but just as often it will be faster for the developer to do the debugging themselves.

1 - https://plandex.ai


This looked very promising.

However, it now prompts me to create an account when I run `plandex new`.

None of the video demos show this requirement.

I think the demos should show this requirement. Or the Quickstart docs should directly link to the self-hosted instructions.

"? Hey there! It looks like this is your first time using Plandex on this computer.

What would you like to do?

> Start an anonymous trial on Plandex Cloud (no email required)

  Sign in, accept an invite, or create an account"


Thanks for the feedback. The cloud option is offered as a way to get started as quickly as possible, but self-hosting is straightforward too: https://docs.plandex.ai/hosting/self-hosting


Plandex and GitHub's Copilot Workspace are very similar. You could draw inspiration from each other.

I tweeted about Copilot Workspace the other day.

https://twitter.com/aantix/status/1819794837375263228


To date, ChatGPT Code Interpreter is still the most impressive implementation of this pattern that I've seen.

Give it a task and it writes code, runs the code, gets errors, fixes bugs, and tries again, generally until it succeeds.

That's over a year old at this point, and it's not clear to me if it counts as an "agent" by many people's definitions (which are often frustratingly vague).
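A minimal sketch of that write/run/fix/retry pattern, assuming a hypothetical `fix` callback in place of the model call. This is not how Code Interpreter is actually implemented; `run_python`, `run_until_success`, and `toy_fix` are names invented for illustration, and there is no sandboxing here.

```python
import subprocess
import sys
from typing import Callable, Tuple


def run_python(source: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run source in a fresh interpreter; return (succeeded, stdout or stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def run_until_success(
    source: str,
    fix: Callable[[str, str], str],  # hypothetical: (source, error text) -> revised source
    max_attempts: int = 3,           # cap retries so a stuck fixer cannot loop forever
) -> Tuple[bool, str, int]:
    """Run the code, feed any error back to fix, and retry within the attempt budget."""
    output = ""
    for attempt in range(1, max_attempts + 1):
        ok, output = run_python(source)
        if ok:
            return True, output, attempt
        if attempt < max_attempts:
            source = fix(source, output)
    return False, output, max_attempts


# Toy fixer standing in for an LLM call (assumption): it only repairs one known typo.
def toy_fix(source: str, error: str) -> str:
    if "NameError" in error:
        return source.replace("pritn", "print")
    return source


ok, output, attempts = run_until_success('pritn("hello")', toy_fix)
```

Here the first run fails with a `NameError`, the fixer sees the error text, and the second run succeeds. The attempt cap is where a human would take over once the retries stop paying off.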


Well, there's Cognition AI, which recently became a unicorn startup on the back of Devin (partnerships with Microsoft and so on), but true, I can't think of an agent that actually lives up to the hype (I've heard Devin wasn't great).



