If history is any indication, we won't see any Apple consumer robots until well after robots have seen some initial adoption and deployment from more generic companies.
> AI is bad, yes, but bad AI is still useful. Therefore, bad AI is here to stay, and we must deal with it.
IDK, having lived through what computers could do in the 90s, current capabilities don't seem at all bad to me...
But I get what he's saying. It's not so much about whether it is good or bad, but how useful it is. Judging by my ever-growing AI bills (ChatGPT Pro, Anthropic API costs from constantly testing, developing, and using RA.Aid, running other agents, etc.), AI most definitely is useful, at least for me.
I've been wrestling with these same challenges while building RA.Aid—trying to make tools that speak LLM. We have good tool integrations, but a lot of the tools were originally designed for human consumption. The LLMs seem to have their own idea of how they want to do something, which is what makes prompt optimization such an important factor.
> The LLMs seem to have their own idea of how they want to do something
exactly! what I'm experiencing is that prompt engineering has its limits and comes with consistency issues...
by designing the tool from scratch, tailored to LLMs, we can make the interface match their "own idea of how to do" that particular task, which is more reliable and scalable
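to make that concrete, here's a toy sketch of what I mean by designing for the LLM caller instead of the human (the tool name, schema, and implementation are all made up for illustration, not from RA.Aid or any particular framework): flat string arguments, a hard result cap instead of pagination, and a short plain-text return the model can quote directly

```python
import pathlib
import re

# Hypothetical tool schema an agent framework might hand to the model.
# No interactive flags, no paging -- just the few knobs an LLM actually uses.
search_tool_schema = {
    "name": "search_code",
    "description": "Search the repo for a regex and return matching lines.",
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Regex to search for."},
            "path": {"type": "string", "description": "Directory to search, e.g. 'src/'."},
            "max_results": {"type": "integer", "default": 20},
        },
        "required": ["pattern"],
    },
}

def search_code(pattern: str, path: str = ".", max_results: int = 20) -> str:
    """Return at most max_results matches as 'file:line: text', one per line."""
    out = []
    for f in pathlib.Path(path).rglob("*.py"):
        try:
            for i, line in enumerate(f.read_text().splitlines(), 1):
                if re.search(pattern, line):
                    out.append(f"{f}:{i}: {line.strip()}")
                    if len(out) >= max_results:
                        return "\n".join(out)
        except (UnicodeDecodeError, OSError):
            continue  # skip binary/unreadable files instead of erroring
    return "\n".join(out) or "no matches"
```

contrast that with grep's dozens of flags and open-ended output: the bounded, pre-digested return is exactly the kind of interface LLMs handle consistently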
One problem we've had developing autonomous SWE agents (https://github.com/ai-christianson/RA.Aid) is that open models just haven't performed anywhere near sonnet at controlling the agent. Our experience is echoed by many other agent devs out there, and you can see it for yourself if you try deepseek (v3 or r1) vs sonnet in any agentic product.
Do you think that your training setup could help train these models to be better at agentic work?
Cool repo! Agreed OSS models are still lagging, but they're definitely catching up!
So with GRPO and reinforcement learning, OSS model creators now have one more tool to make their models much better: we no longer need vast amounts of labeled CoT data, just questions and answers, and we let RL / GRPO figure out the CoT itself using some reward function.
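Roughly, GRPO samples a group of answers per question, scores each with the reward function, and uses the group-normalized scores as advantages, so no separate value model is needed. A minimal sketch of that core step (shapes and names are mine, not from any specific implementation):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (num_questions, group_size) scores for sampled answers.

    Each answer's advantage is its reward normalized against its own
    group's mean and std -- the group itself plays the role of a critic.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: 2 questions, 4 sampled answers each, with a binary
# correctness reward (1.0 if the final answer matched, else 0.0).
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
# Answers beating their group's average get positive advantage; the CoT
# tokens that produced them get reinforced by the policy-gradient update.
```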
So I guess it definitely can help in agentic workloads!
On first impression, it looks like they are taking the route of integrating tightly with VSCode, which means they'll be competing with Cline, Cursor, and Windsurf.
IMO it might be good for them to release this on the web, similar to the Replit agent. Direct GitHub integration would be awesome.