If history is any indication, we won't see any Apple consumer robots until well after robots have seen some initial adoption and deployment from more generic companies.
> AI is bad, yes, but bad AI is still useful. Therefore, bad AI is here to stay, and we must deal with it.
IDK, having lived through what computers could do in the 90s, current capabilities don't seem at all bad to me...
But I get what he's saying. It's not so much about whether it is good or bad, but how useful it is. Judging by my ever-growing AI bills (ChatGPT Pro, Anthropic API costs from constantly testing, developing, and using RA.Aid, running other agents, etc.), AI most definitely is useful, at least for me.
I've been wrestling with these same challenges while building RA.Aid—trying to make tools that speak LLM. We have good tool integrations, but a lot of the tools were originally designed for human consumption. The LLMs seem to have their own idea of how they want to do something, which is what makes prompt optimization such an important factor.
> The LLMs seem to have their own idea of how they want to do something
exactly! what I'm experiencing is that prompt engineering has its limits and comes with consistency issues...
by designing the tool from scratch, tailored to LLMs, we can make the interface match their "own idea of how to do" that particular task, which is more reliable and scalable
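to make that concrete, here's a toy sketch of what I mean by designing for the LLM caller instead of the human (the tool name, schema, and implementation are all made up for illustration, not from RA.Aid or any particular framework): flat string arguments, a hard result cap instead of pagination, and a short plain-text return the model can quote directly

```python
import pathlib
import re

# Hypothetical tool schema an agent framework might hand to the model.
# No interactive flags, no paging -- just the few knobs an LLM actually uses.
search_tool_schema = {
    "name": "search_code",
    "description": "Search the repo for a regex and return matching lines.",
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Regex to search for."},
            "path": {"type": "string", "description": "Directory to search, e.g. 'src/'."},
            "max_results": {"type": "integer", "default": 20},
        },
        "required": ["pattern"],
    },
}

def search_code(pattern: str, path: str = ".", max_results: int = 20) -> str:
    """Return at most max_results matches as 'file:line: text', one per line."""
    out = []
    for f in pathlib.Path(path).rglob("*.py"):
        try:
            for i, line in enumerate(f.read_text().splitlines(), 1):
                if re.search(pattern, line):
                    out.append(f"{f}:{i}: {line.strip()}")
                    if len(out) >= max_results:
                        return "\n".join(out)
        except (UnicodeDecodeError, OSError):
            continue  # skip binary/unreadable files instead of erroring
    return "\n".join(out) or "no matches"
```

contrast that with grep's dozens of flags and open-ended output: the bounded, pre-digested return is exactly the kind of interface LLMs handle consistently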
One problem we've had developing autonomous SWE agents (https://github.com/ai-christianson/RA.Aid) is that open models just haven't performed anywhere near sonnet at controlling the agent. Our experience is echoed by many other agent devs out there, and you can see it for yourself if you try deepseek (v3 or r1) vs sonnet in any agentic product.
Do you think that your training setup could help train these models to be better at agentic work?
Cool repo! Agreed OSS models are still lagging, but they're definitely catching up!
So with GRPO and reinforcement learning, OSS model creators now have one more tool to make their models much better: we no longer need vast amounts of labeled CoT data, just questions and answers, and we let RL / GRPO figure out the CoT itself using some reward function.
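Roughly, GRPO samples a group of answers per question, scores each with the reward function, and uses the group-normalized scores as advantages, so no separate value model is needed. A minimal sketch of that core step (shapes and names are mine, not from any specific implementation):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """rewards: (num_questions, group_size) scores for sampled answers.

    Each answer's advantage is its reward normalized against its own
    group's mean and std -- the group itself plays the role of a critic.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: 2 questions, 4 sampled answers each, with a binary
# correctness reward (1.0 if the final answer matched, else 0.0).
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
# Answers beating their group's average get positive advantage; the CoT
# tokens that produced them get reinforced by the policy-gradient update.
```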
So I guess it definitely can help in agentic workloads!
On first impression, it looks like they are taking the route of integrating tightly with VSCode, which means they'll be competing with Cline, Cursor, and Windsurf.
IMO it might be good for them to release this on the web, similar to the Replit agent. Direct GitHub integration would be awesome.