Hacker News new | past | comments | ask | show | jobs | submit login

I am a paying customer with credits and the API endpoints rate-limited me to the point where it's actually unusable as a coding assistant. I use a VS Code extension and it just bailed out in the middle of a migration. I had to revert everything it changed and that was not a pleasant experience, sadly.



When working with AI coding tools commit early, commit often becomes essential advice. I like that aider makes every change its own commit. I can always manicure the commit history later, I'd rather not lose anything when the AI can make destructive changes to code.


I can recommend https://github.com/tkellogg/dura for making auto-commits without polluting main branch history, if your tool doesn't support it natively


Why not just continue the migration manually?


Control your own inference endpoints.


Could you explain more on how to do this? e.g if I am using the Claude API in my service, how would you suggest I go about setting up and controlling my own inference endpoint?


You can't. He means by using the open source models.


Runa local LLM tuned for coding on LM Studio. It has a server and provides endpoints.


You aren't running against a local LLM?


That's like asking if they aren't paying the neighborhood drunk with wine bottles for doing house remodeling, instead of hiring a renovation crew.


That’s funny, but open weight, local models are pretty usable depending on the task.


You're right, but that's also subject to compute costs and time value of money. The calculus is different for companies trying to exploit language models in some way, and different for individuals like me who have to feed the family before splurging for a new GPU, or setting up servers in the cloud, when I can get better value by paying OpenAI or Claude a few dollars and use their SOTA models until those dollars run out.

FWIW, I am a strong supporter of local models, and play with them often. It's just that for practical use, the models I can run locally (RTX 4070 TI) mostly suck, and the models I could run in the cloud don't seem worth the effort (and cost).


For the money for a 4070ti, you could have bought a 3090, which although less efficient, can run bigger models like Qwen2.5 32b coder. Apparently it performs quite well for code


I guess the cost model doesn't work because you're buying gpu that you use about 0.1% of the day


That's what my grandma did in the village in Hungary. But with schnapps. And the drunk was also the professional renovation crew.


Not everyone has a 4090 or M4 Max at home.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: