
In people's experience with these sorts of tools, have they assisted with maintenance of codebases? This might be directly, or indirectly via more readable, better-organized code.

The reason I ask is that these tools seem to excel in helping to write new code. In my experience I think there is an upper limit to the amount of code a single developer can maintain. Eventually you can't keep everything in your head, so maintaining it becomes more effort as you need to stop to familiarize yourself with something.

If these tools help to write more code, but do not assist with maintenance, I wonder if we're going to see masses of new code written really quickly, and then everything grinds to a halt because no one has an intimate understanding of what was written?




My open source AI coding tool aider is unique in that it is designed to work with existing code bases. You can jump into an existing git repo and start asking for changes, new features, etc.

https://github.com/paul-gauthier/aider

It helps gpt understand larger code bases by building a "repository map" based on analyzing the abstract syntax tree of all the code in the repo. This is all built using tree-sitter, the same tooling which powers code search and navigation on GitHub and in many popular IDEs.

https://aider.chat/docs/repomap.html
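
The core idea is easy to sketch. This isn't aider's actual implementation (the real thing uses tree-sitter so it works across many languages and ranks what to include), but a toy Python-only version using the standard library's ast module would look roughly like this:

    import ast
    from pathlib import Path

    def file_signatures(path):
        """Return the top-level function/class signatures in one Python file."""
        tree = ast.parse(Path(path).read_text())
        sigs = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                sigs.append(f"def {node.name}({args})")
            elif isinstance(node, ast.ClassDef):
                sigs.append(f"class {node.name}")
        return sigs

    def repo_map(root):
        """Map each .py file to its signatures: a compact outline of the repo."""
        return {str(p): file_signatures(p) for p in Path(root).rglob("*.py")}

The real repo map also has to decide which of those signatures are most relevant to the current request, which is where the tree-sitter parsing and graph analysis come in.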


Related to the OpenAI announcement, I've been able to generate some preliminary code editing evaluations of the new GPT models. OpenAI is enforcing very low rate limits on the new GPT-4 model. I will update the results as quickly as my rate limit allows.

https://news.ycombinator.com/item?id=38172621

Also, aider now supports these new models, including `gpt-4-1106-preview` with the massive 128k context window.

https://github.com/paul-gauthier/aider/releases/tag/v0.17.0


Dude I love how passionate you are about this project. I see you in every GPT thread. Despite all this tech there are so few projects out there trying to make large-repo code editing/generation possible.


I think that's mostly because most people are skeptical about trusting a third party with their IP.

If there was a way to prove that the data was not being funneled into openai's next models, sure, but where is the proof of that? A piece of paper that says you aren't allowed to do something, does not equate to proof of that thing not being done.

Personally I believe all code work should be Open Source by default, as that would ensure the lowest-quality code gets filtered out and only the best code gets used for production, resulting in the most efficient solutions dominating (aka lower carbon emissions, fewer children being abused, or whatever the politicians say today).

So as long as IP exists, companies will continue to drive profit to the max at the expense of every possible resource that has not been regulated. Instead of this model, why not banish IP, make everything open all at once and have only the best, most efficient code running, thereby locating missing children faster, or emitting less carbon, or bombing terrorists better or w/e.


I do love aider, thanks for making it! I'd like an option to stop it from writing files everywhere, though, even if that means I have no history.


Thanks for trying aider! I'd like to better understand your concern about aider's support files. If you're able, maybe file an issue and I'd be happy to try and help make it work better for you.

https://github.com/paul-gauthier/aider/issues


Sure, thanks!


> It helps gpt understand larger code bases by building a "repository map" based on analyzing the abstract syntax tree of all the code in the repo.

What I do with codespin[1] (another AI code gen tool) is give GPT a file or files and ask for the signatures (and comments, and maybe an autogenerated description), then cache the result until the file changes. For a lot of algorithmic work, we could just use GPT now. Sure it's less efficient, but as these costs come down it matters less and less. In a way, it's similar to higher-level (but inefficient) programming languages vs lower-level efficient languages.

[1]: https://github.com/codespin-ai/codespin-cli
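
The caching part is trivial; roughly something like this (a sketch, not codespin's actual code -- the summarize hook stands in for whatever GPT call extracts the signatures):

    import hashlib, json
    from pathlib import Path

    CACHE = Path(".signature-cache.json")

    def cached_signatures(path, summarize):
        """Re-ask GPT for a file's signatures only when the file changes."""
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
        entry = cache.get(path)
        if entry and entry["digest"] == digest:
            return entry["signatures"]      # unchanged file: reuse the cached answer
        signatures = summarize(path)        # placeholder for the GPT call
        cache[path] = {"digest": digest, "signatures": signatures}
        CACHE.write_text(json.dumps(cache, indent=2))
        return signatures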


How do you manage token limits when sending large amounts of code structure to OpenAI?


Aider has a "token budget" for the repository map (--map-tokens, default of 1k). It analyzes the AST of all the code in the repo, the call graph, etc... and uses a graph optimization algorithm to select the most relevant parts of the repo map that will fit in the budget.

There's some more detail in the recent writeup about the new tree-sitter based repo map that was linked in my comment above.


Any plans to open support up for other languages that tree-sitter supports?


Aider supports the repo map for a bunch of languages already, see below. Is there one in particular you need that is missing?

https://github.com/paul-gauthier/aider/tree/main/aider/queri...


Perl5, please!


I think Cursor.sh is another tool designed around the same principle, at least in terms of repository-level awareness, but rather than just being a CLI it's a full-fledged VS Code fork.


Thanks a lot for doing this project. Your blog post got me excited.


> If these tools help to write more code, but do not assist with maintenance, I wonder if we're going to see masses of new code written really quickly, and then everything grinds to a halt because no one has an intimate understanding of what was written?

Yep. Companies using LLMs to "augment" junior developers will get a lot of positive press, but I guess it remains to be seen how much the market consistently rewards this behavior. Consumers will probably see right through it, but the B2B folks might get fleeced for a few years before eventually churning and moving to a higher-quality, old-fashioned competitor that employs senior talent.

But IDK, maybe we'll come up with models that are good at growing and maintaining a coherent codebase. It doesn't seem like an impossible task, given where we are today. But we're pretty far from it still, as you point out.


What are the tasks that you envision are key to maintenance?

- bug finding and fixing

- parsing logs to find optimisation options

- refactoring (after several local changes)

- given new features, recommending a refactoring?

I feel like code assistants already provide reasonable help with the first two, and the latter two are mostly a question of context window. I feel we might end up with code bases split by context size, stitched together with shared descriptions.


I guess the issue is that programmers work with a really big context window, and need to reason at multiple levels of abstraction depending on the problem.

A great idea to solve a problem at one level of abstraction / context might be a terrible "strategic" idea at a higher level of abstraction. This is what separates the "junior" engineers from "senior" engineers, speaking very loosely.

IDK, from all that I've seen I'm not convinced that GPT is capable of that higher-order thinking. I fear it requires a degree of epistemology that GPT, as a stochastic token-guesser, fundamentally doesn't possess. It never pushes back against a request, or asks if you really intend a different question by your first question. It never tries to read through your requirements to grasp the underlying problem that's prompting them.

Maybe some combination of static tools, senior caretakers and prompt hackery can get us to a solution that maintains code effectively. But I don't think you can throw out the senior caretakers, their verification involvement is really necessary. And I don't know how conducive this environment would be to developing the next generation of "senior caretakers".


> IDK, from all that I've seen I'm not convinced that GPT is capable of that higher-order thinking. I fear it requires a degree of epistemology that GPT, as a stochastic token-guesser, fundamentally doesn't possess. It never pushes back against a request, or asks if you really intend a different question by your first question. It never tries to read through your requirements to grasp the underlying problem that's prompting them.

It can if prompted appropriately. If you are just using the default ChatGPT interface and system prompt, it doesn't, but then, apart from its safety limits, it is intended to be compliant in that application. (I am not arguing it has the analytical capacity to be suited for the role being discussed, but the particular complaint about excessive compliance is a matter of prompting, not model capacity.)
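
For example, with the API you can just tell it to push back (a sketch using the openai Python package; the system prompt wording and the example request are mine):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM = (
        "You are a senior engineer reviewing requests. Before writing any code, "
        "question requirements that seem mistaken, ask clarifying questions, and "
        "say so explicitly if the request looks like a proxy for a different problem."
    )

    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Add a retry loop around every network call."},
        ],
    )
    print(resp.choices[0].message.content)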


I've been thinking about this for a while now, wrt two points:

1. This will be the end of traditional SWEs and the rise of the age of debuggers: human debuggers who spend their days setting up breakpoints and figuring out bugs in a sea of LLM-generated code.

2. Hiring will switch from using Leetcode questions to "pull out your debugger and figure out what's wrong with this code".


What makes you think the LLM couldn’t run a debugging session from the content of a JIRA ticket and the whole code base + documentation?


Having never seen it, or anything even close to it. (Of course, I'm a little biased by seeing product demos that don't even get "add another item to this list of command line arguments" right; maybe if everyone already believes it works, nobody bothers to actually sell that?)


If the codebase is anything more than a simple Python project... I don't think that'll happen.

It just doesn't scale that well. Hell, GPT-4 can't make sense of my own projects.


Who can say what's possible, but there are very few "debugging transcripts" for neural nets to train on out there. So it'd have to sort of work out how to operate a debugger via functions, understand the codebase (perhaps large parts of it), intuit what's going wrong, be able to modify the codebase, etc. Lots of work to do there.
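
Operating a debugger "via functions" would presumably mean exposing it as tools through the function-calling API, something like this sketch (the tool set and the ticket text are hypothetical, and a real harness would still have to wire the calls into an actual debugger):

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical debugger operations exposed to the model as tools.
    TOOLS = [
        {"type": "function", "function": {
            "name": "set_breakpoint",
            "description": "Set a breakpoint at a file and line number.",
            "parameters": {"type": "object", "properties": {
                "file": {"type": "string"},
                "line": {"type": "integer"}},
                "required": ["file", "line"]}}},
        {"type": "function", "function": {
            "name": "continue_and_report",
            "description": "Resume execution and return the locals at the next breakpoint hit.",
            "parameters": {"type": "object", "properties": {}}}},
    ]

    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content":
                   "The checkout total is wrong when a coupon is applied. Find the bug."}],
        tools=TOOLS,
    )
    # The model replies with tool calls; a harness would run them against a real
    # debugger (e.g. pdb) and feed the results back as "tool" role messages.
    print(resp.choices[0].message.tool_calls)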


Nobody wants to face the potential that their skills are going to greatly decrease in value.


The first interview where I was handed some (intentionally) broken C code and gdb was about 15 years ago. I'm not sure that part is a change in developer workflow (this may apply more in systems and embedded, though).

I've been paying attention to this too (mostly by following Simon Willison) and I'm still solidly in the "get back to me when this stuff can successfully review a pull request or even interpret a traceback" camp...


Yep, time to start adding source_file.prompt sidecar files next to each generated module so debugging sessions can start at the same initial condition.


There's a nice Code GPT plugin for IntelliJ and VS Code. Basically you can select some code and ask it to criticize it, refactor it, optimize it, find bugs in it, document it, explain it, etc. A larger context means that you can potentially fit your entire code base into it. Most people struggle to keep the details of even a small code base in their head.

The next level would be deeper integration with tooling, to ensure that whatever it changes, the tests still pass and the code still compiles. Speaking of tests, writing those is another thing it can do. So, AI-assisted salvaging of legacy code bases that would otherwise not be economical to deal with could become a thing.
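
That "tests still have to pass" loop doesn't need anything fancy; a sketch, assuming pytest (the llm_propose_patch / apply_patch helpers are placeholders, not any particular tool's API):

    import subprocess

    def run_tests():
        """Run the project's test suite; assumes pytest here."""
        return subprocess.run(["pytest", "-q"], capture_output=True, text=True)

    def edit_until_green(request, llm_propose_patch, apply_patch, max_rounds=3):
        """Ask the model for a change, keep feeding test failures back until green."""
        feedback = ""
        for _ in range(max_rounds):
            patch = llm_propose_patch(request, feedback)   # placeholder LLM call
            apply_patch(patch)                             # placeholder: write the edit
            result = run_tests()
            if result.returncode == 0:
                return True                                # code runs and tests pass
            feedback = result.stdout + result.stderr       # hand the failures back
        return False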

What we can expect over the next few years is a lot more AI-assisted developer productivity. IMHO it will perform better on statically typed languages, as those are simply easier for tools to reason about.


We are doing this for API testing now. You should check out our website:

https://ai.stepci.com


Piece of feedback: it's weird to have a drop-down on "OpenAPI Links" when there are no other options.


Thanks! We will have more examples coming very soon



