
Interesting concept that raised a question for me: what is the primary limiting factor right now that prevents LLMs, or any other AI model, from going "end to end" on programming a full software solution or a full design/engineering solution?

Is it token limitations, or accuracy degrading the further you get into the solution?




LLMs can't gut a fish in the cube when they get to their limits.

On a more serious note: I think the high-level structuring of the architecture, and then the breakdown into tactical solutions — weaving the whole program together — is a fundamental limitation. It's akin to theorem-proving, which is just hard. Maybe it's just a scale issue; I'm bullish on AGI, so that's my preferred opinion.


Actually, I think this is a good point: fundamentally, an AI is forced to "color inside the lines". It won't tell you your business plan is stupid and walk away, which is a strong signal that's hard to ignore. So will this lead people with more money than sense to do even more extravagantly stupid things than we've seen in the past, or is it basically just "Accenture-in-a-box"?


AI will absolutely rate your business plan if you ask it to.

Try this prompt: "Please rate this business plan on a scale of 1-100 and provide bullet points on how it can be improved, without rewriting any of it: <business plan>"


I agree that AI is totally capable of rating a business plan. However, I think that the act of submitting a business plan to be rated requires some degree of humility on the part of the user, and I do doubt that an AI will “push back” when it comes to an obviously bad business plan unless specifically instructed to do so.


I wouldn't trust an absolute answer, but it can help you generate counterarguments that you might otherwise miss.


> LLMs can't gut a fish in the cube when they get to their limits.

Is this an idiom? Or did one of us just reach the limits of our context? :P


Office Space reference.


I guess this would be the context window size in the case of LLMs.

Edit: On second thought, maybe above a certain minimum context window size it becomes possible to phrase the instructions such that, at any point in the process, the LLM works at a suitable level of abstraction, more like humans do.


Maybe the issue is that, for us, the "context window" we feed ourselves is actually a compressed and abstracted version: we don't re-feed ourselves the whole conversation, just a "notion" and the key points we have stored. LLMs have static memory, so I guess there is no way other than to single-pass the whole thing.

For human-like learning it would need to update its state (i.e. learn) on the fly as it does inference.
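
A minimal sketch of that kind of compression, assuming a hypothetical llm() completion helper (not any real API): keep the last few raw turns verbatim and fold everything older into a running summary before each call.

    # Sketch: rolling-summary memory instead of re-feeding the full transcript.
    # llm() is a hypothetical completion helper, not a real API.
    def llm(prompt: str) -> str:
        raise NotImplementedError  # plug in whatever model you use

    RECENT_TURNS = 4  # raw turns to keep verbatim

    def compress(summary: str, old_turns: list[str]) -> str:
        """Fold older turns into a short running summary (the "notion" + key points)."""
        return llm(
            "Update this summary with the key points from the new turns.\n"
            f"Summary so far:\n{summary}\n\nNew turns:\n" + "\n".join(old_turns)
        )

    def build_context(summary: str, turns: list[str]) -> tuple[str, str]:
        """Return (new_summary, prompt_context) for the next model call."""
        if len(turns) > RECENT_TURNS:
            summary = compress(summary, turns[:-RECENT_TURNS])
            turns = turns[-RECENT_TURNS:]
        return summary, f"Summary:\n{summary}\n\nRecent turns:\n" + "\n".join(turns)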


Half-baked idea: what if you have a tree of nodes? Each node stores a description of (a part of) a system and an LLM-generated list of what its parts are, each a small step toward concreteness. The process loops through each part of each node recursively, making a new node per part, until the LLM writes actual compilable code.
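
A rough sketch of that loop, assuming a hypothetical llm() helper and a very crude "did it write code?" check in place of actually compiling:

    # Sketch of the recursive breakdown: each node either holds code (leaf)
    # or children for the LLM-proposed sub-parts. llm() is hypothetical.
    from dataclasses import dataclass, field

    def llm(prompt: str) -> str:
        raise NotImplementedError  # plug in your model of choice

    @dataclass
    class Node:
        description: str
        code: str | None = None
        children: list["Node"] = field(default_factory=list)

    def looks_like_code(text: str) -> bool:
        # crude stand-in for "the LLM wrote actual compilable code"
        return text.lstrip().startswith("```") or "def " in text

    def expand(description: str, depth: int = 0, max_depth: int = 5) -> Node:
        node = Node(description)
        if depth >= max_depth:  # safety stop: force code at the bottom
            node.code = llm("Write compilable code for: " + description)
            return node
        answer = llm(
            "Break this component into smaller, more concrete parts "
            "(one per line), or, if it is small enough, reply with code only:\n"
            + description
        )
        if looks_like_code(answer):
            node.code = answer
        else:
            for part in answer.splitlines():
                if part.strip():
                    node.children.append(expand(part, depth + 1, max_depth))
        return node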


Isn't that what LangChain is?


See https://github.com/mit-han-lab/streaming-llm and others. There's good reason to believe that attention networks learn how to update their own weights based on their input (I forget which paper showed this). The attention mechanism can act like a delta that updates the weights as the data propagates through the layers. The issue is getting the token embeddings to be more than just the ~50k or so that we use for the English language, so you can explore the full space, which is what the attention-sink mechanism is trying to do.
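
For context, the trick in that repo is mostly a KV-cache eviction policy: keep a handful of initial "attention sink" tokens plus a sliding window of recent tokens and drop everything in between. A toy version of just the bookkeeping:

    # Toy illustration of the attention-sink cache policy from streaming-llm:
    # keep the first few "sink" positions plus a recent window, evict the rest.
    # (Index bookkeeping only; no actual model or KV tensors here.)
    def evict(positions: list[int], n_sink: int = 4, window: int = 1020) -> list[int]:
        if len(positions) <= n_sink + window:
            return positions
        return positions[:n_sink] + positions[-window:]

    # After 5000 generated tokens, only 4 + 1020 = 1024 positions stay cached.
    print(len(evict(list(range(5000)))))  # -> 1024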


Memory and fine-tuning. If it were easy to insert a framework or its documentation into GPT-4 (the only model capable of complex software development so far, in my experience), it would be easy to create big, complex software. The problem is that currently all the memory/context management needs to be done on the side of the LLM interaction (RAG). If it were easy to offload part of this context management on each interaction to a global state/memory, it would be trivial to create quality software with tens of thousands of LoC.
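
A minimal sketch of what offloading that to a shared project memory might look like, with naive keyword overlap standing in for real embedding-based retrieval (all names below are made up):

    # Sketch: a global project "memory" each interaction reads from and writes to,
    # instead of stuffing everything into the prompt. Keyword overlap stands in
    # for embedding-based retrieval (RAG).
    class ProjectMemory:
        def __init__(self) -> None:
            self.notes: list[str] = []  # framework docs, API summaries, decisions

        def add(self, note: str) -> None:
            self.notes.append(note)

        def retrieve(self, query: str, k: int = 3) -> list[str]:
            q = set(query.lower().split())
            ranked = sorted(self.notes, key=lambda n: -len(q & set(n.lower().split())))
            return ranked[:k]

    memory = ProjectMemory()
    memory.add("auth module: JWTs issued by /login, validated in middleware")
    memory.add("storage layer: Postgres via SQLAlchemy, migrations with Alembic")

    task = "add a logout endpoint that revokes the JWT"
    prompt = "Project context:\n" + "\n".join(memory.retrieve(task)) + "\n\nTask: " + task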


It's the fact that LLMs can't, and don't try to, write valid programs. They try to write something that reads like a reply to your question, using their corpus of articles, exchanges, etc. That's not remotely the same thing, and it's not at all about "accuracy" or "tokens".


The issue with transformers is context length. Compute-wise, we can handle a long context window (in terms of forming the attention matrix and doing the calculations). The issue is training: the weights are specialized to deal with contexts only up to a certain size. As far as I know, there's no surefire solution that overcomes this. But theoretically, if you were okay with the quadratic explosion (and had a good dataset, another sticking point...), you could spend the money and train for much longer context lengths. I think for a full project you'd need millions of tokens.
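
To put rough numbers on that quadratic explosion: the raw attention score matrix is seq_len x seq_len per head, so just holding it grows with the square of the context length. A back-of-the-envelope estimate, assuming fp16 scores, 32 heads, and no FlashAttention-style tricks that avoid materializing the matrix:

    # Back-of-the-envelope memory for the raw attention score matrix,
    # seq_len x seq_len per head, fp16, per layer.
    def attn_matrix_gib(seq_len: int, n_heads: int = 32, bytes_per_elem: int = 2) -> float:
        return seq_len * seq_len * n_heads * bytes_per_elem / 2**30

    for n in (4_096, 32_768, 1_000_000):
        print(f"{n:>9,} tokens: {attn_matrix_gib(n):,.0f} GiB per layer")
    #     4,096 tokens: 1 GiB per layer
    #    32,768 tokens: 64 GiB per layer
    # 1,000,000 tokens: 59,605 GiB per layer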



