Malleable software in the age of LLMs (2023) (geoffreylitt.com)
87 points by tosh 7 months ago | 36 comments



> Turning a natural language specification into web scraping code is exactly the kind of code synthesis that current LLMs can already achieve.

I wish. I have tried. LLMs don't understand the DOM. Unless it is as simple as the address sitting in an element with id=address, an LLM is useless for generating scraping code. You are better off with an element picker and some heuristics to generate a selector / xpath query (a sketch of what I mean is below). Now, there are some specialized models; I found a paper that takes the DOM tree and fits a vector to each node, but I think they are too much effort for too little gain, unless someone integrates them into an open source scraping library so they are easy to use.
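To make "element picker plus heuristics" concrete, here is a minimal, untested sketch using lxml: given the node a user picked, climb the tree and build an XPath that prefers stable attributes over brittle positional indexes. The preference order (id, then class, then position) is my own heuristic, nothing standard.

    from lxml import html

    def xpath_for(node):
        steps = []
        while node is not None and node.tag not in ("html", "body"):
            node_id = node.get("id")
            if node_id:
                steps.append(f"*[@id='{node_id}']")
                break  # ids are assumed unique, so stop climbing
            classes = (node.get("class") or "").split()
            if classes:
                steps.append(f"{node.tag}[contains(@class, '{classes[0]}')]")
            else:
                # no stable attribute: fall back to position among same-tag siblings
                parent = node.getparent()
                same_tag = [s for s in parent if s.tag == node.tag]
                steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
            node = node.getparent()
        return "//" + "/".join(reversed(steps))

    doc = html.fromstring(
        "<html><body><div id='main'><span class='author'>Alice</span></div></body></html>")
    picked = doc.xpath("//span")[0]  # stand-in for a click in a picker UI
    print(xpath_for(picked))  # //*[@id='main']/span[contains(@class, 'author')]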


This has not been my experience, to be honest -- I wrote a pretty complex scraper for extracting portions of a page and structuring their contents into JSON, using ChatGPT-4, about a year ago and it worked pretty well. (But not "well" in the sense that a non-programmer would've been able to do this, if that's your bar.)
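For flavor, the code it produced had roughly this shape (reconstructed from memory; the selectors and field names below are placeholders I made up, not the real site's markup):

    import json
    from bs4 import BeautifulSoup

    def extract_items(html_text):
        soup = BeautifulSoup(html_text, "html.parser")
        items = []
        for card in soup.select("div.listing-card"):  # placeholder selector
            items.append({
                "title": card.select_one("h2").get_text(strip=True),
                "price": card.select_one(".price").get_text(strip=True),
                "url": card.select_one("a")["href"],
            })
        return json.dumps(items, indent=2)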

I even got it usefully updated when the format changed!


It could depend on the page. In my case it was almost half a megabyte, mostly HTML markup junk, and the text was in Japanese: https://kakuyomu.jp/works/16818023214223186311 And the task was "write a selector to identify the author". I even tried giving it the author's name; it didn't help.


I'd love to see more websites where people can share successful, relatively complex coding prompts.


>You are better off with an element picker and some heuristics to generate a selector / xpath query.

Bummer. I wanted to try my hand at this. There has to be some trick where you can combine an LLM and an element picker to get a really robust solution, right? (Something like the sketch below.)
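Maybe. Here is an untested sketch of one hybrid, assuming the OpenAI Python client: the picker supplies a concrete example element, the LLM sees only a trimmed fragment and is asked to generalize a selector, and, crucially, the answer is verified against the page before it is trusted. The model name and prompt wording are my assumptions.

    from bs4 import BeautifulSoup
    from openai import OpenAI

    client = OpenAI()

    def robust_selector(html_text, picked_snippet):
        prompt = (
            "Here is the HTML fragment around an element the user picked:\n"
            f"{picked_snippet}\n\n"
            "Reply with only a CSS selector that matches this element and "
            "its siblings of the same kind, as generally as possible."
        )
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption: any capable model works here
            messages=[{"role": "user", "content": prompt}],
        )
        selector = resp.choices[0].message.content.strip()
        # The crucial step: reject hallucinated selectors instead of shipping them.
        try:
            matches = BeautifulSoup(html_text, "html.parser").select(selector)
        except Exception:  # the model may return invalid selector syntax
            return None
        return selector if matches else None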


Hmm, extractnet seems somewhat promising:

https://github.com/currentslab/extractnet
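Going by its README, usage looks roughly like this (untested; the API may have drifted, so treat the details as assumptions to verify):

    import requests
    from extractnet import Extractor

    raw_html = requests.get("https://example.com/article").text
    results = Extractor().extract(raw_html)  # a dict of extracted fields
    print(results.get("title"), results.get("content"))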


That one produces a list of blocks and "cheats" by using heavily engineered features such as Readability's algorithm. It is suitable for some purposes, I guess. But the paper I am talking about, https://arxiv.org/pdf/2201.10608, uses self-supervised learning.


Post author here. Just wanted to link to a talk that I gave which expands on the ideas in this essay and shows a couple evocative demos to make it more concrete:

https://youtu.be/bJ3i4K3hefI?si=BCrqvyZiFFur3W9p


Software is a means to an end. So programming is a means to a means to an end. LLMs are also means to an end. Why would we want to use LLMs as a means to a means to a means to an end, when they can be transparently specialized to get users directly to the end?

I think end-user programming has to be a fallacy, because I see no one other than programmers wanting it. That's not to say it's an invalid idea; it's more likely misdirected. I understand the feeling that programmers wish more people could wield the generalized control over their computing devices that programmers have over their own. But do end users actually need that? Should this be a priority, when time is limited and people can spend it doing more fulfilling things than fiddling with machines? Don't they have numerous other things to optimize for in their actual, physical lives? Why would we wish the burden of such frigid discipline on them when others can carry it for them?

As far as I can tell, prompting itself already is end-user programming; what's lacking are the bridges between the models and diverse effectful APIs. As absurd as it sounds, at whatever level, programming is merely data parameterization over a set of interfaces. That's also what prompting is, except the underlying interface is overly general. So a viable future should involve standardization efforts so LLMs can invoke APIs on their own (tool calling, sketched below, is an early form of this), and we provide them with compatible interfaces they can use to serve users.

In this scenario, the applications behind the endpoints will not merely be stateless request/response interfaces; they will also need to instruct, model, and contextualize remote data-capture interfaces within the client: generative interfaces. Software can hardly be more malleable than that: contextually generating whatever UI is needed to achieve a task that was prompted in natural language.

Once there are enough composable black-box services accessible via LLMs, there is no need for end users to program anything; they will directly demand the effects. Sure, there can be service-generating services, but I fail to see how users could parameterize such calls while lacking domain knowledge, and while there's a direct path to the exact or approximate effect they need in the first place. As for special cases, they will always require special solutions and specialized knowledge, no matter the domain.
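A minimal sketch of that tool-calling plumbing using the OpenAI client; the book_table function and its parameters are invented for illustration:

    import json
    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "book_table",  # hypothetical end-user effect
            "description": "Reserve a restaurant table for the user.",
            "parameters": {
                "type": "object",
                "properties": {
                    "restaurant": {"type": "string"},
                    "party_size": {"type": "integer"},
                    "time": {"type": "string", "description": "ISO 8601"},
                },
                "required": ["restaurant", "party_size", "time"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        messages=[{"role": "user", "content": "Table for two at Nobu at 7pm"}],
        tools=tools,
    )
    call = resp.choices[0].message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))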


Fiddling with machines results in processes that can save you a lot of time and headaches, so I do think it's worth the time to learn it.

Even if/when LLMs improve enough to bring their code quality up to par, humans should still be comfortable reading that code to ensure it aligns with the goal.


I'll worry about this when users can reliably write useful specification documents. Garbage In, Garbage Out and all that.


I don't know if the intention of the article was to stoke worry; I read it as an opportunity to embrace! Users being able to make their own software for their own needs is a worthy and beautiful goal. It's like if people could only eat what restaurants serve and never cook for themselves.

And of course with iteration and feedback loops people can definitely learn how to specify what they want in a fairly precise way.


However, the easier programming gets, the less people who don't know it need to do it, because it's more likely somebody has already solved their problem.


If the thing you want it to do isn’t too unusual, LLMs can guide you into doing something vaguely sensible.

This is the one area of AI I’m actually pretty positive about. Computing largely passed people by, and computers have not lived up to their potential. I could see this approach making computers more useful for a large number of people.

Especially semi-technical people whose specialty isn't programming.


> Especially semi-technical people whose specialty isn't programming.

That would be the worst kind. Imagine fixing bugs for those people in LLM-hallucinated code that runs but has wrong logic all over the place.

I've reduced my use of LLMs for coding these days. I only ask them to generate templates or write repetitive code and sample data.


Cool how we're far enough into the adoption cycle that 'I've reduced my use of LLMs' is a mainstream thing to hear among sophisticated techies.


Even with GPT-4, the most useful use cases are generating stuff that is too repetitive and daunting to type, refactoring code that interns wrote, and making it write documentation.

What I love most is using it to draft project specs from users' requirements and to generate user stories.

Or letting it write LeetCode-like problems that are too boring to write manually.

The actual layer of coding is still best done by experienced engineers' biological brains.


Some of us haven’t even incorporated LLMs into our development workflows yet. :)


Some of us looked at it and decided the results were so bad it was a net loss to even try.


Yep. Even the few times I use it via the ChatGPT interface, I spend as much time fixing the bad assumptions it made as it saves me.


Some of us looked at it and decided the results were so bad it was a net loss to even try, but it was just so cool...


I do not envy a non-programmer stuck with buggy code generated by an LLM.


I’m glad the conversation about LLMs in UI is getting more nuanced!

There was a while there where it seemed like conversational (either text or by voice-to-text) interfaces were the only way people could imagine using LLMs. Everything being an empty text box staring back at you.

It seems like that may have just been because a bare text box is the easiest thing to implement against an API. Now that we've all had some time to work with these models a bit, some interesting UI experiments are starting to peek out.


Recent and related:

Malleable Software in the Age of LLMs (2023) - https://news.ycombinator.com/item?id=40188435 - April 2024 (38 comments)


There are domains where software being static is a guarantee of legal compliance. In other words, even continuous deployment is undesirable in such cases.


I call this movement the operator tool.

Tools will no longer need human operators: adding multimodality to tools means we can put the operator interface on top of the tool itself.

Tuning the operator is where the playing field shifts to: operators that can translate fuzzy inputs into creative specs or workflows, which can in turn be executed by the downstream engine.


I don't think people are going far enough. There will be no code in 5-10 years.

> Jensen Huang CEO of NVIDIA said: “Every single pixel will be generated soon. Not rendered: generated” (https://x.com/icreatelife/status/1639363377255309328?lang=en)

If we take this to the extreme, user interfaces are going to be generated on the fly, specific to the user. They will adapt to the user based on the device they are using, the time of day, what data is being displayed, what the user prefers, etc.

These large models generating views might be streamed from servers (à la Stadia), or they might pass off some of the work to edge devices.

The models will be able to store things and communicate with other models as needed. Models might spin up that perform certain tasks well and have access to specific resources.


Imagine a company trying to offer support for interfaces that are all tailored to a specific user. I can imagine a future where this is viable, but it will definitely take longer than 5 years.


Yea, timeline is likely way too optimistic.


It's like Elon time on steroids.


Yeah... no chance. There will be no code in 5 years. What a joke. Even if it were technically possible (it's not), the idea that the humans involved would be able to make it happen in such a short amount of time is laughable.

Seriously, think harder about what you're suggesting here. It's ridiculous.


Dude, chill, it's a thought experiment based on how things are moving right now. Maybe I'm off by some years; who cares. Whether it happens in 5, 15, or 50 years, it's going to happen.


Who says so? It might not happen at all. We might well still be coding in a thousand years; why not? Non-programmers seem not to understand that a programming language is just that, a language. We use it because it is more productive than plain English, which, with LLMs, is now becoming a better tool for programming. But prompt engineering is still programming, and that won't change in a thousand years.


The idea that programmers won't want to code, or that artists won't want to art, or that musicians won't want to music, is absurd. Creativity is one of the most rewarding things a human being can engage in, just because corporations would prefer to pay for shitty, broken hallucinated code or art doesn't mean humans will stop doing it for their own satisfaction.

This concept obviously flies well over the heads of the LLM/AI/AGI bros who think our creative lives are going to be rendered obsolete by vegetable silicon. They lack creativity and imagination.


You didn't present it as a thought experiment, and it's also not something to treat flippantly. If it happens, it will be because we achieve AGI and nothing less. I know it's in vogue to think that's 5 years away right now, but we're not remotely close and it's also in no way inevitable.

If it does come to that, whether or not there is code will be the least of our worries.


The article this post links to is a thought experiment. Totally fair points about the timeline and the AGI requirement; I was throwing around Elon-like estimates over here.



