
You and me both, man. Either I'm speaking a different language or I'm simply really bad at explaining what I need. I'd love to see someone actually do this on video.



Indeed. I’ve yet to run across an actual demonstration of an LLM that can produce useful, non-trivial code. I’m not suggesting (yet) that the capabilities don’t exist or that everyone is lying—the web is a big place after all and finding things can be difficult—but I am slowly losing faith in the capability of what the industry is selling. It seems right now one must be deeply knowledgeable of and specialized in the ML/AI/NLP space before being capable of doing anything remotely useful with LLM-based code generation.


I think there is something deeper going on: “coding” is actually 2 activities: the act of implementing a solution, and the act of discovering the solution itself. Most programmers are used to doing both at once. But to code effectively with an LLM, you need to have already discovered the solution before you attempt to implement it!

I’ve found this to be the difference between writing 50+ prompts of back-and-forth to get something useful, and getting something useful in 1-3 prompts. If you look at Simon’s post, you’ll see that these are all self-contained tools, whose entire scope has been constrained from the outset of the project.

When you go into a large codebase and have to change some behavior, (1) you usually don’t have the detailed solution articulated in your mind before looking at the codebase, and (2) that “solution” likely consists of a large number of small decisions and judgments. It’s fundamentally difficult to encode that many nuanced details in a concise prompt, which makes using an LLM not worth it.

On the other hand, I built this tool, https://github.com/gr-b/jsonltui, which I now use every day, almost entirely using Claude. “CLI tool to visualize JSONL with a textual interface, localizing parsing errors” almost fully specifies it. In contrast, my last 8-line PR at my company, while it would appear much simpler on the surface, contains many more decisions, not just of my own, but reflecting team conversations and expectations that are not written down anywhere. Communicating that shared implicit context to Claude would be far more difficult than performing the change myself.
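That one-line spec really is close to self-contained. As a rough illustration (not jsonltui's actual implementation; the function name and return shape here are invented for the sketch), the "localizing parsing errors" part can be done with just the standard library, since `json.JSONDecodeError` already carries line and column information:

```python
import json

def localize_jsonl_errors(text):
    """Parse each line of a JSONL string. Returns (records, errors), where
    errors is a list of (line_number, column, message) for lines that
    failed to parse. Blank lines are skipped."""
    records, errors = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as e:
            # e.colno is the 1-based column within this line where parsing failed
            errors.append((line_no, e.colno, e.msg))
    return records, errors

# Example: the second line is malformed, so it is reported with its location
records, errors = localize_jsonl_errors('{"a": 1}\n{"b": }\n{"c": 3}')
```

A TUI on top of this only needs to jump the viewport to the reported line and column, which is the kind of tightly-scoped behavior an LLM can implement in a handful of prompts.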


I think https://tools.simonwillison.net/openai-audio is useful and non-trivial.


You’re probably right, but I’m far more interested in seeing things like how you prompted the model to produce your audio tool’s code. Did you have a design doc, or did you collaborate with the model to come up with a design and its implementation ad hoc? How much manual rewriting did you do? How much worked with little to no editing? How much did you prompt the model to fix any bugs it created? How successful was it? Did you specify a style guide up front, or just use what it spat out and try to refactor later? How did that part go? You see where I’m going?

Oh, wow, it honestly just occurred to me that examples of how to prompt a model to produce a certain kind of content might be considered, more or less, some kind of trade secret vaguely akin to a secret recipe. That would be a bit depressing but I get it.


Here are the full Claude transcripts I used to build the OpenAI Audio app:

- https://gist.github.com/simonw/0a4b826d6d32e4640d67c6319c7ec... - most of the work

- https://gist.github.com/simonw/a04b844a5e8b01cecd28787ed375e... - some tweaks

Lots more details in my full post about it here: https://simonwillison.net/2024/Oct/18/openai-audio/




