Acceptable UX maybe, but the gap between Sonnet 3.5 and open models isn’t worth ...

thot_experiment · 2024-12-18T21:24:00

You're the second person in this thread to make this point. What are you using it for? I find the difference basically negligible (in the sense that both get the busywork right and both fail at anything complicated).

scosman · 2024-12-19T00:13:29

Yeah, Sonnet goes past that. 300+ line changes in 20 seconds. You have to review it, but generally it's right. It's infinitely faster than looking at the docs and doing it myself.

Sure it's busywork. But it's a lot of busywork very fast.

thot_experiment · 2024-12-19T01:27:36

Well, it's definitely not infinitely faster, since you have to review it. But we're talking about the delta between Sonnet and Qwen/Mistral/Llama or whatever, not about doing it manually.

I'm really curious what your problem domain is: specifically, what sort of code are you asking it to change, and what changes are you asking for?

I just gave o1 and Sonnet a total layup question (an optimization with a huge win from simply filtering an array before sorting it instead of the other way around), and neither model got the solution right. Both came up with ~100 lines of code, and neither model's code worked on the first try. It took me about 10 minutes to refactor and optimize the code for a 6x speedup, and debugging the AI code just to make it run would take longer than that. (I spent 10 minutes prompting and editing trying to get the generated solutions to run.)

Also, the initial code was 11 SLOC, my solution is 14 SLOC, Claude's was 70 SLOC and o1's was 93. idfk, I just don't think we're there yet.
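For anyone wondering why the order matters, here's a minimal Python sketch of the filter-before-sort idea. The original 11-line code wasn't posted, so the function names, the `is_small` predicate and the random data are all hypothetical. Sorting first pays O(n log n) over every element and then throws most of them away; filtering first is a single O(n) pass, after which the sort only touches the k survivors (O(k log k)).

```python
import random
import timeit


def sort_then_filter(items, keep):
    # Sorts all n items first, then discards the ones that fail `keep`.
    return [x for x in sorted(items) if keep(x)]


def filter_then_sort(items, keep):
    # Discards first in one linear pass, so sorted() only sees the survivors.
    return sorted(x for x in items if keep(x))


def is_small(x):
    # Hypothetical predicate: keeps roughly 10% of uniform random floats.
    return x < 0.1


if __name__ == "__main__":
    data = [random.random() for _ in range(100_000)]
    # Both orderings must produce the same result; only the cost differs.
    assert sort_then_filter(data, is_small) == filter_then_sort(data, is_small)
    for fn in (sort_then_filter, filter_then_sort):
        seconds = timeit.timeit(lambda: fn(data, is_small), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

The actual speedup depends on how selective the filter is and on how expensive the comparisons are, so treat the timings as illustrative rather than a reproduction of the 6x figure.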

scosman · 2024-12-19T11:23:28

Quick example: create a config class allowing the caller to get and set a variety of config vars. It should be easy to add new vars. Persist any set value to a YAML file. Each var should be described by a name, a type, an env var fallback (optional), and a default value used if both the set value and the env var are null (optional). The API should be typed and allow int, string, float and lists. Add comprehensive tests.

Obviously nothing complicated, but it takes non-zero time. It did it in one shot in about 30s. I didn’t have to look at any docs (I don’t have the YAML lib memorized). It got the Python typing right, which would have been a bit of a pain. A lot faster than doing it myself, even with reviews. The tests were solid, so I could tell it worked.
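To make the spec above concrete, here is a rough hand-written Python sketch of that kind of config class. It is not the code Sonnet generated (that wasn't posted). Assumptions, all hypothetical: the names `ConfigVar` and `Config`, the example vars `PORT`/`RATIO`/`TAGS` and their env var names, and comma-separated env values for list vars. To stay stdlib-only it writes the file with `json.dumps`; JSON is essentially a subset of YAML 1.2, so a YAML loader can read the result, but with PyYAML installed you would more likely use `yaml.safe_dump`/`yaml.safe_load`.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ConfigVar(Generic[T]):
    """One config var: name, type, optional env var fallback, optional default."""
    name: str
    type_: type  # int, str, float or list in this sketch
    env_var: Optional[str] = None
    default: Optional[T] = None


class Config:
    """Typed get/set over ConfigVars; every set() rewrites the file on disk."""

    def __init__(self, path: Path) -> None:
        self._path = Path(path)
        self._values: Dict[str, Any] = {}
        if self._path.exists() and self._path.read_text().strip():
            self._values = json.loads(self._path.read_text())

    def get(self, var: "ConfigVar[T]") -> Optional[T]:
        # Resolution order: persisted value, then env var, then default.
        if self._values.get(var.name) is not None:
            return self._values[var.name]
        raw = os.environ.get(var.env_var) if var.env_var else None
        if raw is not None:
            return self._parse_env(var, raw)
        return var.default

    def set(self, var: "ConfigVar[T]", value: T) -> None:
        if not isinstance(value, var.type_):
            raise TypeError(
                f"{var.name} expects {var.type_.__name__}, got {type(value).__name__}"
            )
        self._values[var.name] = value
        # Stand-in for a YAML dump: JSON output is readable by YAML 1.2 loaders.
        self._path.write_text(json.dumps(self._values, indent=2))

    @staticmethod
    def _parse_env(var: "ConfigVar[Any]", raw: str) -> Any:
        # Assumption: list-typed env vars hold comma-separated strings.
        if var.type_ is list:
            return [item.strip() for item in raw.split(",") if item.strip()]
        return var.type_(raw)


# Adding a new var is one line; these vars are hypothetical examples.
PORT = ConfigVar("port", int, env_var="EXAMPLE_APP_PORT", default=8080)
RATIO = ConfigVar("ratio", float, default=0.5)
TAGS = ConfigVar("tags", list, env_var="EXAMPLE_APP_TAGS")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Config(Path(tmp) / "config.yaml")
        print(cfg.get(PORT))  # 8080 unless EXAMPLE_APP_PORT is set
        cfg.set(TAGS, ["a", "b"])
        print(Config(Path(tmp) / "config.yaml").get(TAGS))  # ['a', 'b']
```

Declaring each var as a module-level `ConfigVar` constant is one way to get the "easy to add new vars" property while letting a type checker infer `get(PORT)` as an optional `int`.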

The filter example you gave seems like something they should have aced. Not sure what went wrong, but it has easily done work like that for me. I'm usually halfway through typing the method name when the rest autocompletes. Are you using a tool with good context management, like Cursor?

8n4vidtmkvmk · 2024-12-19T06:34:10

My work involves a lot of boilerplate. I recently made some changes that amount to about 3 lines of meaningful code, but wrapped in 4 new files plus edits to like 6 others. It's ridiculous. Perfect for AI, but I haven't found a way to automate it: the tools don't seem smart enough to create files and make random little edits, and explaining what I want would take too many words. One day...

dingnuts · 2024-12-19T00:48:26

I gotta find a new trade