It's interesting that so many of the models fail to retrieve this, but any that do solve it should clearly be able to do so with no reasoning/theory of mind.
I agree this is not a great test. What's good about it is that it is a constraint satisfaction problem, and I would expect LLMs to be pretty bad at unknown problems of this kind. The simple reason is that an LLM has only a finite number of layers, so it cannot perform arbitrarily long searches.
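To make the "arbitrarily long search" point concrete, here's a toy backtracking solver for n-queens, standing in for a generic constraint satisfaction problem (the example is mine, not from the thread). The number of nodes the search explores grows with n and has no fixed bound, whereas a forward pass through a fixed number of layers does a fixed amount of computation per token.

```python
# Toy backtracking search for n-queens, a classic constraint
# satisfaction problem. The work done grows with the instance size,
# with no fixed upper bound.

def solve_n_queens(n):
    """Place n queens, one per row; return (solution, nodes explored)."""
    nodes = 0

    def backtrack(cols):
        nonlocal nodes
        nodes += 1
        row = len(cols)
        if row == n:
            return list(cols)
        for col in range(n):
            # No shared column or diagonal with any earlier row.
            if all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(cols)):
                cols.append(col)
                found = backtrack(cols)
                if found is not None:
                    return found
                cols.pop()  # dead end: undo and try the next column
        return None

    return backtrack([]), nodes

for n in (6, 8, 10, 12):
    _, nodes = solve_n_queens(n)
    print(f"n={n}: {nodes} nodes explored")
```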
I tried to get ChatGPT to write a Python program that creates a monthly work schedule (for imaginary workers) based on specific constraints (e.g. 10 workers, 2 shifts (morning and night), 40 working hours per week, at least one weekend per month off, a minimum of 2 workers per shift, no more than 3 consecutive working days, and so forth).
I am not sure whether I could have made it give me a working solution; I have not tried Claude, for example, nor other programming languages. Maybe.
The issue was that it messed up the constraints, so there were no feasible solutions. That said, it did give me a working program for a version with fewer constraints.
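For comparison, here is roughly what code for this problem could look like using Google's OR-Tools CP-SAT solver. This is a minimal sketch under assumptions the comment doesn't pin down: 8-hour shifts (so 40 h/week means exactly 5 shifts), a 28-day month starting on a Monday, and double shifts allowed. That last assumption matters: if a worker may take at most one shift per day, a fully free weekend forces five consecutive weekdays of work, which contradicts the 3-consecutive-day limit and makes the model infeasible. That is exactly the kind of over-constraining described above.

```python
# Sketch of the schedule as a constraint program (OR-Tools CP-SAT).
# Assumptions (mine, not from the original attempt): 8 h shifts,
# 28-day month starting Monday, double shifts permitted.
from ortools.sat.python import cp_model

WORKERS, DAYS, SHIFTS = 10, 28, 2          # shifts: 0 = morning, 1 = night
WEEKS = DAYS // 7
model = cp_model.CpModel()

# works[w][d][s] == 1 iff worker w takes shift s on day d.
works = [[[model.NewBoolVar(f"w{w}_d{d}_s{s}") for s in range(SHIFTS)]
          for d in range(DAYS)] for w in range(WORKERS)]

# works_day[w][d] == 1 iff worker w works at all on day d.
works_day = [[model.NewBoolVar(f"w{w}_d{d}") for d in range(DAYS)]
             for w in range(WORKERS)]
for w in range(WORKERS):
    for d in range(DAYS):
        model.AddMaxEquality(works_day[w][d], works[w][d])

# At least 2 workers on every shift.
for d in range(DAYS):
    for s in range(SHIFTS):
        model.Add(sum(works[w][d][s] for w in range(WORKERS)) >= 2)

# Exactly 40 h per week, i.e. 5 shifts (assumption: 8 h per shift).
for w in range(WORKERS):
    for k in range(WEEKS):
        week = range(7 * k, 7 * k + 7)
        model.Add(sum(works[w][d][s] for d in week for s in range(SHIFTS)) == 5)

# No more than 3 consecutive working days: every 4-day window has an off day.
for w in range(WORKERS):
    for d in range(DAYS - 3):
        model.Add(sum(works_day[w][d + i] for i in range(4)) <= 3)

# At least one full weekend (Sat+Sun, assuming day 0 is a Monday) off.
for w in range(WORKERS):
    free_weekends = []
    for k in range(WEEKS):
        off = model.NewBoolVar(f"w{w}_weekend{k}_off")
        model.Add(works_day[w][7 * k + 5] + works_day[w][7 * k + 6] == 0
                  ).OnlyEnforceIf(off)
        free_weekends.append(off)
    model.AddBoolOr(free_weekends)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    symbols = {(0, 0): ".", (1, 0): "M", (0, 1): "N", (1, 1): "D"}
    for w in range(WORKERS):
        row = "".join(symbols[(solver.Value(works[w][d][0]),
                               solver.Value(works[w][d][1]))]
                      for d in range(DAYS))
        print(f"worker {w}: {row}")
else:
    print("no feasible schedule under these constraints")
```

The point of writing it this way is that the constraints are stated declaratively and the solver does the search, so getting a wrong answer usually means a constraint was encoded wrong, not that the search failed.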
I don't understand what you're saying - the idea is that we're asking the LLM to generate code to perform the search, rather than run an arbitrarily long search on its own, right? So why should the number of layers it has matter?