
This seems like a terrible test case since Python examples are readily available in the training data: https://rosettacode.org/wiki/Cheryl%27s_birthday

It's interesting that so many of the models fail to retrieve this, but any that do solve it should clearly be able to do so with no reasoning/theory of mind.
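
For what it's worth, the whole search fits in a few lines of brute-force Python. This is a from-scratch sketch of the standard puzzle (the usual ten dates), not the Rosetta Code version:

    from collections import Counter

    # The ten candidate dates from the standard puzzle statement.
    DATES = [("May", 15), ("May", 16), ("May", 19),
             ("June", 17), ("June", 18),
             ("July", 14), ("July", 16),
             ("August", 14), ("August", 15), ("August", 17)]

    # 1. Albert (told the month) knows Bernard (told the day) can't know:
    #    eliminate every month containing a day that is unique overall.
    days = Counter(d for _, d in DATES)
    bad_months = {m for m, d in DATES if days[d] == 1}
    step1 = [(m, d) for m, d in DATES if m not in bad_months]

    # 2. Bernard now knows the answer: his day must be unique
    #    among the survivors.
    days1 = Counter(d for _, d in step1)
    step2 = [(m, d) for m, d in step1 if days1[d] == 1]

    # 3. Albert now knows too: his month must be unique among the survivors.
    months2 = Counter(m for m, _ in step2)
    print([(m, d) for m, d in step2 if months2[m] == 1])  # [('July', 16)]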

I agree this is not a great test. What's good about it is that it is a constraint satisfaction problem, and I would expect LLMs to be pretty bad at unknown problems of this kind. Simple reason: an LLM only has a finite number of layers, so it cannot do an arbitrarily long search.


I almost got ChatGPT to write a Python program that creates a monthly work schedule (for imaginary workers) from specific constraints (e.g. 10 workers, 2 shifts (morning and night), 40 working hours per worker per week, at least one weekend per month off, a minimum of 2 workers per shift, no more than 3 consecutive working days, and so forth).

I'm not sure I could ever get a fully working solution out of it, though. I haven't tried Claude, for example, or other programming languages; maybe those would fare better.

The issue was that it messed up the constraints so that there were no feasible solutions. That said, it did give me a working program for a version with fewer constraints.
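
This kind of schedule is a natural fit for an off-the-shelf CP solver rather than hand-rolled search code. Here's a minimal sketch using Google's OR-Tools CP-SAT solver (pip install ortools), encoding only a subset of the constraints above; the 28-day horizon and the variable names are my illustrative assumptions, not what ChatGPT produced:

    from ortools.sat.python import cp_model

    NUM_WORKERS, NUM_DAYS, NUM_SHIFTS = 10, 28, 2  # shift 0 = morning, 1 = night

    model = cp_model.CpModel()
    # work[w, d, s] == 1 iff worker w takes shift s on day d.
    work = {(w, d, s): model.NewBoolVar(f"w{w}_d{d}_s{s}")
            for w in range(NUM_WORKERS)
            for d in range(NUM_DAYS)
            for s in range(NUM_SHIFTS)}

    # At least 2 workers on every shift.
    for d in range(NUM_DAYS):
        for s in range(NUM_SHIFTS):
            model.Add(sum(work[w, d, s] for w in range(NUM_WORKERS)) >= 2)

    for w in range(NUM_WORKERS):
        # At most one shift per worker per day.
        for d in range(NUM_DAYS):
            model.Add(sum(work[w, d, s] for s in range(NUM_SHIFTS)) <= 1)
        # No more than 3 consecutive working days: every 4-day window
        # must contain at least one day off.
        for d in range(NUM_DAYS - 3):
            model.Add(sum(work[w, dd, s]
                          for dd in range(d, d + 4)
                          for s in range(NUM_SHIFTS)) <= 3)

    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    print(solver.StatusName(status))  # OPTIMAL/FEASIBLE if a schedule exists

The weekly-hours and weekend-off constraints would be further linear constraints in the same style, and an infeasible combination shows up immediately as INFEASIBLE rather than as a silently wrong schedule.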


I don't understand what you're saying - the idea is that we're asking the LLM to generate code to perform the search, rather than run an arbitrarily long search on its own, right? So why should the number of layers it has matter?


It is fascinating that the R solution just below the Python solution is much shorter and more readable. The same applies to Ruby and various Lisps.

It even applies to the Visual Basic solution!
