jfcoa's comments

jfcoa · 2024-10-05T00:04:58 1728086698

This seems like a terrible test case since Python examples are readily available in the training data: https://rosettacode.org/wiki/Cheryl%27s_birthday

It's interesting that so many of the models fail to retrieve this, but any that do solve it should clearly be able to do so with no reasoning/theory of mind.

kohlerm · 2024-10-05T12:00:39 1728129639

I agree this is not a great test. What's good about it is that it is a constraint satisfaction problem, and I would expect LLMs to be pretty bad at unknown problems of this kind. The reason is simple: an LLM only has a finite number of layers, so it cannot do arbitrarily long searches.

johnisgood · 2024-10-06T12:30:07 1728217807

I almost made ChatGPT write a Python program that creates a monthly work schedule (for imaginary workers) based on specific constraints (e.g. there are 10 workers, 2 shifts (morning and night), must work 40 hours per week, must have at least one weekend in a month off, 2 minimum workers per shift, no more than 3 consecutive working days, and so forth).

I am not sure whether I could get it to give me a working solution, however. I have not tried Claude, for example, and I have not tried it with other programming languages. Maybe.

The issue was that it messed up the constraints and there were no feasible solutions. That said, it did give me a working program for a version that had fewer constraints.
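
One way to tell "the constraints are contradictory" apart from "the program is wrong" is to brute-force a tiny piece of the problem first. The sketch below is not the program ChatGPT produced. It assumes 8-hour shifts (so 40 hours per week means 5 working days), looks at a single worker over a single week, and ignores the staffing and weekend rules as well as streaks that cross a week boundary. All names are hypothetical.

```python
from itertools import product

MAX_CONSECUTIVE = 3   # "no more than 3 consecutive working days"
DAYS_PER_WEEK = 7
WORK_DAYS = 5         # assumption: 40 hours per week / 8-hour shifts


def longest_run(pattern):
    """Length of the longest streak of working days (1s) in the pattern."""
    best = run = 0
    for day in pattern:
        run = run + 1 if day else 0
        best = max(best, run)
    return best


def feasible_weeks():
    """All 7-day on/off patterns that satisfy both per-worker constraints."""
    return [
        p for p in product((0, 1), repeat=DAYS_PER_WEEK)
        if sum(p) == WORK_DAYS and longest_run(p) <= MAX_CONSECUTIVE
    ]


if __name__ == "__main__":
    patterns = feasible_weeks()
    print(f"{len(patterns)} feasible weekly patterns")
    for p in patterns:
        print(p)
```

If this list came back empty, the constraints would already be infeasible for one worker, before the 2-per-shift staffing rule is even considered.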

falcor84 · 2024-10-09T13:44:02 1728481442

I don't understand what you're saying - the idea is that we're asking the LLM to generate code to perform the search, rather than run an arbitrarily long search on its own, right? So why should the number of layers it has matter?
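
To make that concrete: for Cheryl's birthday the search space is ten dates, and the generated program does the elimination, not the model. A minimal sketch, assuming the standard statement of the puzzle (ten candidate dates; Albert knows the month, Bernard knows the day). This is not the Rosetta Code solution, and the function names are made up for illustration.

```python
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]


def unique(values):
    """Values that occur exactly once in the list."""
    return {v for v in values if values.count(v) == 1}


def solve(dates):
    # Statement 1: Albert knows Bernard cannot know the date yet, so
    # Albert's month contains no day that is unique across all dates.
    unique_days = unique([d for _, d in dates])
    ruled_out_months = {m for m, d in dates if d in unique_days}
    step1 = [(m, d) for m, d in dates if m not in ruled_out_months]

    # Statement 2: Bernard now knows, so his day is unique among what is left.
    days_left = unique([d for _, d in step1])
    step2 = [(m, d) for m, d in step1 if d in days_left]

    # Statement 3: Albert now knows too, so his month is unique among what is left.
    months_left = unique([m for m, _ in step2])
    return [(m, d) for m, d in step2 if m in months_left]


if __name__ == "__main__":
    print(solve(DATES))  # [('July', 16)]
```

Each statement becomes one filtering pass over the candidates, so the amount of search the code performs is independent of how many layers the model that wrote it has.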

rghall102 · 2024-10-05T08:38:46 1728117526

It is fascinating that the R solution just below the Python solution is much shorter and more readable. The same applies to Ruby and various Lisps.

It even applies to the VisualBasic solution!

jfcoa · 2024-10-04T22:40:43 1728081643

The Ig Nobel is not for trivial achievements, it is to "honor achievements that first make people laugh, and then make them think." This takes different forms.

The part of the Wikipedia article you are referencing is an inference from a particular article: "A September 2009 article in The National titled "A noble side to Ig Nobels" says that, although the Ig Nobel Awards are veiled criticism of trivial research, history shows that trivial research sometimes leads to important breakthroughs."

The definition of "blue zones" never had anything to do with average longevity. The entire concept is predicated on unusual numbers of centenarians, not long average life spans. In fact, as is pointed out in the Ig Nobel-winning paper, Blue Zone places like Sardinia, Okinawa, and Ikaria have always been paradoxical: they are supposed to have higher numbers of unusually long-lived people, but have shorter average lifespans than the rest of their countries. The paradox goes away with the finding that the count of centenarians is incorrect. There's nothing left to the Blue Zone concept without the centenarians.

giantg2 · 2024-10-04T22:55:59 1728082559

It's hard to believe it's not satirical...

"Ig Nobel Prize Winner Dr. Elena Bodnar demonstrates her invention (a brassiere that can quickly convert into a pair of protective face masks)"

jfcoa · 2024-10-04T23:05:37 1728083137

Yes, it is definitely satirical. But it isn't specifically for "trivial" research; it gets deployed in different ways.

Some of the awards are straight-up criticism of the research, like for bunk homeopathy stuff. It was awarded for the prank paper used in the Sokal affair, in which case it's definitely praise of what Sokal did. Sometimes it is awarded for a bizarre but funny result that came out of work done in a more serious context, like the magnetic frog levitation paper.

jfcoa · 2024-10-03T18:17:08 1727979428

It doesn't completely remove it; it removes certain dependencies on it so that it can be computed by parallel scan. There is still a hidden state. It bears some similarity to what was done with Mamba.
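
A rough sketch of why dropping those dependencies matters, assuming the recurrence ends up in the linear form h_t = a_t * h_{t-1} + b_t, where a_t and b_t depend only on the input and not on the previous hidden state. That form is my assumption for illustration, not a quote of the architecture under discussion, and the function names are hypothetical. Each step is then an affine map, composing affine maps is associative, and so all hidden states can be computed with a scan.

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential(a, b, h0=0.0):
    """Reference: the usual step-by-step recurrence."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def scan(a, b, h0=0.0):
    """Hillis-Steele inclusive scan over the affine maps.

    Runs in about log2(n) rounds. Within a round every position is
    independent of the others, which is what a parallel implementation
    would exploit; here the rounds are simply executed in plain Python.
    """
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(n)
        ]
        step *= 2
    # elems[t] now maps h0 directly to h_t.
    return [a_t * h0 + b_t for a_t, b_t in elems]


if __name__ == "__main__":
    a = [2, 3, 1, 4, 2]
    b = [1, 0, 5, 2, 3]
    print(sequential(a, b, h0=1))
    print(scan(a, b, h0=1) == sequential(a, b, h0=1))
```

The hidden state is still there in both versions. The scan only works because no a_t or b_t needs h_{t-1} to be computed first.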