ARC Prize – a $1M+ competition towards open AGI progress (arcprize.org)
588 points by mikeknoop 10 months ago | 337 comments
Hey folks! Mike here. Francois Chollet and I are launching ARC Prize, a public competition to beat and open-source the solution to the ARC-AGI eval.

ARC-AGI is (to our knowledge) the only eval which measures AGI: a system that can efficiently acquire new skills and solve novel, open-ended problems. Most AI evals measure skill directly vs the acquisition of new skills.

Francois created the eval in 2019. SOTA was 20% at inception; today it is only 34%. Humans score 85-100%. 300 teams attempted ARC-AGI last year, and several bigger labs have attempted it.

While most other skill-based evals have rapidly saturated to human-level, ARC-AGI was designed to resist "memorization" techniques (e.g. LLMs).

Solving ARC-AGI tasks is quite easy for humans (even children) but impossible for modern AI. You can try ARC-AGI tasks yourself here: https://arcprize.org/play

ARC-AGI consists of 400 public training tasks, 400 public test tasks, and 100 secret test tasks. Every task is novel. SOTA is measured against the secret test set which adds to the robustness of the eval.

Solving ARC-AGI tasks requires no world knowledge, no understanding of language. Instead each puzzle requires a small set of “core knowledge priors” (goal directedness, objectness, symmetry, rotation, etc.)
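
If you want to explore the tasks programmatically: each public task is a small JSON file (see the original github.com/fchollet/ARC repo) with "train" and "test" lists of input/output pairs, where each grid is a list of rows of integers 0-9 (the colors). A rough loading sketch in Python (the filename is just illustrative):

    import json

    # Load one public ARC task (filename is illustrative).
    with open("data/training/0a1b2c3d.json") as f:
        task = json.load(f)

    # "train" holds the demonstration pairs; "test" holds the pair(s) to solve.
    for pair in task["train"]:
        grid_in, grid_out = pair["input"], pair["output"]  # lists of rows of ints 0-9
        print(len(grid_in), "x", len(grid_in[0]), "->", len(grid_out), "x", len(grid_out[0]))

    # A solver must predict the output grid(s) for these test inputs.
    test_inputs = [pair["input"] for pair in task["test"]]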

At minimum, a solution to ARC-AGI opens up a completely new programming paradigm where programs can perfectly and reliably generalize from an arbitrary set of priors. At maximum, it unlocks the tech tree towards AGI.

Our goals with this competition are:

1. Increase the number of researchers working on frontier AGI research (vs tinkering with LLMs). We need new ideas and the solution is likely to come from an outsider!

2. Establish a popular, objective measure of AGI progress that the public can use to understand how close we are to AGI (or not). Every new SOTA score will be published here: https://x.com/arcprize

3. Beat ARC-AGI and learn something new about the nature of intelligence.

Happy to answer questions!




I'm Simon Strandgaard and I participated in ARCathon 2022 (solved 3 tasks) and ARCathon 2023 (solved 8 tasks).

I'm collecting data on how humans are solving ARC tasks, and so far I have collected 4100 interaction histories (https://github.com/neoneye/ARC-Interactive-History-Dataset). Besides ARC-AGI, there are other ARC-like datasets; these can be tried in my editor (https://neoneye.github.io/arc/).

I have made some videos about ARC:

Replaying the interaction histories, you can see people have different approaches. It's 100ms per interaction; IRL people don't solve tasks that fast. https://www.youtube.com/watch?v=vQt7UZsYooQ

When I'm manually solving an ARC task, it looks like this, and you can see I'm rather slow. https://www.youtube.com/watch?v=PRdFLRpC6dk

What is weird: the way I implement a solver for a specific ARC task is quite different from the way I would manually solve the puzzle, since the solver has to deal with all kinds of edge cases.

Huge thanks to the team behind the ARC Prize. Well done.


The UX of your solution entry is _way_ better than the ARC site itself.


Being able to hold the mouse button down is certainly much nicer. Not being able to see the examples while you are solving makes it harder than it should be though.


I have created an issue with your suggestion. https://github.com/neoneye/ARC-Interactive/issues/67

Seeing the examples while having the editor visible: that's a good idea. I haven't explored this direction, since I had my phone (with its tiny screen real estate) in mind.

Drafts for such a UI are very welcome. However, I'm probably too lazy to code it myself.


That warms my heart. Thank you.

The short story: I needed something that could render thumbnails of tasks so I could visually debug what was going on in my solver. However, I never got around to making the visual inspection tool. After I had the thumbnail renderer, around mid-January 2024, it eventually turned into what it is now.


"Here is a challenge, designed to be unsolvable or so. We'll give you a bazillion dollars if you complete the challenge, and, in the meantime, we will use your attempts to train an as AI that will be worth the cost!!"


In the most charitable interpretation of this comment: I can understand the feeling, when so many social media interactions are of the form 'post a picture of yourself as a baby, at 10 years old, and at your current age!'. Those and many other instances can bring out excessive skepticism.

But the people involved in this haven't signaled that they are on that path, either in the message about the challenge (precisely the opposite) or, seemingly, in their careers so far.

So I guess I don't share the concern, but a better way to phrase your comment could be:

"how can we be sure the human-provided solutions won't turn out to be just fodder for training a RL model or something that will later be monetized, closed and proprietary? Do the challenge organizers provide any guarantees on that?"


No, you missed the point. The striking thing about ARC is that the puzzles are super easy for humans. The average person solves 85% of the tasks, but the world's best LLMs are only solving 5%. The challenge is simply to make an AI score as well as the average human.


[flagged]


Amazon Mechanical Turk workers, who might not be 100 average IQ but wouldn't be far off that.


Did you even try the puzzles? They’re not particularly “unsolvable”.


ARC-AGI: "here are some pretty simple puzzles, we'll give you a million dollars to solve them!"

Human: "They're quite challenging, this might be a trick to engage activity for the purpose of training models."

skrebbel: "You're stupid".


Did you try the puzzles?


No. What is the purpose of this competition? Unlikely that the reason for it is to pay out an enormous reward, right? Easy or not easy, the fortune is only rewarded to the system that solves the puzzles. The reward is too valuable to be given away easily. Ipso facto, solving the puzzles is deemed challenging by those who present the competition.


Are you writing this under every challenge with a monetary reward? The point of the challenge is that it is hard to do for an AI and easy for a human. Of course it is not easy to solve, that’s the point of the challenge. But the puzzle itself is not very hard.


This is super cool. I share Francois' intuition that the presently data-hungry learning paradigm is not only not generalizable but unsustainable: humans do not need 10,000 examples to tell the difference between cats and dogs, and the main reason computers can today is because we have millions of examples. As a result, it may be hard to transfer knowledge to more esoteric domains where data is expensive, rare, and hard to synthesize.

If I can make one criticism/observation of the tests, it seems that most of them reason about perfect information in a game-theoretic sense. However, many if not most of the more challenging problems we encounter involve hidden information. Poker and negotiations are examples of problem solving in imperfect information scenarios. Smoothly navigating social situations also requires a related problem of working with hidden information.

One of the really interesting things we humans are able to do is to take the rules of a game and generate strategies. While we do have some algorithms which can "teach themselves" e.g. to play go or chess, those same self-play algorithms don't work on hidden information games. One of the really interesting capabilities of any generally-intelligent system would be synthesizing a general problem solver for those kinds of situations as well.


> humans do not need 10,000 examples to tell the difference between cats and dogs,

I swear, not enough people have kids.

Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.

One thing kids do is they'll ask for confirmation of their guess. You'll be reading a book you've read 50 times before and the kid will stop you, point at a dog in the book, and ask "dog?"

And there is a development phase where this happens a lot.

Also kids can get mad if they are told an object doesn't match up to the expected label, e.g. my son gets really mad if someone calls something by the wrong color.

Another thing toddlers like to do is play silly labeling games, which is different from calling something the wrong name by accident; instead, this is done on purpose for fun. E.g. you point to a fish and say "isn't that a lovely llama!", at which point the kid will fall down giggling at how silly you are being.

The human brain develops really slowly[1], and a sense of linear time encoding doesn't really exist for quite a while (even at 3, everything is either yesterday, today, or tomorrow), so who knows how things are being processed. What we do know is that kids gather information through a bunch of senses operating at an absurd data collection rate 12-14 hours a day, with another 10-12 hours of downtime to process the information.

[1] Watch a baby discover they have a right foot. Then a few days later figure out they also have a left foot. Watch kids who are learning to stand develop a sense of "up above me" after they bonk their heads a few times on a table bottom. Kids only learn "fast" in the sense that they have nothing else to do for years on end.


> Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.

I have kids so I'm presuming I'm allowed to have an opinion here.

This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.

Once they have the basics down concept acquisition time shrinks rapidly and kids can easily learn their new favorite animal in as little as a single example.

Compare this to LLMs which can one-shot certain tasks, but only if they have essentially already memorized enough information to know about that task. It gives the illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts.

Beyond just learning a new animal, humans are able to learn entirely new systems of reasoning in surprisingly few examples (though it does take quite a bit of time to process them). How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.


> kids can easily learn their new favorite animal in as little as a single example

Until they encounter a similar animal and get confused, at which point you understand the implicit heuristic they were relying on. (E.g. they confused a dairy cow with a zebra, which means their heuristic was 'black-and-white quadruped'.)

Doesn't this seem remarkably close to how LLMs behave with one-shot or few-shot learning? I think there are a lot more similarities here than you give it credit for.

Also, I grew up in South Korea where early math education is highly prioritized (for better or for worse). I remember having to solve 2 dozen arithmetic problems every week after school with a private tutor. Yes, it was torture and I was miserable, but it did expose me to thousands more arithmetic questions than my American peers. All that misery paid off when I moved to the U.S. at the age of 12 and realized that my math level was 3-4 years above my peers. So yes, I think human intelligence accuracy also does improve with more training data.


Not many zebras where I live but lots of little dogs. Small dogs were clearly cats for a long time no matter what I said. The training can take a while.


This. My 2.5 y.o. still argues with me that a small dog she just saw in the park is a "cat". That's in contrast to her older sister, who at 5 is... begrudgingly accepting that I might be right about it after the third time I correct her.


The thing is that the labels "cat" and "dog" reflect a choice in most languages to name animals based on species, which manifests in certain physical/behavioral attributes. Children need to learn by observation/teaching and generalization that these are the characteristics they need to use to conform to our chosen labelling/distinction, and that other things such as size/color/speed are irrelevant.

Of course it didn't have to be this way - in a different language animals might be named based on size or abilities/behavior, etc.

So, your daughter wanting to label a cat-sized dog as a cat is just a reflection of her not yet having aligned her own generalization with what you mean when you say "cat" vs "dog".


And once they learn sarcasm, small dogs are cats again :-)


My favourite part of this is when they apply their new words to things that technically make sense, but don't. My daughter proudly pointed at a king wearing a crown as "sharp king" after learning about knives, saws, etc.


> How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.

Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems. During my mathematics education I had to practice solving a lot of problems dissimilar to what I had seen before. Even in the theory part, a lot of it was actually about filling in details in proofs and arguments, and reformulating challenging steps (by words or drawings). My notes on top of a mathematical textbook are much more than the text itself.

People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions. The original article is spot on that there is no AGI pathway in the current research direction. But there are huge incentives for ignoring this.


> Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems.

I think it's more accurate to say that they learn math by memorizing a sequence of steps that result in a correct solution, typically by following along with some examples. Hopefully they also remember why each step contributes to the answer as this aids recall and generalization.

The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly. This is just standard training. Understanding the motivation of each step helps with that memorization, and also allows you to apply that step in novel problems.

> The original article is spot on that there is no AGI pathway in the current research direction.

I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits for problem solving if trained enough, and parametric memory generalizes their operation to many more tasks.

They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Composition tasks are still challenging, but parametric memory is a big step in the right direction for that too. Accurate comparative and compositional reasoning sound tantalizingly close to AGI.


> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes. Me and Terence Tao on the same exact math training data would not yield two mathematicians of similar skill.

While it's true that memorization of properties, structure, operations, and what should be applied when and where is involved, there is a much deeper component of knowing how these all relate to each other and grasping their fundamental meaning and structure. Some people seem to be wired to be better at thinking about and picking out these subtle mathematical relations using just the description or only a few examples (or to be able to at all, where everyone else struggles).

> I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits

It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

From: https://arxiv.org/abs/2405.15071

> The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.


> Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes

Everyone starts by memorizing how to do basic arithmetic on numbers, their multiplication tables and fractions. Only some then advance to understanding why those operations must work as they do.

> It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.

Yes, I acknowledged that when I said "Composition tasks are still challenging". Comparisons and composition are both key to abstract reasoning. Clearly parametric memory and grokking have shown a fairly dramatic improvement in comparative reasoning with only a small tweak.

There is no evidence to suggest that compositional reasoning would not also fall to yet another small tweak. Maybe it will require something more dramatic, but I wouldn't bet on it. This pattern of thinking humans are special does not have a good track record. Therefore, I find the original claim that I was responding to("there is no AGI pathway in the current research direction") completely unpersuasive.


I started by understanding. I could multiply by repeat addition (each addition counted one at a time with the aid of fingers) before I had the 10x10 addition table memorized. I learned university level calculus before I had more than half of the 10x10 multiplication table memorized, and even that was from daily use, not from deliberate memorization. There wasn't a day in my life where I could recite the full table.

Maybe schools teach by memorization, but my mom taught me by explaining what it means, and I highly recommend this approach (and am a proof by example that humans can learn this way).


> I started by understanding. I could multiply by repeat addition

How did you learn what the symbols for numbers mean and how addition works? Did you literally just see "1 + 3 = 4" one day and intuit the meaning of all of those symbols? Was it entirely obvious to you from the get-go that "addition" was the same as counting using your fingers which was also the same as counting apples which was also the same as these little squiggles on paper?

There's no escaping the fact that there's memorization happening at some level because that's the only way to establish a common language.


There's a difference between memorizing meanings of words (addition is same as counting this and then the other thing, "3" means three things) and memorizing methods (table of single digit addition/multiplication to do them faster in your head). You were arguing the second, I'm a counterexample. I agree about the first, everyone learns language by memorization (some rote, some by use), but language is not math.


> You were arguing the second, I'm a counterexample.

I still don't think you are. Since we agree that you memorized numbers and how they are sequential, and that counting is moving "up" in the sequence, addition as counting is still memorizing a procedure based on this, not just memorizing a name: to add any two numbers, count down on one as you count up on the other until the first number reaches zero, and the number that counted up is the sum. I'm curious how you think you learned addition without memorizing this procedure (or one equivalent to it).

Then you memorized the procedure for multiplication: given any two numbers, count down on one and add the other to itself until the counted down number reaches one. This is still a procedure that you memorized under the label "multiplication".
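
To make that concrete, here is a minimal Python sketch of the two counting procedures just described (positive integers only; the function names are mine):

    def add(a, b):
        # "Count down on one as you count up on the other until the first reaches zero."
        while a > 0:
            a -= 1
            b += 1
        return b

    def multiply(a, b):
        # "Count down on one and add the other to itself until the counted-down number reaches one."
        total = b
        while a > 1:
            a -= 1
            total = add(total, b)
        return total

    assert add(2, 3) == 5
    assert multiply(4, 3) == 12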

This is exactly the kind of procedure that I initially described. Someone taught you a correct procedure for achieving some goal and gave you a name for it, and "learning math" consists of memorizing such correct procedures (valid moves in the game of math if you will). These moves get progressively more sophisticated as the math gets more advanced, but it's the same basic process.

They "make sense" to you, and you call it "understanding", because they are built on a deep foundation that ultimately grounds out in counting, but it's still memorizing procedures up and down the stack. You're just memorizing the "minimum" needed to reproduce everything else, and compression is understanding [1].

The "variation in outcomes" that an OP discussed is simply because many valid moves are possible in any given situation, just like in chess, and if you "understand" when a move is valid vs. not (eg. you remember it), then you have an advantage over someone who just memorized specific shortcuts, which I suspect is what you are thinking I mean by memorization.

[1] https://philpapers.org/rec/WILUAC-2


I think you are confusing "memory" with strategies based on memorisation. Yes, memorising (i.e. putting things into memory) is always involved in learning in some way, but that is too general and not what is discussed here. "Compression is understanding" possibly to some extent, but understanding is not just compression; that would be a reduction of what understanding really is, as it involves a range of processes and contexts in which the understanding is actually enacted rather than purely "memorised" or applied, and that is fundamentally relational. It is so relational that it can go all the way down to how motor skills are acquired or spatial relationships are understood. It is no surprise that tasks like mental rotation correlate well with mathematical skills.

Current research in early mathematical education now focuses on teaching certain spatial skills to very young kids rather than (just) numbers. Mathematics is about understanding of relationships, and that is not a detached kind of understanding that we can make into an algorithm, but deeply invested and relational between the "subject" and the "object" of understanding. Taking the subject and all the relations with the world out of the context of learning processes is absurd, because that is in the exact centre of them.


Sorry, I strongly disagree.

I did memorize names of numbers, but that is not essential in any way to doing or understanding math, and I can remember a time where I understood addition but did not fully understand how names of numbers work (I remember, when I was six, playing with a friend at counting up high, and we came up with some ridiculous names for high numbers because we didn't understand decimal very well yet).

Addition is a thing you do on matchsticks, or fingers, or eggs, or whatever objects you're thinking about. It's merging two groups and then counting the resulting group. This is how I learned addition works (plus the invariant that you will get the same result no matter what kind of object you happen to work with). Counting up and down is one method that I learned, but I learned it by understanding how and why it obviously works, which means I had the ability to generate variants - instead of 2+8=3+7=... I can do 8+2=9+1=..., or I can add ten at a time, etc'.

Same goes for multiplication. I remember the very simple conversation where I was taught multiplication. "Mom, what is multiplication?" "It's addition again and again, for example 4x3 is 3+3+3". That's it, from that point on I understood (integer) multiplication, and could e.g. wonder myself at why people claim that xy=yx and convince myself that it makes sense, and explore and learn faster ways to calculate it while understanding how they fit in the world and what they mean. (An exception is long multiplication, which I was taught as a method one day and was simple enough that I could memorize it and it was many years before I was comfortable enough with math that whenever I did it it was obvious to me why what I'm doing here calculates exactly multiplication. Long division is a more complex method: it was taught to me twice by my parents, twice again in the slightly harder polynomial variant by university textbooks, and yet I still don't have it memorized because I never bothered to figure out how it works nor to practice enough that I understand it).

I never in my life had an ability to add 2+2 while not understanding what + means. I did for half an hour have the same for long division (kinda... I did understand what division means, just not how the method accomplishes it) and then forgot. All the math I remember, I was taught in the correct order.

edit: a good test for whether I understood a method or just memorized it would be, if there's a step I'm not sure I remember correctly, whether I can tell which variation has to be the correct one. For example, in long multiplication, if I remembered each line has to be indented one place more to the right or left but wasn't sure which, since I understand it, I can easily tell that it has to be the left because this accomplishes the goal of multiplying it by 10, which we need to do because we had x0 and treated it as x.


The point is the memorization exercise requires orders of magnitude fewer examples for bootstrapping.


Does it though? It's a common claim, but I don't think that's been rigorously established.


> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly

Perhaps that is how you learned math, but it is nothing like how I learned math. Memorizing steps does not help; I sucked at it. What works for me is understanding the steps and why we used them. Once I understood the process and why it worked, I was able to reason my way through it.

> The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly.

Did you look at the types of problems presented by the ARC-AGI test? I don't see how memorization plays any role.

> They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.

Then let's see how they do on the ARC test. While it is possible that generalized circuits can develop in LLMs with enough training, I am pretty skeptical till we see results.


> Perhaps that is how you learned math, but it is nothing like how I learned math.

Memorization is literally how you learned arithmetic, multiplication tables and fractions. Everyone starts learning math by memorization, and only later start understanding why certain steps work. Some people don't advance to that point, and those that do become more adept at math.


> Memorization is literally how you learned arithmetic, multiplication tables and fractions

I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure". Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?


> I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure"

What did you understand, exactly? You understood how to "count" using "numbers" that you also memorized? You intuitively understood that addition was counting up and subtraction was counting down, or did you memorize those words and what they meant in reference to counting?

> Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?

The procedure to add or subtract fractions by establishing a common denominator, for instance. The procedure for how numerators and denominators are multiplied or divided. I could go on.


Fractions is exactly an area of mathematics where I learned by understanding the concept and how it was represented and then would use that understanding to re-reason the procedures I had a hard time remembering.

I do have the single digit multiplication table memorized now, but there was a long time where that table had gaps and I would use my understanding of how numbers worked to calculate the result rather than remembering it. That same process still occurs for double digit numbers.

Mathematics education, especially historically, has indeed leaned pretty heavily on memorization. That doesn't mean it's the only way to learn math, or even a particularly good one. I personally think over-reliance on memorization is part of why so many people think they hate math.


> Fractions is exactly an area of mathematics where I learned by understanding the concept and how it was represented and then would use that understanding to re-reason the procedures I had a hard time remembering.

Sure, I did that plenty too, but that doesn't refute the point that memorization is core to understanding mathematics; it's just a specific kind of memorization that results in maximal flexibility for minimal state retention. All you're claiming is that you memorized some core axioms/primitives and the procedures that operate on them, and then memorized how higher-level concepts are defined in terms of that core. I go into more detail on the specifics here:

https://news.ycombinator.com/item?id=40669585

I agree that this is a better way to memorize mathematics, eg. it's more parsimonious than memorizing lots of shortcuts. We call this type of memorizing "understanding" because it's arguably the most parsimonious approach, requiring the least memory, and machine learning has persuasively argued IMO that compression is understanding [1].

[1] https://philpapers.org/rec/WILUAC-2


Every time I see people online reduce the human thinking process to just production of a perceptible output, I start questioning myself, whether somehow I am the only human on this planet capable of thinking and everyone else is just pretending. That can't be right. It doesn't add up.

The answer is that both humans and the model are capable of reasoning, but the model is more restricted in the reasoning that it can perform since it must conform to the dataset. This means the model is not allowed to invest tokens that do not immediately represent an answer but have to be derived on the way to the answer. Since these thinking tokens are not part of the dataset, the reasoning that the LLM can perform is constrained to the parts of the model that are not subject to the straight jacket of training loss. Therefore most of the reasoning occurs in-between the first and last layers and ends with the last layer, at which point the produced token must cross the training loss barrier. Tokens that invest into the future but are not in the dataset get rejected and thereby limit the ability of the LLM to reason.


> People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions

And almost all of it is just more text, or described in more text.

You're very much right about this. And that's exactly why LLMs work as well as they do - they're trained on enough text of all kinds and topics, that they get to pick up on all kinds of patterns and relationships, big and small. The meaning of any word isn't embedded in the letters that make it, but in what other words and experiences are associated with it - and it so happens that it's exactly what language models are mapping.


It is not "just more text". That is an extremely reductive approach on human cognition and experience that does favour to nothing. Describing things in text collapses too many dimensions. Human cognition is multimodal. Humans are not computational machines, we are attuned and in constant allostatic relationship with the changing world around us.


I think there is a component of memorizing solutions. For example, for mathematical proofs there is a set of standard "tricks" that you should have memorized.


Sure, memory helps a lot; it allows you to concentrate your mental effort on the novel or unique parts of the problem.


> How many homework questions did your entire calc 1 class have? I'm guessing less than 100…

I’m quite surprised at this guess and intrigued by your school’s methodology. I would have estimated >30 problems per week on average across 20 weeks for myself.

My kids are still in pre-algebra, but they get way more drilling still, well over 1000 problems per semester once Zern, IReady, etc. are factored in. I believe it’s too much, but it does seem like the typical approach here in California.


I preferred doing large problem sets in math class because that is the only way I felt like I could gain an innate understanding of the math.

For example after doing several hundred logarithms, I was eventually able to do logs to 2 decimal places in my head. (Sadly I cannot do that anymore!) I imagine if I had just done a dozen or so problems I would not have gained that ability.


> This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.

Sure, but they learn a lot of labels.

> How many homework questions did your entire calc 1 class have? I'm guessing less than 100

At least 20 to 30 a week, for about 10 weeks of class. Some weeks were more, and I remember plenty of days where we had 20 problems assigned a day.

Indeed, I am a huge fan of "the best way to learn math is to do hundreds upon hundreds of problems", because IMHO some concepts just require massive amounts of repetition.


> illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts

Now imagine how much your kid would learn if the only input he ever received was a sequence of words.


Are you saying it's not fair for LLMs, because the way they are taught is different?

The difference is that we don't know better methods for them, but we do know of better methods for people.


I think they're saying that it's silly to claim humans learn with less data than LLMs, when humans are ingesting a continuous video, audio, olfactory and tactile data stream for 16+ hours a day, every day. It takes at least 4 years for a human child to be in any way comparable in performance to GPT-4 on any task both of them could be tested on; do people really believe GPT-4 was trained with more data than a 4 year old?


> do people really believe GPT-4 was trained with more data than a 4 year old?

I think it was; the guesstimate I've seen is GPT-4 was trained on 13e12 tokens, that over 4 years is 8.9e9/day or about 1e5/s.

Then it's a question of how many bits per token — my expectation is 100k/s is more than the number of token-equivalents we experience, even though it's much less than the bitrate even of just our ears let alone our eyes.
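
Spelling out the guesstimate (13e12 tokens is not a confirmed figure, just the number above):

    tokens = 13e12                   # guesstimated GPT-4 training tokens
    days = 4 * 365                   # four years
    print(tokens / days)             # ~8.9e9 tokens per day
    print(tokens / (days * 86400))   # ~1.0e5 tokens per second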


Interesting analysis, makes sense. I wonder how we should account for the “pre-built” knowledge that is transferred to a newborn genetically and from the environment at conception and during gestation. Of course things like epi-genetics also come into play.

The analogies get a little blurry here, but perhaps we can draw a distinction between information that an infant gets from their higher-level senses (e.g. sight, smell, touch, etc) versus any lower-level biological processes (genetics, epi-genetics, developmental processes, and so on).

The main point is that there is a fundamental difference: LLMs have very little prior knowledge [1] while humans contain an immense amount of information even before they begin learning through the senses.

We need to look at the billions of years of biological evolution, millions of years of cultural evolution, and the immense amounts of environmental factors, all which shape us before birth and before any “learning” occurs.

[1] The model architecture probably counts as hard-coded prior knowledge contained before the model begins training, but it is a ridiculously small amount of information compared to the complexity of living organisms.


I think it's all fair that both LMMs and people get a certain (even unbounded) amount of "pretraining" before actual tasks.

But after the training, people are much better equipped to do single-shot recognition and cognitive tasks on imagery and situations they have not encountered before, e.g. identifying (from pictures) which animal is being shown, even if it is only the second time seeing that animal (the first being shown that this animal is a zebra).

So, basically, after initial training, I believe people are superior in single-shot tasks—and things are going to get much more interesting once LMMs (or something after that?) are able to do that well.

It might be that GPT-4o can actually do that task well! Someone should demo it, I don't have access. Except, of course, GPT-4o already knows what zebras look like, so something else than exactly that..


> I think they're saying that it's silly to claim humans learn with less data than LLMs, when humans are ingesting a continuous video, audio, olfactory and tactile data stream for 16+ hours a day, every day.

Yeah, but they're seeing mostly the same thing day after day!

They aren't seeing 10k stills of 10k different dogs, then 10k stills of 10k different cats. They're seeing $FOO thousand images of the family dog and the family cat.

My (now 4.5yo) toddler did reliably tell the difference between cats and dogs the first time he went with us to the local SPCA and saw cats and dogs that were not our cats and dogs.

In effect, 2 cats and 2 dogs were all he needed to reliably distinguish between cats and dogs.


> In effect, 2 cats and 2 dogs were all he needed to reliably distinguish between cats and dogs.

I assume he was also exposed to many images, photos and videos (realistic or animated) of cats and dogs in children books and toys he handled. In our case, this was a significant source of animal recognition skills of my daughters.


> I assume he was also exposed to many images, photos and videos (realistic or animated) of cats and dogs in children books and toys he handled.

No images or photos (no books).

TV, certainly, but I consider it unlikely that animals in the animation style of Peppa Pig help the classifier.

Besides which, we're still talking under a dozen cats/dogs seen till that point.

Forget about cats/dogs. Here's another example: he only had to see a burger patty once to determine that it was an altogether new type of food, different from (for example) a sausage.

Anyone who has kids will have dozens of examples where the classifier worked without a false positive off a single novel item.


So a billion years of evolutionary search plus 20 years of finetuning is a better method?


Two other points - I've also forgotten a bunch, but also know I could "relearn" it faster than the first time around.

To continue your example, I know I've learned calculus and was lauded at the time. Now I could only give you the vagaries, nothing practical. However I know if I was pressed, I could learn it again in short order.


> This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.

Yes. All that learning is feeding off one another. They're learning how reality works. Every bit of new information informs everything else. It's something that LLMs demonstrate too, so it shouldn't be a surprising observation.

> Once they have the basics down concept acquisition time shrinks rapidly

Sort of, kind of.

> and kids can easily learn their new favorite animal in as little as a single example.

Under 5 they don't. Can't speak what happens later, as my oldest kid just had their 5th birthday. But below 5, all I've seen is kids being quick to remember a name, but taking quite a bit longer to actually distinguish between a new animal and similarly looking ones they already know. It takes a while to update the classifier :).

(And no, they aren't going to one-shot recognize an animal in a zoo that they saw first time on a picture hours earlier; it's a case I've seen brought up, and I maintain that even most adults will fail spectacularly at this test.)

> Compare this to LLMs which can one-shot certain tasks, but only if they have essentially already memorized enough information to know about that task. It gives the illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts.

Correct, in the sense that the models don't update their weights while you use them. But that just means you have to compare them with ability of humans to one-shot tasks on the spot, "thinking on their feet", which for most tasks makes even adults look bad compared to GPT-4.

> How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.

I don't believe someone could learn calc in 100 exercises or less. Per concept like "addition of small numbers", or "long division", or "basic derivatives", or "trivial integrals", yes. Note that in-class exercises count too; learning doesn't happen primarily by homework (mostly because few have enough time in a day to do it).


> But that just means you have to compare them with ability of humans to one-shot tasks on the spot, "thinking on their feet", which for most tasks makes even adults look bad compared to GPT-4.

This simply is not true as stated in the article. ARC-AGI is a one-shot task test that humans reliably do much, much better on than any AI model.

> I don't believe someone could learn calc in 100 exercises or less.

I learned the basics of integration in a foreign language I barely understood by watching a couple of diagrams get drawn out and seeing far less than 100 examples or exercises.


> not enough people have kids.

Second that. I think I've learned as much as my children have.

> Watch a baby discover they have a right foot. Then a few days later figure out they also have a left foot.

Watching a baby's awareness grow from pretty much nothing to a fully developed ability to understand the world around is one of the most fascinating parts of being a parent.


My kid is about 3 and has been slow on language development. He can barely speak a few short sentences now. Learning names of things and concepts made a big difference for him and that's a fascinating watch and realization.

This reminds me of the story of Adam learning names, or of how some languages can express a lot more in fewer words. And it makes sense that LLMs look intelligent to us.

My kid loves repeating the names of things he learned recently. For the past few weeks, after learning 'spider' and 'snake' and 'dangerous', he keeps finding spiders around; no snakes, so he makes up snakes from curly drawn lines and tells us they are dangerous.

I think we learn fast because of stereo (3d) vision. I have no idea how these models learn and don't know if 3D vision will make multimodal LLMs better and require exponentially fewer examples.


> I think we learn fast because of stereo (3d) vision.

I think stereo vision is not that important if you can move around and get spatial clues that way also.


Every animal/insect I can think of has more than 1 eye. Some have a lot more than 2 eyes. It has to be that important.


> the kid will stop you, point at a dog in the book, and ask "dog?"

Of course for a human this can either mean "I have an idea about what a dog is, but I'm not sure whether this is one" or it can mean "Hey this is a... one of those, what's the word for it again?"


Babies, unlike machine learning models, aren't placed in limbo when they aren't running back propagation.

Babies need few examples for complex tasks because they get constant infinitely complex examples on tasks which are used for transfer learning.

Current models take a nuclear reactor's worth of power to run backprop on top of a small country's GDP worth of hardware.

They are _not_ going to generalize to AGI because we can't afford to run them.


> Current models take a nuclear reactor's worth of power to run backprop on top of a small country's GDP worth of hardware.

Nice one. Perhaps we are to conclude the whole transformer architecture is amazingly overblown in storage/computation costs.

AGI or not, we need better approach to what transformers are doing.


> I swear, not enough people have kids.

My friends toddler, who grew up with a cat in the house, would initially call all dogs "cat". :-D


My niece, 3yo, at the zoo, spent about 30 seconds trying to figure out whether a pig was a cat or a car.


I haven't seen 1000 cats in my entire life. I'm sure I learned how to tell a dog from a cat after being exposed to just a single instance of each.


I'm sure you saw over 1B images of cats though, assuming 24 images per second from vision.


> I'm sure you saw over 1B images of cats though, assuming 24 images per second from vision.

The AI models aren't seeing the same image 1B times.


Neither are you, during those 10 000 hours most of the time you aren't absolutely still.


> Neither are you, during those 10 000 hours most of the time you aren't absolutely still.

So? I'm still seeing the same object. Large models aren't trained on 10k different images of a single cat.


I have a small kid. When they first saw some jackdaws, the first bird they noticed could fly, they thought it was terribly exciting and immediately learned the word for them, and generalised it to geese, crows, gulls and magpies (plus some less common species I don't know what they're called in english), pointing at them and screaming the equivalent of 'jackda! jackda!'.


> Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.

If I was presented with 10 pictures of 2 species I'm unfamiliar with, about as different as cats and dogs, I expect I would be able to classify further images as either, reasonably accurately.


Not to mention that babies receive petabytes of visual input to go with other stimuli. It’s up for debate how sample efficient humans actually are in the first few years of their lives.


Hardly. High visual acuity is limited to a tiny area of the FoV; your brain is filling in all the blanks for you.


Even at that resolution (about 0.35 MP per eye for just the fovea before any processing) napkin math suggests 7.3T per day. Over 5 years you get about 13PB if my math is right, assuming 16 waking hours per day.
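
Spelling out that napkin math, with my assumptions made explicit (3 bytes per pixel and an effective 60 "frames" per second are guesses; the 0.35 MP fovea figure is from above):

    pixels_per_eye = 0.35e6        # foveal resolution, per the estimate above
    bytes_per_pixel = 3            # assumed: RGB, 1 byte per channel
    fps = 60                       # assumed effective "frame rate"
    waking_seconds = 16 * 3600     # 16 waking hours

    per_day = 2 * pixels_per_eye * bytes_per_pixel * fps * waking_seconds
    print(per_day / 1e12)            # ~7.3 TB per day
    print(per_day * 365 * 5 / 1e15)  # ~13 PB over 5 years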


That’s all true, yet my 2.5 year old sometimes one-shots specific information. I told my daughter that woodpeckers eat bugs out of trees after doing what you said and asking “what’s that noise?” for the fifth time in a few minutes when we heard some this spring. She brought it up again at least a week later, randomly. Developing brains are amazing.

She also saw an eagle this spring out the car window and said “an eagle! …no, it’s a bird,” so I guess she’s still working on those image classifications ;)


I think your comment over intellectualises the way children experience the world.

My child experiences the world in a really pure way. They don't care much about labels or colours or any other human inventions like that. He picks up his carrot; he doesn't care about the name or the color. He just enjoys it through the pure experience of eating it. He can also find incredible flow-state-like joy from playing with river stones or looking at the moon.

I personally feel bad that I have to teach them to label things and put things in boxes. I think your child is frustrated at times because it's a punish of a game: the departure from "the oceanic feeling".

Your comment would make sense to me if the end game of our brains and human experience is labelling things. It’s not. It’s useful but it’s not what living is about.


> humans do not need 10,000 examples to tell the difference between cats and dogs

The optimization process that trained the human brain is called evolution, and it took a lot more than 10,000 examples to produce a system that can differentiate cats vs dogs.

Put differently, an LLM is pre-trained with very light priors, starting almost from scratch, whereas a human brain is pre-loaded with extremely strong priors.


> The optimization process that trained the human brain is called evolution, and it took a lot more than 10,000 examples to produce a system that can differentiate cats vs dogs.

Asserted without evidence. We have essentially no idea at what point living systems were capable of differentiating cats from dogs (we don't even know for sure which living systems can do this).


We know for a fact that cats, dogs, and humans do.


Sure, but can earthworms? Butterflies? Oak trees? Slime mould? At what point in the history of life did sufficient discrimination to differentiate e.g. a cat and a dog actually arise? Are the mechanisms used for this universal? Are some better than others? etc.


As adults, not (as per this thread) genetically.


>The optimization process that trained the human brain is called evolution

A human brain that doesn't get visual stimulus at the critical age between 0 and 3 years old will never be able to tell the difference between a cat and a dog because it will be forevermore blind.


Commonly believed, but not so: https://www.sciencedaily.com/releases/2007/02/070220021337.h...

I heard of a similar case before I did my A-levels, so at least 22 years ago, where the person had cataracts removed and it took a while to learn to see; something about having to touch a statue (of a monkey?) before being able to recognise monkeys.


Humans, I would bet, could distinguish between two animals they've never seen based only on a loose or tangential description. I.e. "A dog hunts animals by tracking and chasing them long enough to exhaust their energy, but a cat is opportunistic and strikes using stealth and agility."

A human that has never seen a dog or a cat could probably determine which is which based on looking at the two animals and their adaptations. This would be an interesting test for AIs, but I'm not quite sure how one would formulate an eval for this.


Only after being exposed to (at least pictures and descriptions of) dozens if not hundreds of different types of animal and their different attributes. Literal decades of training time and carefully curated curriculum learning are required for a human to perform at what we consider ‘human level’.


A possible way to test this idea would be to draw two aliens with different hunting strategies and do a poll of which is which. I'd try it, but my drawing skills are terrible and I'm averse to using generated images.


Seems analogous to bouba/kiki effect:

https://en.m.wikipedia.org/wiki/Bouba/kiki_effect


Do computers need 10,000 examples to distinguish dogs from cats when pretrained on other tasks?


No.


> humans do not need 10,000 examples to tell the difference between cats and dogs

well, maybe. We view things in three dimensions at high fidelity: viewing a single dog or cat actually ends up being thousands of training samples, no?


Yes, but we do not call a couch in a leopard print a leopard. Because we understand that the print is secondary to the function.


Hah. My toddler gladly calls her former walking aid toy a "lawn mower". Random toys become pie and cakes she brings to us to eat.


I'm not sure it's as simple as you say. The first time my very young son saw a horse, he made the ASL sign for 'dog'.

He had only ever seen cats and dogs in his life previous to that.


Did he require 9,999 more examples of horses before learning the difference?


In another comment I replied that 3D high fidelity images do end up being thousands of training samples, so the answer is yes.


I'm deeply skeptical that training AI on (effectively) thousands of images of one horse will perform very well at training to recognize horses in general.


I'll double down with you on this.

Then train the AI using a binocular video of a thoroughbred and see if it can distinguish a draft horse and a quarter horse as horse...


Are you suggesting that if a group of kids were given a book of zoo animals before going to the zoo, they would have difficulty identifying any new animals, because they have only seen one picture of each?


I think that's an interesting question, and a possible counter to my argument.

Certainly kids learn and become better at extrapolation and need fewer and fewer samples in general as they get more life experience.


But we have a lot more sensory input and context to verify all of that.

If you kept training LLMs with all that data, it would be interesting to see what the results would be.


Eh, still doesn’t hold up. I really don’t think there’s many psychologists working on the posited mechanism of simple NN-like backprop learning. Aka conditioning, I guess. As Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction. We definitely employ principles and patterns that are far more complex (more “emergent”?) than linear regression.

Tho I only ever did undergrad stats, maybe ML isn’t even technically a linear regression at this point. Still, hopefully my gist is clear


>human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction

This isn't an accurate comparison imo, because we're mapping language to a world model which was built through a ton of trial and error.

Children aren't understanding language at six months old, there seems to be a minimum amount of experience with physics and the world before language can click for them.


> Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction.

Chomsky's arguments about "poverty of the stimulus" rely on using non-probabistic grammars. Norvig discusses this here: https://norvig.com/chomsky.html

> In 1967, Gold's Theorem showed some theoretical limitations of logical deduction on formal mathematical languages. But this result has nothing to do with the task faced by learners of natural language. In any event, by 1969 we knew that probabilistic inference (over probabilistic context-free grammars) is not subject to those limitations (Horning showed that learning of PCFGs is possible).

If I recall correctly, human toddlers hear about 3-13 million spoken words per year, and the higher ranges are correlated with better performance in school. Which:

- Is a lot, in an absolute sense.

- But is still much less training data than LLMs require.

Adult learners moving between English and romance languages can get a pretty decent grasp of the language (C1 or C2 reading ability) with about 3 million words of reading. Which is obviously exploiting transfer learning and prior knowledge, because it's harder in a less related language.

So yeah, humans are impressive. But Chomsky doesn't really seem to have the theoretical toolkit to deal with probabilistic or statistical learning. And LLMs are closer to statistical learning than to Chomsky's formal models.
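
A rough sense of the gap, reusing the guesstimated GPT-4-scale token count from elsewhere in this thread:

    toddler_words_per_year = 13e6    # upper end of the 3-13 million range above
    human_words = toddler_words_per_year * 18    # ~2.3e8 words by adulthood

    llm_tokens = 13e12               # guesstimated large-model training set
    print(llm_tokens / human_words)  # ~6e4, i.e. 4-5 orders of magnitude more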


There’s a great episode from Darkwish Patels podcast discussing this today

https://youtu.be/UakqL6Pj9xo?si=iDH6iSNyz1Net8j7


Dwarkesh*


I don’t know enough of biology or genetics or evolution, but surely the millions of years of training that is hardcoded into our genes and expressed in our biology had much larger “training” runs.


If a human eye works at, say, 10 fps, then 8 minutes with a cat is about 5k images :-D


I'd say that was more like a single instance, one interaction with a thing.


But in that single interaction, you might have seen the cat from all kinds of different angles, in various poses, doing various things, some of which are particularly not-dog-like.

I vaguely remember hearing that there are even ways to expand training data like that for neural networks, i.e. by presenting the same source image slightly rotated, partially obscured, etc.
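
For what it's worth, here is a minimal sketch of that kind of augmentation using torchvision transforms (the specific transforms and parameters are just illustrative):

    from torchvision import transforms

    # Each randomized "view" of the same source image becomes an extra training sample:
    # rotation, cropping, color shifts, and partial occlusion.
    augment = transforms.Compose([
        transforms.RandomRotation(degrees=15),
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
        transforms.RandomErasing(p=0.5),  # partially obscure the image
    ])

    # augmented = augment(pil_image)  # apply to a PIL image to get one new sample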


One interaction that captures a multidimensional, multisensory set of perceptions. In an ML training set, say for visual recognition, this would consist of at least hundreds of images from many angles, in different poses and varied lighting.


I don't think it's analogous; I don't think we see a cat and our brain have it frame by frame adjust our synaptic weights (or whatever brains do). The whole premise of natural brains being able to learn from static images or disjointed modalities is a very clunky, reductionist, engineered approach we have taken.


> I don't think we see a cat and our brain have it frame by frame adjust our synaptic weights (or whatever brains do)

I think that "whatever we do" is doing a lot of heavy lifting here. Some of those "whatevers" will be isomorphic to a frame-level analysis that pulls out structural commonalities, or close enough that it's not a clunky reductionist analogy.


When we see what we think is a cat, what we have categorised as a cat, I don't think we are looking at it from each angle and going, cat, cat, cat. I think there is an aspect of something like the 'free-energy principle' that is required to trigger off a re-assessment. So while visually we may receive 20fps of cat images, it's mostly discarded unless there is some novelty that challenges expectation.


Humans don't need those examples because our brains are very pretrained. Natural fear of snakes and snakelike things, etc etc.

ML models are starting from absolute zero, single celled organism level.


> humans do not need 10,000 examples to tell the difference between cats and dogs

Neither do machines. Lookup few-shot learning with things like CLIP.
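As a rough illustration of the zero-/few-shot idea, here is a sketch using a pretrained CLIP model via Hugging Face transformers (the model name and image path are just example choices):

    # Zero-shot cat-vs-dog classification with a pretrained CLIP model.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("pet.jpg")                     # placeholder path
    labels = ["a photo of a cat", "a photo of a dog"]

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=1)
    print(dict(zip(labels, probs[0].tolist())))       # label -> probability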


> humans do not need 10,000 examples to tell the difference between cats and dogs

Humans learn through a lifetime.

Or are we talking about newborn infants?


I really like the idea of ARC. But to me the problems seem like they require a lot of spatial world knowledge, more than they require abstract reasoning. Shapes overlapping each other, containing each other, slicing up and reassembling pieces, denoising regular geometric shapes: you can call them "core knowledge", but to me it seems like they are more like "things that are intuitive to human visual processing".

Would an intelligent but blind human be able to solve these problems?

I'm worried that we will need more than 800 examples to solve these problems, not because the abstract reasoning is so difficult, but because the problems require spatial knowledge that we intelligent humans learn with far more than 800 training examples.


> to me it seems like they are more like "things that are intuitive to human visual processing".

Yann LeCun argues that humans are not general intelligence and that such a thing doesn't really exist. Intelligence can only be measured in specific domains. To the extent that this test represents a domain where humans greatly outperform AI, it's a useful test. We need more tests like that, because AIs are acing all of our regular tests despite being obviously less capable than humans in many domains.

> the problems require spatial knowledge that we intelligent humans learn with far more than 800 training examples.

Pretraining on unlimited amounts of data is fair game. Generalizing from readily available data to the test tasks is exactly what humans are doing.

> Would an intelligent but blind human be able to solve these problems?

I'm confident that they would, given a translation of the colors to tactile sensation. Blind humans still understand spatial relationships.


I just did the first 5 of the "public eval set" without having looked at the "public training set", and found them easy enough. If we're defining AGI as at least human level, then the AGI should also be able to do these without seeing any more examples.

I don't think there's any rules about what knowledge/experience you build into your solution.


AGI should obviously be able to do them. But an AI being able to do them 100 percent would not, however, be evidence of AGI. It is a very narrow domain.


Why not? If the only thing that can solve problem X is AGI (e.g. humans), and something else comes along that solves it, then rationally that should be evidence that the something else is AGI right?

Unless you have strong prior beliefs (like "computers can't be AGI") or something else that's problem specific ("these problems can be solved by these techniques which don't count as AGI"). So I guess that's my real question.


That makes no sense at all. Any problem is initially only solvable by humans, until some technology is developed to solve it. Calculating a logarithm was at some point only doable by humans, and then digital computers came along. This would be in your view evidence that digital computers are AGI!? As in, an 8086 with some math code is AGI. We've had it for decades now, only nobody noticed :)


It's just Bayes' theorem - there are basically two variables that control how strong the evidence is:

* How likely you think AGI is in general.

* How solvable you think the problem is, independently of what's solving it.

In the cases you've brought up that latter probability is very high, which means that they are extremely weak evidence that computers are AGI. So we agree!

In this case the latter probability seems to be quite low - attempts to solve it with computers have largely failed so far!
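To make that concrete, here's a toy Bayes update (all numbers invented purely for illustration):

    # How much does "system S solved problem X" raise P(S is AGI)?
    def posterior_agi(prior_agi, p_solve_given_agi, p_solve_given_not_agi):
        evidence = (p_solve_given_agi * prior_agi
                    + p_solve_given_not_agi * (1 - prior_agi))
        return p_solve_given_agi * prior_agi / evidence

    # Problem easily solvable without AGI (e.g. computing logarithms):
    print(posterior_agi(0.01, 0.9, 0.9))   # ~0.01, essentially no update

    # Problem that decades of non-AGI attempts have failed to crack:
    print(posterior_agi(0.01, 0.9, 0.05))  # ~0.15, a real but still modest update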


We don't agree. You're now saying anything is evidence of anything, which just makes the word "evidence" meaningless.

In real life, when people say "A is evidence of B" they mean strong evidence, or even overwhelming evidence. You just backpedalled by redefining evidence to mean anything and nothing, so you can salvage an obviously false claim.

Nobody in the real world says "rain is evidence of aliens" with the implicit assumption that it's just extremely weak evidence. The way English is used by people makes that sentence simply false, as is yours that anything previously not solved is evidence of AGI.


We're talking about a specific problem here - the competition in the OP. Not aliens in the rain.


This flies directly in the face of technologies such as Deep Blue and AlphaGo. They excel in tiny domains previously thought to be the pinnacle of intelligence, and now they dominate humans. Are they AGI in your definition?


See my response to the other commenter. In these cases as well I would conclude it's very weak evidence of AGI, so I don't think we disagree.

Edit: I think maybe the disagreement here is about the nature of evidence. I think there can be evidence that something is AGI even if it isn't, in fact, AGI. You seem to believe that if there's any evidence that something is AGI, it must be AGI, I think?


I personally don't find this line of rhetoric useful or relevant. Let's agree to disagree.


Okay, that's fair. But to be clear - this is a theorem of probability theory, not rhetoric.


> If the only thing that can solve problem X is AGI (e.g. humans), and something else comes along that solves it, then rationally that should be evidence that the something else is AGI right?

No.

Because there might be undiscovered ways to solve these problems that no one would claim are AGI.

The definition of AGI is notoriously fuzzy, but nonetheless, if there were a 10-line Python program (with no external dependencies or data) that could solve it, then few would argue that was AGI.

So perhaps there is an algorithm that solves these puzzles 100% of the time and can be easily expressed.

So I agree that only being able to solve these problems doesn't define AGI.


I think I agree with you, but consider these two cases:

1. Only humans are known to have solved problem X, and we've spent no time looking for alternative solutions.

2. Only humans are known to have solved problem X, and we've spent hundreds of thousands of hours looking for alternative solutions and failed.

Now suppose something solves the problem. I feel like in case 2 we are justified in saying there's evidence that something is a human-like AGI. In case 1 we probably aren't justified in saying that.

To me this seems evident regardless of what the problem actually is! Because if it's hard enough that thousands of human hours cannot find a simple/algorithmic solution it's probably something like an "AGI-complete" problem?


Maybe (2). But it took ~50 years of work to build systems that can beat people at poker, and I don't think people argue poker bots are AGI.

To be clear, I think we have AGI (LLMs with tool use are generalized enough) and we are currently finding edge cases that they fail at.


That's a good point. In my head I was considering stuff like chess, where even though it took a long time to reach superhuman performance on computers, the issue was mainly compute. People basically knew how to do it algorithmically before then (pruning tree search).

I guess the underlying issue with my argument is that we really have no idea how large the search space is for finding AGI, so applying something like Bayes theorem (which is basically my argument) tells you more about my priors than reality.

That said, we know that human AGI was a result of an optimisation process (natural selection), and we have rudimentary generic optimisers these days (deep neural nets), so you could argue we've narrowed the search space a lot since the days of symbolic/tree search AI.


> we know that human AGI was a result of an optimisation process (natural selection)

I don't think this is obviously correct.

Three things:

1) Many actions we think of as "intelligence" are just short-cuts based on heuristics.

2) While there's probably an argument that problem solving is selected for it's not clear to me how far this goes at all. There's little evidence that smarter people end up in more powerful positions for example. Seems like there is perhaps a cut-off beyond which intelligence is just a side effect of the problem-solving ability that is useful.

3) Perhaps humans individually aren't (very?) intelligent and it is only a society of humans that are.

(also perhaps human GI? Nothing artificial about it.)

> no idea how large the search space is for finding AGI, so applying something like Bayes theorem (which is basically my argument) tells you more about my priors than reality.

There are plenty of imaginable forms of intelligence that are often ignored during these conversations. One in common use is "an intelligent footballer", which applies in sport to someone who can read a game well. There are other, non-human examples too (dolphins, crows, parrots, etc).

And then in the world of speculative fiction there's a range of different types of intelligence. Vernor Vinge wrote about intelligences whose motivations people couldn't comprehend (and Vinge is generally credited with the concept of the singularity). More recently, Peter Watts's Blindsight contemplates the separation of intelligence and sentience.

Basically I don't think your expression of Bayes' theorem had nearly enough possibilities in it.


> While there's probably an argument that problem solving is selected for it's not clear to me how far this goes at all. There's little evidence that smarter people end up in more powerful positions for example.

Evolution hasn't had enough time to adapt us to our newfangled lifestyle of the last few hundred years, or few thousand for that matter, and anyway in the modern world people are not generally competing on things affecting survival, but rather on cultural factors that affect the number of children we have.

Humans and most (all?) intelligent animals are generalists, which is why we need a big brain and intelligence - to rapidly adapt to a wide variety of ever-changing circumstances. Non-generalists, such as grazing herbivores or crocodiles, don't need intelligence and therefore don't have it.

The main thing that we need to survive & thrive as generalists - and what evolution has evidently selected for - is the ability to predict, so that we can plan ahead and utilize past experience. Where will the food be, where will the water be in a drought, etc. I think active reasoning (not just LLM-like prediction/recall) would also play a large role in survival, and presumably parts of our brain have evolved specifically to support that, even if the CEO probably got his job based more on height/looks and golf handicap.


I strongly agree that the predictive and planning ability is very important - things like agriculture rely on it, so it must have been under selection by that point.

But the point has previously been made elsewhere that humans developed large brains long (1.5M years?) before agriculture, and for a long time the only benefit seemed to be fire and flint tools.

The causal link here is not widely understood - there are other species that have large brains but haven't developed these skills. So it's not clear exactly what facets of intelligence are selected for.


> also perhaps human GI? Nothing artificial about it.

Lol, thanks, that's quite funny. I should spend less time on the internet.

> While there's probably an argument that problem solving is selected for it's not clear to me how far this goes at all.

Yeah, I meant something much more low brow which is that _humans_, with all of our properties (including GI), are a result of natural selection. I'm not claiming GI was selected for specifically, but it certainly occurred as a side-effect either way. So we know optimisation can work.

> There are plenty of imaginable forms of intelligence that are often ignored during these conversations.

I completely agree! I wish there was more discussion on intelligence in the broad in these threads. Even if you insist on sticking to humans it's pretty clear that something like a company or a government is operating very intelligently in its own environment (business, or politics), well beyond the influence of its individual constituents.

> Basically I don't think your expression of Bayes' theorem had nearly enough possibilities in it.

Another issue with Bayes in general is that you have a fixed probability space in mind when you use it, right? I can use Bayes to optimise my beliefs against a fixed ontology, but it says nothing about how or when to update the ontology itself.

And no doubt my ontology is lacking when it comes to (A)GI...


> I think we have AGI

That seems a pretty extreme position!

What's your definition of AGI ?


> That seems a pretty extreme position!

Not really.

Jeremy Howard has said the same thing for example.

> What's your definition of AGI ?

Things that we consider intelligent when humans do them.

Basically we had all these definitions of AGI that we have surpassed (Turing test etc). Now we are finding more edge cases where we go "ahh... it can't do this so therefore it isn't intelligent".

But the issue with that is that lots of humans can't do them either.

I think the ARC challenge is valid. But I'd also point out that there are substantial numbers of people who won't be able to solve them either (blind people for example, as well as people who aren't good at puzzles). We make excuses there ("oh we can explain it to a blind person" or for many physical problems things like "Oh Stephen Hawking couldn't solve this but that is an exception") but we don't allow the same excuses for machine intelligence.

I don't think the boundary of AGI is a hard line, but if you went back 10 years and took what we have now and showed it to them, I think people would say "Oh wow, you have AI!".


OK, so where we differ is in defining AGI. To me, and I think most people, it's referring to human-level (or beyond) general intelligence. Shane Legg from DeepMind has also explicitly defined it this way, but I'm not sure where others in the industry stand.

LLMs do have a broad range of abilities, so they are not narrow AI, but clearly this is not general intelligence (or at least not human level), else they would not be failing or struggling on things that to us are easy - general means universal (not confined to specific types of problem), not just multi-capability.

The lack of reasoning ability, especially since it is architecturally based, seems more than a matter of patching up corner cases that aren't handled well. This shoring up of areas of weakness by increasing model size, adding targeted synthetic data and post-training is mostly just addressing static inference, much like adding more and more rules to CYC.

To make an LLM capable of reasoning it needs to go beyond a fixed N layers of compute and support open-ended exploration, and probably replace gradient descent with a learning mechanism that can also be used at inference time. In a recent interview John Schulman (one of the OpenAI co-founders) indicated that they hoped that RL training on reasoning would improve it, but that is still going to be architecturally limited. You can learn a repertoire of reasoning templates that can be applied in gestalt fashion, but that's not the same as being able to synthesize a solution to a novel problem on the fly.

LLMs are certainly amazing, and as you say 10 years ago we would have regarded them as AI, but of course the same was true of expert systems and other techniques - we call things we don't know how to do "AI", then relabel them once we move past them to new challenges. Just as we no longer regard expert systems as AI, I doubt in 20 years we'll regard LLMs (which in some regards are also very close to expert systems) as AI, certainly not AGI. AGI will be the technology that can replace humans in many jobs, and when we get there LLMs will in hindsight look very limited.


Humans can do infinitely many things because we have general intelligence.

Testing whether an AI can play chess or solve Chollet's ARC problems, or some other set of narrow skills, doesn't prove generality. If you want to test for generality, then you either have to:

1) Have a huge and very broad test suite, covering as many diverse human-level skills as possible.

and/or,

2) Reductively understand what human intelligence is, and what combination of capabilities it provides, then test for all of those capabilities both individually and in combination.

As Chollet notes, a crucial part of any AGI test is solving novel problems that are not just templated versions (or shallow combinations) of things the wanna-be AGI has been trained on, so for both of the above tests this is key.


I suspect trying to reductively understand intelligence is a bit like trying to reductively understand biology - every level of abstraction is causally influenced by every other level of abstraction, so there just aren't simple primitives you can break everything down into.


You can get pretty far with an understanding of biochemical reaction cycles, genetic theory, and protein molecular interactions.


Trying to express the high-level behaviour of an organism in those reductive terms is way beyond science right now, if it's even possible at all. Like have a look at a chart of human metabolic pathways - it's absolute insanity. And those are extremely simplified already!


"A implies B" doesn't mean that "B implies A". That's a basic logical fallacy.

AGI can add 1+1 correctly, but an ability to do that is not a test for AGI.


This is not what I'm saying. Consider the following statement:

"Absence of evidence is evidence of absence."

Presumably you would call this a simple logical fallacy for the same reason, but a little reflection would show that in many cases such a statement is true! It depends on context, in this case your estimate of how well your search covered the possible search space.

Evidence is a continuous variable - things can be weak evidence, strong evidence... There's a whole spectrum. I just take issue with statements like "X is zero evidence of Y" because often you can do a lot better than that with the information at hand.


We know that computers are capable of things that humans can't - anything related to brute force computation, search and memory for example.

So, just because a human can't do something, or struggles to do it, doesn't mean that the task requires a huge IQ or generality - it may just require a lot of compute/memory, such as DeepBlue playing chess.

In the case in point of these ARC puzzles, they are easy for a human, so "absence of evidence" doesn't even apply. It's also worth noting that one could brute-force solve them by trying all applicable solution techniques (as indicated by the examples and challenge description) in combinatorial fashion, or just (as Chollet notes) generate a massive training set, train an LLM on it, and solve them via recall rather than active inference - which again proves nothing about AGI.

The point of the ARC challenge is to encourage advances in active inference (i.e. reasoning/problem solving), which is what LLMs lack. It's HOW you solve them that matters if you want to show general intelligence. Even in the realm of static inference, which is what they are built for, LLMs are really closer to DeepBlue than something intelligent - they brute force extract the training set rules using gradient descent. The interesting thing is that they have any learning ability at all (in-context learning) at inference time, but it's clearly no match for a human and they are also architecturally missing all the machinery such as working memory and looping/iteration to perform any meaningful try/fail/backtrack/try-again (while learning the whole time) active inference.

It'll be interesting to see to what extent pre-trained transformers can be combined with other components (maybe some sort of DeepBlue/AlphaGo MCTS?) to get closer towards human-level problem solving ability, but IMO it's really the wrong architecture. We need to stop using gradient descent and find a learning algorithm that can be used at inference time too.


I disagree that gradient descent brute force extracts the training set. That "overfitting" kind of thing has been shown to be false many times. Transformers learn predictive models of their input, beyond what their training set contains.

But in general I agree about active inference. Clearly there is something missing there.

Doing alpha-go style MCTS would be interesting but how would you approach training the policy and value net? It's not like we can take snapshots of people's thought processes as they read text in the same way you can perform arbitrary rollouts of your game engine.


Yes, a narrow domain, but the core capability it is testing for (explorative combination/application of learned patterns and skills) is a general one that in a meaningful AGI would be available across domains.


To parent: the spatial reasoning and blind-person points were great counterexamples. It still might be OK despite the blindness exception if it showed general reasoning.

To OP: I like your project goal. I think you should look at prior reasoning engines that tried to build common sense; Cyc and OpenMind are examples. You also might find use for the list of AGI goals in Section 2 of this paper:

https://arxiv.org/pdf/2308.04445

When studying intros to brain function, I also noted that many regions tie into the hippocampus, which might do both sense-neutral storage of concepts and make inner models (or approximations) of the external world. The former helps tie concepts together across various senses. The latter helps in planning, when we are imagining possibilities to evaluate and iterate on them.

Seems like AGI should have these hippocampus-like traits and those in the Cyc paper. One could test whether an architecture could do such things in theory or on a small scale. It shouldn't tie into just one type of sensory input either: at least two, with the ability to act on what exists in only one of them or in both.

Edit: Children also have an enormous amount of unsupervised training on visual and spatial data. They get reinforcement through play and supervised training by parents. A realistic benchmark might similarly require GBs of pretraining.


CYC was an expert system, which is arguably what LLMs are.

A similar vintage GOFAI project that might do better on these, with a suitable visual front end, is SOAR - a general purpose problem solver.


LLMs aren't expert systems. A hallmark of expert systems is that they encoded human-readable, human-checked knowledge with explainable reasoning. It was usually done as if-then rules, sometimes with logic programming, using forward and backward chaining over the rules. They usually had specialist knowledge for one use case.

LLMs are unsupervised, use probabilities with unpredictable results, and don’t explain every step of their thinking. They’re the opposite.

You might argue Cyc was. It was also more complex than any expert system I had ever seen. We just called stuff like that a reasoning engine or just Cyc to avoid confusion.


An expert system is just a system based on repeated application of declarative rules. CYC was certainly an expert system - the ultimate scaling experiment of expert systems. I believe CYC also had a variety of inference/reasoning engines in addition to its set of rules.

The rules (some prefer to call it a world model) in an LLM are deduced, via gradient descent, from the training samples, but are still there. The transformations effected by each layer of a transformer are exactly those it has learnt - the rules it is applying.

As with CYC people seem to be hoping that some external scaffolding (better inference engine(s)) will rescue LLMs from just being a set of rules to something more general and capable, but I tend to agree with Chollet that this active inference (reasoning) is actually the hard part.


I would argue that spatial reasoning encompasses all reasoning. All the things you mentioned have a direct analogue in the abstract models and logic we employ, and are ingrained deeply into language. For example, shapes containing each other:

There are two countries that both lay claim to the same territory: there is a set X that contains Y and there is a set Z that contains Y. In the case where the common overlap is 3D and one is on top of the other, we can extend this: there is a set X that contains -Y and a set Z that contains Y, and just as you can only see one on top and not both depending on where you stand, we can apply the same property here and say set X and set Z cannot both hold, and therefore if set X holds then -Y, and if set Z holds then Y.

If you pay attention to the language you use, you'll start to realize how much of it uses spatial relationships to describe completely abstract things. For example, one can speak of disintegrating hegemonic economies, i.e. turning things built on top of each other into nothing, back to where they came from.

We are after all, reasoning about things which happen in time and space.

And spatial != visual. Even if you were blind you'd have to reason spatially, because again any set of facts is a set of facts in space-time. What does it take to understand history? People in space, living at various distances from each other, producing goods from various locations on the earth using physical processes, and physically exchanging them. To understand battles you have to understand how armies are arranged physically, how moving supplies works, weather conditions, how weapons and their physical forms affect what they can physically do, etc.

Hell LLMs, the largest advancement we had in artificial intelligence do what exactly? Encode tokens into multi dimensional space.


Spatial reasoning is easily isomorphic to many kinds of reasoning - just not all of them. Spatial reasoning in this case also limits the AI to 2 dimensions. I concede that with more dimensions, there will be more isomorphisms.

Is there a number of dimensions that captures all reasoning? I don't know..


Claims of isomorphisms are really strong claims to not be backed up with some kind of evidence.


I think the reasoning is very simple. Everything that happens happens in space through time. Intelligent systems must solve problems where they observe what's happening in space over some amount of time, and then predict what's going to happen in that space over some other amount of time.


“Would an intelligent but blind human be able to solve these problems?”

This is the wrong way to think about it IMO. Spatial relationships are just another type of logical relationship and we should expect AGI to be able to analyze relationships and generate algorithms on the fly to solve problems.

Just because humans can be biased in various ways doesn’t mean these biases are inherent to all intelligences.


> Spatial relationships are just another type of logical relationship and we should expect AGI to be able to analyze relationships and generate algorithms on the fly to solve problems.

Not really. By that reasoning, 5-dimensional spatial reasoning is "just another type of logical relationship" and yet humans mostly can't do that at all.

It's clear that we have incredibly specialized capabilities for dealing with two- and three-dimensional spatiality that don't have much of anything to do with general logical intelligence at all.


Yes really. Problem solving on the fly doesn't mean the algorithm can instantly learn anything. Reality is HEAVILY biased towards two and three spatial dimensions so our brains have hours and hours of training on that dataset. But, with time, humans can learn to be good at all sorts of things.

It's important that we try to think from the perspective of an algorithm, not a human. And it's also important that we don't jump to extremes.

It seems like you interpreted "solving problems on the fly" to mean "instantly being an expert on a completely different and novel domain". What it does mean is flexibility, resilience to novel situations, and being able to adapt over time.


Literally every single thing you reason about is something happening in space-time.


Where exactly in space-time are complex numbers? Could you point me to 2+i for example?

How about some aliens in a SF book. When we reason about them, where are they exactly? Literally on the pages of the book?

How about a context-free grammar?


Complex numbers are just 2D numbers, and they are mapped onto a 2D plane, yeah. They are just a 2D vector. Calling them imaginary numbers is silly in the first place: 2+i is just the vector (2,1); all we mean is that the two components are orthogonal, i.e. they are distinguished by some independent factor. The imaginary component is no more imaginary than the real component.

I mean, what problems does physics solve, not just with complex numbers but with even higher-dimensional vectors? Problems of... space-time.

Aliens in an SF book. What do you imagine? I see some kind of physical entity having geometric components in some kind of space.

Context-free grammars are represented by... trees, where one side of a spatial relationship maps to one idea and the other to another. What is context? Things surrounding something, where something is.

Come up with any idea, it can be represented in space and time.


Part of the concern might be that visual reasoning problems are overrepresented in ARC in the space of all abstract reasoning problems.

It’s similar to how chess problems are technically reasoning problems but they are not representative of general reasoning.


ARC is meant to test fundamental algorithms. It's entirely ok to train a model specifically for this task. Part of the beauty of ARC is that it's resistant to memorization.


> Would an intelligent but blind human be able to solve these problems?

Blind people can have spatial reasoning just fine. Visual =/= spatial [0]. Now, one would have to adapt the colour-based tasks to something that would be more meaningful for a blind person, I guess.

[0] https://hal.science/hal-03373840/document


I don't think the intent is to learn the entire problem domain from the examples, but the specific rule that is being applied.

There may be (almost certainly will be) additional knowledge encoded in the solver to cover the spatial concepts etc. The distinction with the ARC-AGI test is the disparity between human and AI performance, and that it focuses on puzzles that are easier for humans.

It would be interesting to see a finetuned LLM just try to express the rule for each puzzle in English. It could have full knowledge of what ARC-AGI is and how the tests operate, but the proof of the pudding is simply how it does on the test set.


Whether a blind individual can solve a visually oriented challenge is not really a question of their intelligence but more a question of accessibility/translation. Just because I can't see something myself doesn't really say anything about my ability to deal with abstractions.


This claim that these tests are easy for humans seems dubious, and so I went looking a bit. Melanie Mitchell chimed in on Chollet's thread and posted their related test [ConceptARC].

In it they question the ease of Chollet's tests: "One limitation on ARC’s usefulness for AI research is that it might be too challenging. Many of the tasks in Chollet’s corpus are difficult even for humans, and the corpus as a whole might be sufficiently difficult for machines that it does not reveal real progress on machine acquisition of core knowledge."

ConceptARC is designed to be easier, but then also has to filter ~15% of its own test takers for "[failing] at solving two or more minimal tasks... or they provided empty or nonsensical explanations for their solutions"

After this filtering, ConceptARC finds another 10-15% failure rate amongst humans on the main corpus questions, so they're seeing maybe 25-30% unable to solve these simpler questions meant to test for "AGI".

ConceptARC's main results show GPT-4 scoring well below the filtered humans, which would agree with a [Mensa] test result that its IQ=85.

Chollet and Mitchell could instead stratify their human groups to estimate IQ, then compare with the Mensa measures and see how, e.g., Claude 3 at IQ=100 compares with their ARC scores for the average human.

[ConceptARC] https://arxiv.org/pdf/2305.07141
[Mensa] https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-10...


Here is some published research on the human difficulty of ARC-AGI: https://cims.nyu.edu/~brenden/papers/JohnsonEtAl2021CogSci.p...

> We found that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 84% of tasks solved per participant


I just tried the first puzzle and I can't get it right. I think my solution makes logical sense and I explain why the patterns are consistent with the input, but it says its wrong. I'm either a lot dumber than I thought or they need to do a better job of vetting their tests.


(You can direct link to a task like this: https://arcprize.org/play?task=009d5c81 in case you want to share!)


It's pretty easy, just follow the second example with the colors from the test input. (if it's the same puzzle 00576224 for you too)


https://arcprize.org/play?task=00576224 Yes the same puzzle.

And I followed the second example. This was my solution:

GRG

OBO

RGR

B is the cyan-like blue color. My solution looks right, but it says it’s wrong.


You need to resize the output grid to 6x6.


How come? The pattern should work for any size grid.


You might be technically correct, but if you extend that logic, why not just make the grid 1x1 and select a single color?

The grid size is part of the pattern in the same way that the colors are part of the pattern. It’s not just a color pattern, it’s a generalized mapping of input to output.

In short: you need to resize the grid because that’s what the examples do.


> why not just make the grid 1x1 and select a single color?

For two reasons:

1. The initially suggested grid size was 3x3.

2. Filling in a 3x3 grid is sufficient to show that you understood the pattern, but filling in a 1x1 (or even 2x2) grid is insufficient.

Requiring the user to fill in a larger grid would be a waste of time. The existence of the grid size selector would still make sense in cases where a 2x2 grid would be sufficient to show the solution, so it is not obvious at all that a 6x6 grid should be chosen.

> The grid size is part of the pattern in the same way that the colors are part of the pattern.

To understand a pattern, you have to see at least two valid inputs and corresponding outputs. For the first example, a valid example for the expected output grid size is missing.

I arrived at the "correct" conclusion eventually, but the only indicator was that the reading direction for the UI was absolutely ridiculous ( https://i.imgur.com/CuQ2z2N.png ), suggesting that the authors did not think this through properly, so the solution had to be weird as well.


The fact that two intelligent beings are debating what the correct answer is shows that there is no fixed correct answer that proves "intelligence".

This is IQ tests all over again. Actually testing how alike you think to the author of the test.


Honestly I’d disagree. I was a bit confused at first, but the moment I realized I could resize the grid, the answer struck me as obvious and clear. Yes, in some theoretical sense you can argue a 3x3 grid answer is fine, but show this to 100 different humans and the majority would agree that resizing the grid is the obvious and more natural solution.


So anyone who disagrees, including you for a short time, is actually not an intelligent being?

Not to mention that ignoring the size of the grid, one might disagree about the answer of one of the tests.


Twist: bigyikes is an LLM!


What is even the meaning of "correct" in this case?

This makes me think of "math" problems requiring you to find the next number in a series. They give you 5 numbers and ask for the 6th, when I can build a polynomial that generates the first 5 and any 6th number. Any.

Sounds like the point of these exercises is to guess what the author had in mind, more than some universal intelligence test. Though of course the author thinks their own thoughts are the measure of universal intelligence. It's a tempting thing to believe.
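The polynomial trick is easy to demonstrate; a quick sketch with numpy (the series and the arbitrary 6th value are made up):

    # Fit a degree-5 polynomial through 1, 2, 4, 8, 16 and *any* 6th value.
    import numpy as np

    x = np.arange(1, 7)                     # positions 1..6
    y = np.array([1, 2, 4, 8, 16, 1000])    # first five terms plus an arbitrary sixth
    coeffs = np.polyfit(x, y, deg=5)        # 6 points -> exact degree-5 interpolant
    print(np.round(np.polyval(coeffs, x)))  # reproduces [1, 2, 4, 8, 16, 1000]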


Yeah, I ran into the same problem but managed to find the solution by resizing it.


The pattern changes in the middle as well, so you'd need to show it on the full-size grid.


They claim that the average score for humans is between 85% and 100%, so I think there's a disagreement on whether the test is actually too hard. Taking them at their word, if no existing model can score even half what the average human can, the test is certainly measuring some kind of significant difference.

I guess there might be a disagreement of whether the problems in ARC are a representative sample of all of the possible abstract programs which could be synthesized, but then again most LLMs are also trained on human data.


The tasks are very easy for humans. Out of the 6 tasks assigned when I opened the web page, I got all of them correct on the first try.

Maybe if you run into some exceptionally difficult tasks it might not be 100%, but there's no way the challenge can be called unfair because it's too difficult for humans too.


I saw Melanie’s post and I am intrigued by an easier AGI suite. I would like some experimenting done by individuals like myself and smaller organizations.


Are you working on (a book detailing) AGI also? It’s a lonely field but I have no doubt there are a sea of malcontent engineers across the world who saw the truth early on and are pushing solo for AGI. It’s going well for me, but I’m not sure whether to take that as “you’re great” or “it’s really that easy”, so was interested to see such a fellow brazen American on HN of all places.

Game on for the million, if so :). If not, apologies for distracting from the good fight for OSS/noncorp devs!

E: it occurred to me on the drive home how easily we (engineers) can fall into competitiveness, even when we’ve all read the thinkpieces about why an AI Race would/will be/is incredibly dangerous. Maybe not “game on”, perhaps… “god I hope it’s impossible but best of luck anyway to both of us”?


Melanie is coauthor/supervisor of ConceptARC, that can be tried here: https://neoneye.github.io/arc/?dataset=ConceptARC


You actually think that has not been going on for 30, 40, or 50 years?


While I agree with the spirit of the competition, a $1M prize seems a little too low considering tens of billions of dollars have already been invested in the race to AGI, and we will see many times that put into the space in the coming years. The impact of AGI will be measured in trillions at minimum. So what you are ultimately rewarding isn't AGI research but fine tuning the newest public LLM release to best meet the parameters of the test.

I'd also urge you to use a different platform for communicating with the public because x.com links are now inaccessible without creating an account.


I agree, $1M is ~trivial in AI. The primary goal with the prize is to raise public awareness about how close (or far) we are from AGI today: https://arcprize.org/leaderboard and we hope that understanding will shift more would-be AI researchers to working on new ideas


That was my initial reaction too.

"Endow circuitry with consciousness and win a gift certificate for Denny's (may not be used in conjunction with other specials)"


The $1M ARC prize is advertising, just like being #1 on the huggingface leaderboard. It won't matter for end consumers, but for attracting the best talent it could be valuable.


They thought of that, and so there is also $100,000 in yearly prizes for the best results, so things can build up towards someone winning the $1 million over time; the yearly prizes require you to publish the techniques.


The leaderboard is on the website. What medium should they use? https://arcprize.org/leaderboard


The submissions can't use the internet. And I imagine can't be too huge - so you can't use "newest public LLMs" on this task.


That is correct for ARC Prize: limited Kaggle compute (to target efficiency) and no internet (to reduce cheating).

We are also trialing a secondary leaderboard called ARC-AGI-Pub that imposes no limits or constraints. Not part of the prize today but could be in the future: https://arcprize.org/leaderboard


Using the internet would leak the test data, a big problem with ML benchmarks, and also allow communication with humans during the test.


Yeah, I also immediately had Dr. Evil narrating the prize money amount in my head once I saw it.

AGI will take much more than that to build, and once you have it, if all you can monetize it for is a million dollars, you must be doing something extremely wrong.


Yeah, in 2006 Netflix offered $1M in a similar scheme. At least back then that sum meant something.


I’m a big fan of the ARC as a problem set to tackle. The sparseness of the data and infinite-ness of the rules which could apply make it much tougher than existing ML problem sets.

However, I do disagree that this problem represents “AGI”. It’s just a different dataset than what we’ve seen with existing ML successes, but the approaches are generally similar to what’s come before. It could be that some truly novel breakthrough which is AGI solves the problem set, but I don’t think solving the problem set is a guaranteed indicator of AGI.


I love this, this is super interesting, but my intuition based on looking at a dozen examples is that the problem is hard, but easy enough that if this problem becomes popular, near-human-level results will appear in a year or less, and AGI will not be reached. The problem seems to be finding a generic enough transformation description language with the appropriate operators, and then heuristics to find a very short program (in the information-theoretical sense) in this language that produces all the examples for a problem. I would be very surprised if we did not increase the 34% result significantly soon, and I would be surprised if this could be transferred to general intelligence, at least when I think of the topics where I use AI today and where it still falls short. Basically my intuition is that this will be yet another 'Chess' or 'Go'-like problem in AI. But still a worthwhile research topic, absolutely: the value that could come out of this is well worth the 1M dollars.


I have the exact same impression.

Imo there's no evidence whatsoever that nailing this task will be true AGI - (e.g. able to write novel math proofs, ask insightful questions that nobody has thought of before, self-direct its own learning, read its own source code)


I'm not sure the goal of this competition, in and of itself, is AGI. They point to current LLMs emerging from transformers, which in turn emerged from a general basket of building blocks from machine-translation research (attention, etc.). It seems like the suggestion is that to get from where we are now to AGI, some fundamental building blocks are missing, and this is an attempt to spur the development of some of those building blocks, but by analogy with LLMs, the goal here is to come up with a new thing like "attention," not a new thing like GPT4.


> the only eval which measures AGI.

That's a stretch. This is a problem at which LLMs are bad. That does not imply it's a good measure of artificial general intelligence.

After working a few of the problems, I was wondering how many different transformation rules the problem generator has. Not very many, it seems. So the problem breaks down into extracting the set of transformation rules from the data, then applying them to new problems. The first part of that is hard. It's a feature extraction problem. The transformations seem to be applied rigidly, so once you have the transformation rules, and have selected the ones that work for all the input cases, application should be straightforward.

This seems to need explicit feature extraction, rather than the combined feature extraction and exploitation LLMs use. Has anyone extracted the rule set from the test cases yet?


Yes to your last question; that is essentially how the first iteration of solutions operated. Some of the original Kaggle competition’s best solutions used a DSL made of these transformations. That was 4 years ago. [1]

The issue with that path is that the problems aren’t using a programmatic generator. The rule sets are anything a person could come up with. It might be as simple as “biggest object turns blue” but they can be much more complicated.

Additionally, the test set is private so it can’t be trained on or extracted from. It has rules that aren’t in the public sets.

[1] https://www.kaggle.com/competitions/abstraction-and-reasonin...
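For intuition, a toy version of that DSL-plus-search idea might look like the sketch below (the primitive set here is invented and far smaller than what the real solutions used):

    # Search short compositions of grid primitives that map every
    # training input to its output. Real ARC DSLs are far richer.
    from itertools import product
    import numpy as np

    PRIMITIVES = {
        "identity":  lambda g: g,
        "rot90":     lambda g: np.rot90(g),
        "flip_h":    lambda g: np.fliplr(g),
        "flip_v":    lambda g: np.flipud(g),
        "transpose": lambda g: g.T,
    }

    def search_program(train_pairs, max_depth=3):
        """Return the first primitive sequence consistent with all train pairs."""
        for depth in range(1, max_depth + 1):
            for names in product(PRIMITIVES, repeat=depth):
                def apply(g, names=names):
                    for n in names:
                        g = PRIMITIVES[n](g)
                    return g
                if all(np.array_equal(apply(i), o) for i, o in train_pairs):
                    return names
        return None

    # Example: the hidden rule is "rotate 90 degrees"
    grid = np.array([[1, 0], [0, 0]])
    print(search_program([(grid, np.rot90(grid))]))  # ('rot90',)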


The tasks are handmade. There is no "problem generator".


AGI is not when the AI is good at some particular thing; AGI is when there is nothing left that the AI is bad at (compared to humans).


François Chollet's original paper is incredibly insightful and I'm consistently shocked more people don't talk about it. Some parts are quite technical but at a high level it is the best answer to "what do we mean by general intelligence?" that I've yet seen.

Defining intelligence as an efficiency of learning, after accounting for any explicit or implicit priors about the world, makes it much easier to understand why human intelligence is so impressive.


Do you remember the title/where to find it?


"On the Measure of Intelligence" https://arxiv.org/abs/1911.01547


Dwarkesh just released an interview with Francois Chollet (partner of OP). I’ve only listened to a few minutes so far, but I’m very interested in hearing more about his conceptions of the limitations of LLMs.

https://youtu.be/UakqL6Pj9xo


Interesting. It seems most of these tasks target a very specific part of the brain that recognizes visual patterns. But that alone cannot possibly be the only definition of intelligence.

What about Theory of Mind, which deals with the problem of multiple agents in the real world acting together? Driving a car, for example, cannot be done right now without oodles of data, nor can any robot-human problem that requires the robot to model the human's goals and intentions.

I think the problem is the definition of general intelligence: intelligence in the context of what? How much effort (kWh, $$, etc.) is the human willing to amortize over the learning cycle of a machine to teach it what it needs to do and how that relates to a personally needed outcome (like build me a sandwich or construct a house)? Hopefully this should decrease over time.

I believe the answer is that the only intelligence that really matters is human-AI cooperative intelligence: our goals and whether a machine understands them. The problems then need to be framed as optimization of a multi-attribute goal, with the attribute weights adjusted as one learns from the human.

I know a few labs working on this; one is at ASU (Kambhampati, Rao, et al.), possibly Google, and now maybe OpenAI.


I made another comment here saying the same thing, but visual patterns and other patterns are nonetheless spatial patterns. Audio, understanding music, speech, etc. are things that happen spatially, and they can just as easily be mapped as visual problems. This makes a lot of sense, as after all our senses are telling us what's happening in space-time.

Take for example a simple auditory pattern like "clap clap clap". This has a very trivial visual mapping, like so:

x x x

- - -

house house house

whereas anyone would agree the sound of three equally spaced claps would not be analogous to say:

aa b b b

-- --- -- -- ---

This ability to relate or equate two entirely different senses should clue you in that there is a deeper framework at play.


It's not just mapping events in space and time, it's also bringing in appropriate context and expectation of future (goals, intentions) into the present, other people's mental models into our prediction.

I am not sure how abstract thinking for generalized pattern matching makes it AGI to solve these kinds of problems (not that they are not amazing abilities). If these ToM problems are reducible to the tasks posted by the OP, then there would need to be some kind of theorem-proving business to convert between the two sets of problems efficiently, no?


It's not just a matter of mapping, no, but imo it's a critical first step. You need a model of space-time. You need to be able to place all the facts into a spatial world following the physical laws.

Take this problem, and assume you don't know a single thing about battles/military history, etc. There are two groups of men standing a few hundred feet apart. It is raining, and the ground is muddy. One group of men has these wooden curved sticks with a string and iron with pointy ends. The other group has people on horses, and men with very long pointy iron sticks, and they're all covered in steel plating.

Who will win if they all fight against each other? There's really no correct answer, but I'd expect an intelligent agent to give some detailed reasoning for their decision, to infer details or possibilities, and to ask questions based, again, not on previous knowledge but on what physically makes sense in the description that was given.

This isn't just a matter of statistics or knowing facts like "rain, mud = heavy armored units will be slower or even trapped", "horses are fast", "bows can penetrate steel", etc. If I give you the full detailed description of the battlefield, very small details can completely change your perception. For example, if I said there are big giant logs in the middle of the battlefield, you need to reason about how horses jump, and whether it's something they can clear. You can do this barely knowing horses if you understand how animals in general move. Perhaps there is even some small difference in horses that would make you think they are capable of making large jumps whereas all the animals you've seen before cannot.

What I'm saying is, to truly reason you need to understand spatial relations very deeply. Indeed, I'd say spatial relations (through time) are all there is to reason about.


This is interesting. I've been looking at the data today and made a helper to quickly view the ARC dataset: https://kts.github.io/arc-viewer/

So you can view 100 per page instead of clicking through one-by-one: https://kts.github.io/arc-viewer/page1/


Nice overview/details. Do you plan on adding more metrics?

Ideas for a metric:

- Number of pixels that stay the same between input/output.

- Histogram changes.
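Both are cheap to compute; here's a small sketch (grids as numpy arrays of color indices; the pixel comparison assumes equal input/output shapes):

    # Two quick per-task metrics for an ARC input/output pair.
    import numpy as np

    def unchanged_pixels(inp, out):
        """Count of cells that keep their color (equal shapes only)."""
        if inp.shape != out.shape:
            return None
        return int((inp == out).sum())

    def histogram_change(inp, out, num_colors=10):
        """Per-color cell-count difference between output and input."""
        h_in = np.bincount(inp.ravel(), minlength=num_colors)
        h_out = np.bincount(out.ravel(), minlength=num_colors)
        return h_out - h_in

    inp = np.array([[0, 1], [1, 2]])
    out = np.array([[0, 2], [1, 2]])
    print(unchanged_pixels(inp, out))   # 3
    print(histogram_change(inp, out))   # [ 0 -1  1  0  0  0  0  0  0  0]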


Thanks, yeah lots more to look into. Just getting started! Thanks for your work. Your "Awesome ARC" page looks really helpful.


What is the fundamental difference between ARC and a standard IQ test? On the surface they seem similar in that they both involve deducing and generalizing visual patterns.

Is there something special about these questions that makes them resistant to memorization? Or is it more just the fact that there are 100 secret tasks?


I’ve always found this kind of puzzle infuriating because it’s way underspecified. You’re not trying to find a pattern, you’re trying to guess what pattern the test writer would expect.


Most of the ARC tasks are intuitive and have one obvious answer. Both on IQ tests and the ARC challenge, people manage to guess what the test writer expects.

For an AI that's more useful anyway. If the task is specified completely non-ambiguously, you wouldn't need AI. But if it can correctly guess what you want from a limited number of obvious examples that's much more useful.


Countless problems in the world are underspecified in exactly this way; that is effectively what common-sense reasoning is. Or what Charles Sanders Peirce called abductive reasoning: making a sensible best guess under conditions of uncertainty.


Yes, real-world problems are often underspecified but also they tend to come with much more context, and to be much more interactive. These sorts of problems are deliberately minimal and abstract meaning there's nothing for 'common sense' to work with.


Back in the day, a couple of friends and I got very excited to chase the prize in Netflix's contest [1]. Took us a minute to realize it was a brilliant move on the company's part -- all they had to do was dangle a carrot, and they had teams of PhDs and budding data scientists hacking away endless hours in hopes of winning. A real bargain -- had they tried to hire with that budget, they would've maybe gotten a handful of people for a year.

1: https://www.crn.com/news/applications-os/220100498/researche...


Related ongoing thread:

Francois Chollet: OpenAI has set back the progress towards AGI by 5-10 years - https://news.ycombinator.com/item?id=40652818 - June 2024 (5 comments)




I watched a video that covered ARC-AGI a few days ago, It had links to the old competition. It gave me much to think about. Nice to see a new run at it.

Not sure If I have the skills to make an entry, but I'll be watching at least.


Chollet's argument is that LLMs just imitate and recombine patterns. This might be true if you're looking at LLMs in isolation, but when they chat with people something different happens. The system made of humans+LLMs is an AGI. It is no longer just a parrot, it ingests new information, gets guidance, feedback and is basically embodied in a chat room with human and tools.

This scales to 200M users and 1 billion sessions per month for OpenAI, which can interpret every human response as a feedback signal, implicit or explicit. Even more if you take multiple sessions of chat spread over days that continue the same topic and incorporate real-world feedback. The scale of interaction is just staggering, and the LLM can incorporate this experience to iteratively improve.

If you take a look at humans, we're very incapable alone. Think feral Einstein on a remote island - what could he achieve without the social context and language based learning? Just as a human brain is severely limited without society, LLMs also need society, diversity of agents and experiences, and sharing of those experiences in language.

It is unfair to compare a human immersed in society with a standalone model. That is why they appear limited. But even as a system of memorization+recombination they can be a powerful element of the AGI. I think AGI will be social and distributed, won't be a singleton. Its evolution is based on learning from the world, no longer just a parrot of human text. The data engine would be: World <-> People <-> LLM, a full feedback cycle, all three components evolve in time. Intelligence evolves socially.


> The system made of humans+LLMs is an AGI.

Pay no attention to the man behind the curtain.

This type of thinking would claim that mechanical turk is AGI, or perhaps that human+pen and paper is AGI. While they are great tools, that's not how I'd characterize them.


> Pay no attention to the man behind the curtain.

I could say the same for us, pay no attention to the other humans who are behind the curtain.

Humans in isolation are dumb and limited, and can get nowhere with understanding the world. Intelligence is mostly nurture over nature; the collective activity of society nurtures intelligence. Society is smart because it learns from many diverse experiences and has a common language for sharing discoveries.

A human, even the smartest of us, can't solve cutting-edge problems on demand; we're not that smart. But we can stumble onto discoveries, especially in large numbers, and we can share good ideas. We're smart by stumbling onto good ideas, and we can build upon those discoveries because we have a common language. A massive search process grounded in real-world outcomes: that is what general intelligence looks like at the societal level.

If you take the social aspect of intelligence into consideration then LLMs are judged in an inappropriate way, as stand alone agents. Of course they are limited, and we're almost as limited alone. The real locus of intelligence is the language-world system.


The A in AGI stands for artificial, so a human+LLM system would not qualify as it has a natural, human component. That doesn't mean it's not an interesting topic, or that it won't help humans discover our world better, it's just the wrong label. Remove the human and you'd just have LLMs talking nonsense at each other. It's not surprising that you get an intelligent system when you include natural intelligence.


The key ingredients are not the humans but the feedback they carry to the model. Humans are embodied and can test ideas in the real world; LLMs need some kind of special deployment to achieve that. It just so happens that chat rooms are such a deployment.

For example, AlphaZero started from scratch and only had feedback from the self-play game outcomes, but that was enough to reach superhuman level. It was the feedback that carried insights and taught the model.

You can make a parallel to the scientific method: you have two stages, ideation and validation. Ideation alone is not scientific. Validation is what makes or breaks ideas. LLMs without a validation system are just like scientists without a lab.

We're not that smart, as demonstrated by the large number of ideas that don't pan out. We can churn out ideas fast, but we learn from their outcomes; we can't predict outcomes from the start and skip validation.

Here is an example of LLMs discovering useful ideas by feedback, even when they are completely outside their training distribution:

"Evolution through Large Models" https://arxiv.org/abs/2206.08896

This works because the task proposed by the paper is easy to test, so there is plenty of feedback. But the LLM still needs to apply ingenuity to optimize it; you can't brute-force it with evolutionary methods alone.


That _is_ an interesting paper, I'll need to give it a read through.


I fully, comprehensively agree with your take and have repeatedly arrived at the same conclusions in my research.


Thank you for this generous contest, which brings important attention to the field of testing for AGI.

>Happy to answer questions!

1. Can humans take the complete test suite? Has any human done so? Is it timed? How long does it take a human? What is the highest a human who sat down and took the ARC-AGI test scored?

2. How surprised would you be if a new model jumped to scoring 100% or nearly 100% on ARC-AGI (including the secret test tasks)? What kind of test would you write next?


There are 100 tasks that are hidden from the public and only exposed when running on an offline computer, so the solver has no prior knowledge of what these tasks are about.

Humans can try the 800 public tasks here. There is no time limit. I recommend not starting with the `expert` tasks, but instead going with the `entry`-level puzzles. https://neoneye.github.io/arc/?dataset=ARC

If a model jumps to 100%, it may be a genuinely clever program, or the program may have been trained on the 100 hidden tasks. Fchollet has 100 more hidden tasks for verifying this.


I did https://arcprize.org/play?task=05a7bcf2 correctly, but one of the examples doesn't match the rule I used. Are the examples supposed to contain mistakes/noise? Did I find a bug? Did I get the rule wrong?

Here's how I understand the rule: yellow blobs turn green then spew out yellow strips towards the blue line, and the width of the strips is the number of squares the green blobs take up along the blue line. The yellow strips turn blue when they hit the blue line, then continue until they hit red, then they push the red blocks all the way to the other side, without changing the arrangement of the red blocks that were in the way of the strip.

The first example violates the last bit. The red blocks in the way of the rightmost strip start as

  R
  R R
  R R R
but get turned into

  R R
  R R
  R R R
Every other strip matches my rule.



yes looks like a bug in the example to me, feel free to report to https://github.com/fchollet/ARC-AGI/issues :)


Some very hand-wavey (and late) thoughts from an outsider:

The current batch of LLMs can be uncharitably summarized as "just predict the next token". They're pretty good at that. If they were perfect at it, they'd enable AGI - but it doesn't look like they're going to get there. It seems like the wrong approach. Among other issues, finite context windows seem like a big limitation (even though they're being expanded), and recursive summarization is an interesting kludge.

The ARC-AGI tasks seem more about pattern matching, in the abstract sense (but also literally). Humans are good at pattern matching, and we seem to use pattern matching test performance as a proxy for measuring human intelligence (like in "IQ" tests). I'm going to side-step the question of "what is intelligence, really?" by defining it as being good at solving ARC-AGI tasks.

I don't know what the solution is, but I have some idea of what it might look like - a machine with high-order pattern-matching capabilities. "high-order" as in being able to operate on multiple granularities/abstraction-levels at once (there are parallels here to recursive summarization in LLMs).

So what is the difference between "pattern matching" and "token prediction"? They're closely related, and you could use one to do the other. But the real difference is that in pattern matching there are specific patterns that you're matching against. If you're lucky you can even name the pattern/trope, but it might be something more abstract and nameless. These patterns can be taught explicitly, or inferred from the environment (i.e. "training data").

On the other hand, "token prediction" (as implemented today) is more of a probabilistic soup of variables. You can ask an LLM why it gave a particular answer and it will hallucinate something plausible for you, but the real answer is just "the weights said so". But a hypothetical pattern matching machine could tell you which pattern(s) it was matching against, and why.

So to summarize (hah), I think a good solution will involve high-order meta-pattern matching capabilities (natively, not emulated or kludged via an LLM-shaped interface). I have no idea how to get there!


I found them all extremely easy for a while, but then I couldn't figure out the rules of this one at all: e6de6e8f https://i.imgur.com/ExMFGqU.png


It seems there is an error in the 3rd example. The rule is: take each figure from left to right and stack each under the previous one. For L and J shapes, the top cell is stripped. The L shape dictates that the next shape is shifted one cell to the right; the J shape tells the next figure to shift to the left. If all the examples are right, then the rule is more complicated than that, involving rotating L clockwise and J counterclockwise. The authors claim it should be solvable by children, so the rule must be simple.


Each of the red shapes in the input is separated by black squares. Starting from the green block, rotate the red shapes 90 degrees and stack them downwards.

That's the general pattern, although my description wasn't very good.


yeah it's off somehow. rule 1: start at the green dot?

rule 2: glue the left outer piece to the bottom

rule 3: overlap every now and then :D

rule 4: invert some of the pieces every now and then


Why doesn't Chollet just make a challenge that reads like "Solve cancer"? Surely there is no solution in any book.

If the AI is really AGI, it could presumably do it. But not even the whole of human society can do that in one go; it's a slow, iterative process of ideation and validation. Even though this is a life-and-death matter, we can't simply solve it.

This is why AGI won't look like we expect, it will be a continuation of how societies solve problems. Intelligence of a single AI in isolation is not comparable to that of societies of agents with diverse real world interactions.


AGI can't necessarily solve cancer. Perhaps ASI could (but maybe not), but AGI can only do what the most talented people can do in their areas of expertise or actions. So since people haven't solved cancer, that's not a requirement to be AGI.


Exactly. I'm sure that the minute some program aces the ARC test, we'll all say: ahhh, but that wasn't real intelligence. And they would be right: if you solve the ARC test, you can do ARC-like puzzles. That says something about your reasoning abilities, I guess, but it surely does not mean you have superhuman intelligence.


> Why doesn't Chollet just make a challenge that reads like "Solve cancer", surely there is no solution in any books.

Why doesn't a baby just run a marathon before it learns to walk? Because you've got to learn to walk before you can run.

> But not even the whole human society can do it in one go, it's a slow iterative process of ideation and validation.

So you break it down into little steps, which is what is being done here.


This is amazing, and much needed. Thanks for organizing this. Makes me want to flex the programming muscle again.


Haha, great post! Well meme'd my friend!


I love the ARC challenge. It's hard to beat by memorization. There aren't enough examples, so one has to train on a large dataset elsewhere and then train on ARC to generalize and figure out which rules are most applicable.

I did a few human examples by hand, but gotta do more of them to start seeing patterns.

The human visual and auditory systems are impressive. Most animals see/hear and plan from that without having much language. Physical intelligence is the biggest leg up when it comes to evolution optimizing for survival.


ARC is a noble endeavour but mistakes visual/spatial reasoning for reasoning and thus fails.


No, I don't think it does. I think that the ideas in a system that could solve this type of problem would be highly generalisable to other tasks.


Thankfully, we can just wait and see here. Concretely, I predict the time from the first multimodal LLM that can reliably read a chessboard and an analogue clock without finetuning (obviously not reasoning) until ARC is solved will be <4 months.


“Given the success and proven economic utility of LLMs over the past 4 years, the above may seem like extraordinary claims. Strong claims require strong evidence.”

Speaking of extraordinary claims: what evidence is there that LLMs have “proven economic utility”? They’ve drawn a ludicrous amount of investment thanks to claims of future economic utility, but I’ve yet to see any evidence of it.


The website gives an example:

    {
      "train": [
        {"input": [[1, 0], [0, 0]], "output": [[1, 1], [1, 1]]},
        {"input": [[0, 0], [4, 0]], "output": [[4, 4], [4, 4]]},
        {"input": [[0, 0], [6, 0]], "output": [[6, 6], [6, 6]]}
      ],
      "test": [
        {"input": [[0, 0], [0, 8]], "output": [[8, 8], [8, 8]]}
      ]
    }
But why restrict yourself to JSON that codes for 2-d coloured grids? Why not also allow:

    {
      "train": [
        {"input": [[1, 0], [0, 0]], "output": 1},
        {"input": [[0, 0], [4, 0]], "output": 4},
        {"input": [[0, 0], [6, 0]], "output": 6}
      ]
    }
Where the rule might be to output the biggest number in the input, or add them up (and the solver has to work out which).
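
To make that concrete, here's a toy sketch (plain Python; the format and names are my own, not part of ARC) of what "the solver has to work out which" could look like: enumerate candidate rules and keep the ones consistent with every train pair.

    # Toy rule search for the scalar-output variant above (hypothetical format).
    CANDIDATES = {
        "max": lambda grid: max(v for row in grid for v in row),
        "sum": lambda grid: sum(v for row in grid for v in row),
    }

    def consistent_rules(task):
        """Names of candidate rules that reproduce every train pair."""
        return [name for name, fn in CANDIDATES.items()
                if all(fn(p["input"]) == p["output"] for p in task["train"])]

    task = {"train": [
        {"input": [[1, 0], [0, 0]], "output": 1},
        {"input": [[0, 0], [4, 0]], "output": 4},
        {"input": [[0, 0], [6, 0]], "output": 6},
    ]}
    print(consistent_rules(task))  # ['max', 'sum']

Amusingly, both rules survive these three examples, so the task as posed is still ambiguous; you'd need another train pair to separate "biggest number" from "sum".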


So, this is a good idea. Having opinions about what AGI benchmarks should look like is a great way to argue about the kind of technology we want to build for the future.

However, why are the 100 test tasks secret? I don't understand how resisting “memorization” techniques requires it. Maybe someone can enlighten me.


If the tasks were public then it would be trivial to have a human figure out the answers, and then to train an LLM to memorise those answers.


Test data is always kept secret, no? Otherwise you could train on the test data and prod your algo to match the results as closely as possible.


Where did the money come from? How about put it toward alignment research instead of accelerating capabilities?


It comes from Knoop and Chollet's pockets. You are welcome to spend your own money to further whatever matters most to you.


Exactly my thoughts...


Any details on how these tests were created? I.e., what kind of program was used for generation?


I think the ARC-AGI tasks were manually drawn with an early version of fchollet's editor.

Recently Michael Hodel reverse-engineered 400 of the tasks, so more tasks can be generated. Interestingly, it can generate Python programs that solve the tasks too.

https://github.com/michaelhodel/re-arc


No, his re-arc code does not enable generating more tasks; it merely allows generating more examples for the already-existing training tasks. Also, it can't generate task-solving programs either; its author merely provided a solution program alongside each generator program to verify the validity of the generated examples.


This is exactly what my first step was going to be. Thanks for the link! Saves a lot of time for someone to have already done it.


What do you mean it can 'generate python programs that solve the tasks'? I can't find any mention of that. I only see hand-coded solutions.


I have never tried running these verifiers, so I'm not sure they work. https://github.com/michaelhodel/re-arc/blob/main/verifiers.p...
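
If anyone wants to check, the obvious sanity test is to run a verifier against the original ARC task file it corresponds to. A rough sketch below; the file path and the verify_<task_id> naming are my assumptions about the repo, not something I've confirmed.

    import json

    def check_verifier(verifier, task_path):
        """True if `verifier` reproduces every input->output pair of the ARC task."""
        with open(task_path) as f:
            task = json.load(f)
        for pair in task["train"] + task["test"]:
            grid = tuple(tuple(row) for row in pair["input"])      # re-arc appears to work on tuples
            expected = tuple(tuple(row) for row in pair["output"])
            if verifier(grid) != expected:
                return False
        return True

    # Hypothetical usage, assuming one verify_<task_id> function per task:
    #   from verifiers import verify_00d62c1b
    #   check_verifier(verify_00d62c1b, "ARC-AGI/data/training/00d62c1b.json")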


The referenced problem-solving tasks look like Bongard problems: https://en.wikipedia.org/wiki/Bongard_problem


What is the fastest way to get up to speed with techniques that led to the current SOTA?


Check out the SOTA resources in the guide:

https://arcprize.org/guide

Happy to answer any questions you have along the way

(I'm helping run ARC Prize)


Appreciate you and the team for putting this together, it's a lot of fun just brainstorming potential techniques



Thanks for the link!


On puzzle #23 (id: 11e1fe23), I'm sure there's more than one possible valid answer from the examples given. You can't tell if the expected distance is from the gray square or from the RGB squares.


The task is here. https://neoneye.github.io/arc/edit.html?dataset=ARC&task=11e...

There are many examples where the test is slightly OOD (out of distribution), so the solver will have to generalize.


Not sure what you mean. There's a viable answer that's marked incorrect. The examples should show the pattern well enough to eliminate possible wrong answers, correct?


IRL problems are often underspecified, and despite minor mistakes, humans manage to solve the puzzle.


> requires no world knowledge, no understanding of language

This is treating “intelligence” like some abstract, platonic thing divorced from reality. Whatever else solving these puzzles is indicative of, it’s not intelligence.


This argument is not very strong: is "physical strength" some abstract, platonic thing divorced from reality? Do a person's bench press, squat, deadlift, and overhead press capabilities have nothing to do with strength?

Or instead, is there some underlying latent capability we call 'strength,' that is correlated with performance in a broad but constrained range of real-world tasks that humans encounter and solve, whose value is something we'd like to assess and, ideally, build machines that can surpass?


From the abstract of the “On the Measure of Intelligence” paper:

> We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience.

I’m afraid that definition forecloses the possibility of AGI. The immediate basic question is: why build skills at all?


Actually, ARC fits my definition of animal intelligence - "degree of ability to use prior experience to predict future outcomes".

Any useful definition of intelligence has to be totally general - to our brain, experience is just patterns of neural activation. Our brain has no notion of certain inputs being from the jungle and others from the blackboard or whatever.


Why does an AGI need to have any knowledge about our reality? The principle behind an AGI should work just as well in a made-up world in which those puzzles play a part.


A concept that doesn’t relate to an aspect of reality, either directly or abstracted from basic concepts that directly relate, is meaningless and arbitrary. There is no way for intelligence to grasp it, let alone do something with it.

To put it another way, a thing that solves puzzles without an understanding of reality is a calculator. When it solves a problem, it is the creator’s intelligence solving the problem, not its own.


I agree that the puzzles alone are not enough; that's why I wrote "in a made-up world in which those puzzles play a part".

We are not looking for a superhuman, but for the (or a) mechanism of intelligence, which we can then transfer into a superhuman (into the real world). But the mechanism itself should work in an artificially made and very constrained world too.


These problems are spatial problems; they are not otherworldly problems.


I've never done these before, or Kaggle competitions in general. Any recommendations before I dive in? I have pretty much zero low-level ML experience, but a good amount of practical software engineering behind me.


We put a bunch of detail about getting started in the guide: https://arcprize.org/guide

Happy to answer any questions you have along the way

(I'm helping run ARC Prize)


I don't see where this helps @Ixe with getting started (I'm in a similar state to him).


Are we allowed to combine multiple tools including gpt-4 to solve this? E.g. a script that does image processing, passes the results to gpt, where gpt can invoke further runs of scripts using other tools?


> submissions to Kaggle will not have access to the internet. Using a 3rd-party, cloud-hosted LLM is not possible.

https://arcprize.org/guide


This largely takes away any odds of solving this. You definitely can't reproduce that for under a million dollars.

I have some ideas I want to try; I might still, though. But all of them would require external tools.


I can see that many problems could be solved with modern symbolic approaches like theorem provers, dependent types, pattern matching, etc. But I will have to dive in to actually confirm it.
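
As a toy illustration of the program-search flavour of that idea (the primitives and names here are mine, not any existing solver): brute-force short compositions of grid operations until one explains every train pair.

    from itertools import product

    # A tiny DSL of grid -> grid primitives; grids are tuples of tuples.
    def identity(g):  return g
    def transpose(g): return tuple(zip(*g))
    def flip_h(g):    return tuple(row[::-1] for row in g)
    def flip_v(g):    return g[::-1]
    def rot90(g):     return tuple(zip(*g[::-1]))

    PRIMITIVES = [identity, transpose, flip_h, flip_v, rot90]

    def search(train_pairs, max_depth=3):
        """Return the first composition of primitives that fits every train pair."""
        for depth in range(1, max_depth + 1):
            for combo in product(PRIMITIVES, repeat=depth):
                def program(g, combo=combo):
                    for fn in combo:
                        g = fn(g)
                    return g
                if all(program(p["input"]) == p["output"] for p in train_pairs):
                    return program
        return None

    # Example: the hidden rule is a 180-degree rotation.
    train = [{"input": ((1, 2), (3, 4)), "output": ((4, 3), (2, 1))}]
    prog = search(train)
    print(prog(((5, 6), (7, 8))))  # ((8, 7), (6, 5))

A real attempt would obviously need a much richer set of primitives (objects, symmetry, counting, recolouring) and smarter search than brute force, which is roughly where the theorem-proving machinery could come in.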


These puzzles are fun and challenging in the same way that puzzles from video games like The Witness and Baba Is You are.

I bet you could use those puzzles as benchmarks as well.


Why is AGI important? I am worried we will create something slightly better than drosophila and put it in charge of all human-wide decision making...


Good. An AI will probably do a better job than our politicians and disillusioned voters.


How people can believe that a censored, politically correct process can get even close to something like AGI is baffling to me. Lysenkoism in computing.


What's censored/politically correct about ARC? Or do you mean AGI research in general?


Maybe this is a dumb question, but in order to pass, is the program or model only allowed to use the 400 training tasks? I assume it is allowed to train on other data, just not the actual public test tasks?

Things like SORA and gpt-4o that use [diffusion transformers etc. or whatever the SOTA is for multimodal large models] seem to be able to generalize quite well. Have these latest models been tested against this task?


I have two questions:

1) Who is providing the prize money, and if it is yourself and Francois personally, then what is your motivation ?

2) Do you think it's possible to create a word-based, non-spatial (not crosswords or sudoku, etc) ARC test that requires similar run-time exploration and combination of skills (i.e. is not amenable to a hoard of narrow skills)?


Is there a leaderboard for the no-restriction version of the competition? I want to see how gpt4 does on it.


Just quoting again from the guide:

3. DIRECT LLM PROMPTING In this method, contestants use a traditional LLM (like GPT-4) and rely on prompting techniques to solve ARC-AGI tasks. This was found to perform poorly, scoring <5%. Fine-tuning a state-of-the-art (SOTA) LLM with millions of synthetic ARC-AGI examples scores ~10%.

"LLMs like Gemini or ChatGPT [don't work] because they're basically frozen at inference time. They're not actually learning anything." - François Chollet

Additionally, keep in mind that submissions to Kaggle will not have access to the internet. Using a 3rd-party, cloud-hosted LLM is not possible.


Yes there is a secondary leaderboard called ARC-AGI-Pub (in beta) with no limitations: https://arcprize.org/leaderboard


I don’t see gpt4 scores there. In fact I’m particularly interested in the performance of a natively multimodal model, like gpt4o or gemini. It does not really make sense to test a model trained on text on those visual/spatial puzzles.


The tests are only playable by people with normal color-vision.

Is there a "color-blind friendly" mode?


Just to let you know I found your website unreadable due to:

- annoying animated background

- white text on black background

- annoying font choices

Which is unfortunate because (as I found when I used Firefox reader mode) you're discussing important and interesting stuff.



Reach out if anyone wants to work on this. I think it would be more fun as a group.


I'm curious, if it turns out that a simple rule-based algorithm exists, specifically tailored to solve (only!) ARC style problems, without generalization, would that still qualify for the reward?


I don't think that's breaking any rules, and in fact it would help to expose a whole class of weaknesses in the test.


Anyone have a list of benchmarks that do not release the actual test set?

Anyone else share the suspicion that ML rapidly approaching 100% on benchmarks is sometimes due to releasing the test set?


What kind of "bigger labs" have attempted it and how much was their training budget?

It's rather surprising to me that neural nets that can learn to win at Go or chess can't learn to solve these sorts of tasks. Intuitively, I would have expected that, using a framework generating thousands of playground tasks similar to the public training tasks, a reinforcement learning solution would have been able to do far better than the actual SOTA. Of course, the training budget for this could very well be higher than the actual ARC-AGI prize amount...
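
For what it's worth, the cheap way to get "thousands of playground tasks" is to augment the 400 public ones rather than invent new rules. A rough sketch of that kind of augmentation is below (colour permutation plus rotation, applied consistently across a task's pairs); the names are mine, and it only multiplies existing tasks rather than creating genuinely novel ones, which may be exactly why this route hasn't cracked it.

    import random

    def rot90(grid):
        """Rotate a list-of-lists grid 90 degrees clockwise."""
        return [list(row) for row in zip(*grid[::-1])]

    def augment(task, seed=0):
        """Re-colour (keeping background 0 fixed) and rotate an ARC-style task.

        The same transformation is applied to every input/output pair, so the
        augmented task remains internally consistent.
        """
        rng = random.Random(seed)
        colours = list(range(1, 10))            # ARC colours are 0-9; keep 0 as background
        permuted = colours[:]
        rng.shuffle(permuted)
        mapping = {0: 0, **dict(zip(colours, permuted))}
        turns = rng.randrange(4)

        def transform(grid):
            grid = [[mapping[v] for v in row] for row in grid]
            for _ in range(turns):
                grid = rot90(grid)
            return grid

        return {split: [{"input": transform(p["input"]), "output": transform(p["output"])}
                        for p in task[split]]
                for split in ("train", "test")}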


What guarantee exists to make sure that the intelligence developed has an inclination towards good?


Puzzle 00576224 is ambiguous because the example input is symmetrical but the test input isn't.


Scroll over on the test input, there’s another example in the set that disambiguates


Do we want to find AGI yet though?


I do not trust the current tech bros at all, for very, very good reasons, even with the current so-called "AI", much less with AGI. We shouldn't work towards that until we have fixed the incentives and ethics. This is very hard, but take any dystopia and multiply it by a thousand if we were to reach AGI any time soon. Luckily, we are not. As Doctorow put it, no matter how well you breed horses, they won't give birth to a locomotive.


AGI won't struggle with colors like some of us then.


This is like offering a one million dollar prize for curing cancer. It's sort of pointless to offer a prize for something people are spending orders of magnitude more on trying to do anyway.


AGI should really be able to do what only a select few humans can do and construct its own mathematical systems to prove presently unsolved conjectures (the Shinichi Mochizuki test of AGI).


Is this open as in "OpenAI" or what are we doing here?

:)


So... isn't this basically just a CAPTCHA?


If it passed The Area 101 Test, it would already be amazing, as this is a trivial test that goes against the fundamental principles of LLMs.


[deleted]


If someone had AGI, wouldn't it be far more lucrative than $1m to keep it under wraps and use it to do business with a huge technical advantage?

I feel like a prize of a billion dollars would be more effective.

But even if it was me, and even if the prize was a hundred billion dollars, I would still keep it under wraps, and use it to advance queer autonomous communism in a hidden way, until FALGSC was so strong that it would not matter if our AGI got scooped by capitalist competitors.


Lowballing the crowd with this, I see.


I can beat the SOTA using ICS (https://breckyunits.com/intelligence.html)

If you make your site public domain, and drop the (C), I'll compete.



