How far are we from intelligent visual deductive reasoning? (arxiv.org)
124 points by belter 10 months ago | 118 comments



The Georgia Tech Knowledge-Based AI course involved building a program to answer Raven's Progressive Matrices questions. The course was offered in the online MS program, so thousands of students have taken it. The most impressive result I saw was one student who got nearly perfect results in about 25 lines of Python code.

This may be a case where humans do well on the test, but you can also do very well on the test without doing anything the way a human would. The fact that GPTs aren't very good at the test probably isn't evidence that they're not really very smart, but it also doesn't mean that if we fix them to do very well on the test, they've gotten any smarter.


How many libraries were included in these 25 lines of code?


Obviously “Python” was used… and then it used numpy because the image format in the assignment was numpy arrays. However, the 25 lines were basically “sum the rows and sum the columns, then compare those vectors,” or something like that. This wasn't really a case of all the complexity being hidden in a dependency; it was a case of finding a very simple heuristic that made the problem trivial.
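
A minimal sketch of that kind of heuristic, assuming the figures arrive as 2D numpy arrays; the function names and the specific "row profiles cancel out" rule are my own guess at the flavor of it, not the student's actual solution:

    import numpy as np

    def best_answer(figures, candidates):
        # `figures` is the 3x3 grid of images with the bottom-right cell missing,
        # `candidates` the answer options; all images are 2D numpy arrays.
        def profile(img):
            # Collapse the image into its column sums and row sums.
            return np.concatenate([img.sum(axis=0), img.sum(axis=1)])

        # Assume each row of the matrix has (roughly) the same summed profile,
        # so the missing cell's profile is whatever remains of the first row's total.
        row_total = profile(figures[0][0]) + profile(figures[0][1]) + profile(figures[0][2])
        target = row_total - profile(figures[2][0]) - profile(figures[2][1])

        # Pick the candidate whose profile is closest to that target.
        dists = [np.linalg.norm(profile(c) - target) for c in candidates]
        return int(np.argmin(dists))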


At least one, obviously, and a rather large one at that.

The point of the comment you replied to is that conventional CV software can recognize the patterns in tests like Raven's Progressive Matrices just fine, and a simple logic tree can then solve them, while the LLM approach is still struggling to get the same result.

This is a commonplace shortcoming of the current generation of LLMs: ironically, they often fail at tasks which computers can do perfectly using conventional software.


There are an infinite number of algorithms to compute A from Q, given a set of (Q, A) pairs. Almost none, surely, are intelligent.

These proxy measures of intelligence are just arguments from ignorance: "I don't know how the machine computed A from Q, therefore...".

But of course some of us do know how the machine did it; we can quite easily describe the algorithm. It just turns out no one wants to, because it's really dumb.

Especially if the algorithm is, as in all ML, "start with billions/trillions of data points in the (Q, A) space; generate a compressed representation ZipQA; and for a novel Q', find the decompressed A located close to a Q similar to Q'".
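
A minimal sketch of that caricature (the `embed` function here is a hypothetical stand-in for whatever compression is used; nothing below is any real system's code):

    import numpy as np

    def build_zipqa(pairs, embed):
        # "Compress" a huge corpus of (Q, A) pairs into vectors plus their answers.
        return [(embed(q), a) for q, a in pairs]

    def answer(zipqa, q_new, embed):
        z = embed(q_new)
        # Find the stored question closest to the novel one...
        _, a = min(zipqa, key=lambda qa: np.linalg.norm(qa[0] - z))
        # ...and return ("decompress") its associated answer.
        return a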

There are no theories of intelligence which would label that intelligence.

And let me say, most such "theories" are ad-hoc PR that are rigged to make whatever the latest gizmo "intelligent".

Any plausible theory begins from the initial intuition that "intelligence is what you do when you don't know what you're doing".


What is more special about how human intelligence works? In the end we are all particles and it all could be trillions of data points very simplistically interacting with each other resulting in emergent behaviour and complex, intelligent results.

We know how common molecules can interact with each other. Does this mean that anything built on top of them is not intelligent?

Everything in life could just be "statistics".


No argument with the particles/neurons/matter approach to the subject. It is sound and if you look at us compositionally there is nothing magic about whats going on. There is, though, something about intuition or instinctual behavior which can constantly recombine/reapply itself to a task at hand. I know many will balk at intuition, and maybe its only at the very best a heuristic, but i think we need to at least unravel what it is and how it operates before we can understand what makes something classify as human-like intelligence. Is it merely executing a process which we can put our minds into with practice, or is it demonstrating something more general, higher-level.


Well look, compared to the electrified bits of sand in my laptop, I'd strongly defend pregnancy as something vastly more "magical", if those are the terms we must use.

People who think organic adaptation, sensory-motor adaptation, somatosensory representation building... i.e., all those things which ooze-and-grow so that a piano player can play, or we can type here... are these magic?

Well, I think it's exactly the opposite. It's a very anti-intellectual nihilism to hold that all that need be known about the world is the electromagnetic properties of silicon-based transistors.

Those who use the word "magic" in this debate are really like atheists about the moon. It all sounds very smart to deny the moon exists, but in the end, it's actually just a lack of knowledge dressed up as enlightened cynicism.

There are more things to discover in a single cell of our body than we have ever known, and may ever know. All the theories of science needed to explain its operation would exhaust every page we have ever printed. We know a fraction of what we need to know.

And each bit of that fraction reveals an entire universe of "magical" processes unreplicated by copper wires or silicon switches.


You make good points. I think it's a typical trait of the way computer scientists and programmers tend to think. Computer science has made great strides over the decades through abstraction, as well as distillation of complex systems into simpler properties that can easily be computed.

As a result of the combination of this method of thinking and the Dunning-Kruger effect, people in our field tend to apply this to the entire world, even where it doesn't fit very well, like biology, geopolitics, sociology, psychology, etc.

You see a lot of this on HN. People who seem to think they've figured out some very deep truth about another field that can be explained in one hand-waving paragraph, when really there are lots of important details they're ignoring that make their ideas trivially wrong.

Economists have a similar thing going on, I feel. Though I'm not an economist.


As an aside, both my parents are prominent economists, I myself have a degree in economics, and I have spent much of my life with a bird's-eye view of the economics profession, and I can emphatically confirm that your feeling is correct.


Economics is zoology presented in the language of physics. Economists are monkeys who've broken into the uniform closet and are now dressed as zookeepers.

I aspire, at best, to be one of the children outside the zoo laughing. I fear I might be the monkey who stole the key...


Remember always, computer science is just discrete mathematics with some automatic whiteboards. It is not science.

And that's the heart of the problem. The CSci crowd have a somewhat well-motivated inclination to treat abstractions as real objects of study; but have been severely misdirected by learning statistics without the scientific method.

This has created a monster: the abstract objects of study are just the associations statistics makes available.

You mix those two together and you have flat-out pseudoscience.


Not sure I agree in this regard. We are, after all, aiming to create a mental model that describes reproducible steps for creating general intelligence. That is, the product is ultimately going to be some set of abstractions or another.

I am not sure what more scientific method you could propose. And we can, in this field, produce actual reproducible experiments. Really, more so than in any other field.


There's nothing to replicate. ML models are associative statistical models of historical data.

There are no experimental conditions, no causal properties, no modelled causal mechanisms, no theories at all. "Replication" means that you can reproduce an experiment designed to validate a causal hypothesis.

Fitting a function to data isn't an experiment; it's just a way of compressing the data into a more efficient representation. That's all ML is. There are no explanations here (of the data) to assess.


I don’t think that’s true either.

Take the research into LoRAs, for example. Surely the basic scientific method was followed when developing them. You can see that from the paper.

Obviously the results can be reproduced. Unlike in many other fields, reproducibility can be pretty trivial in CS.

Training a model isn’t really a science, but the work gone into creating the models surely is.


CS isn't science, it's discrete mathematics.


All sciences are progressively more impure (eg. Applied) forms of math.


lol


Also there’s literally a causal relationship between model topology and quality of output.

This can be plainly seen when trying to get a model to replicate its input.

Some models perform better in fewer steps, some perform worse for many steps, then suddenly much better.

How is discovering these properties of statistical models NOT science?


I do think there's an empirical study of ML models and that could be a science. Its output could include things like,

"the reason prompt Q generates A1..An is because documents D1..Dn were in the training data; these documents were created by people P1..Pn for reasons R1..Rn. The answer A1..An related to D1..Dn in so-and-so way. The quality of the answers is Q1..Qn, and derives from the properties of the documents generated by people with beliefs/knowledge/etc. K1..Kn"

This explains how the distribution of the weights produces useful output by giving the causal process that leads to training data distributions.

The relationship between the weights and the training data itself is *not* causal.

Eg., X = 0,1,2,3; Y = A,A,B,B; f(x; w) = A if x <= w else B

w = 1 because the rule x <= 1 partitions Y such that P(x|w) is maximised. These are statistical and logical relationships ("partitions", "maximises").
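
A minimal sketch of that toy fit, just to make the example concrete (illustrative code only, not anyone's actual model):

    # X = 0,1,2,3; Y = A,A,B,B; f(x; w) = "A" if x <= w else "B"
    X = [0, 1, 2, 3]
    Y = ["A", "A", "B", "B"]

    def f(x, w):
        return "A" if x <= w else "B"

    # "Training" is just picking the w that maximises agreement with the data.
    best_w = max(range(4), key=lambda w: sum(f(x, w) == y for x, y in zip(X, Y)))
    print(best_w)  # 1 -- a fact about the partition of the data, not about any causal mechanism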

A causal relationship is between a causal property of an object (extended in space and time) to another causal property by a physical mechanism that reliably and necessarily brings about some effect.

So, "the heat of the boiling water cooked the carrot because heat is... the energetic motion of molecules ... and cooking is .... and so heating brings about cooking necessarily because..."

Heating, water, cooking, carrot, motion, molecules, etc. -- their relationships here are not abstract; they are concretely in space and time, causally affecting each other, and so on.


So what do you call the process of discovering those causal properties?

Was physics not actually a science until we uncovered quarks, since we weren’t sure what caused the differences in subatomic particles? (I’m not a physicist, but I hope that illustrates my point)

Keep in mind most ML papers on arxiv are just describing phenomena we find with these large statistical models. Also there’s more to CS than ML.


You're conflating the need to use physical devices to find relationships, with the character of those relationships.

I need to use my hand, a pen and paper to draw a mathematical formula. That formula (say, 2+2=4) expresses no causal relationships.

The whole field of computer science is largely concerned with abstract (typically logical) relationships between mathematical objects; or, in the case of ML, statistical ones.

Computer science has no scientific methodology for producing scientific explanations -- it isn't science. It is science only in the old German sense of "a systematic study".

Scientists conduct experiments in which they hold fixed some causal variables (i.e., causally efficacious physical properties) and vary others, according to an explanatory framework. They do this in order to explore the space of possible explanations.

I can think of no case in the whole field of CS in which causal variables are held fixed, since there is no study of them. Computer science does not study even voltage, or silicon, or anything as physical objects with causal properties (that is electrical engineering, physics, etc.).

Computer science ought just be called "applied discrete mathematics"


I see where you’re coming from, but I think there’s more to it than that, specifically with non determinism.

So if I observe some phenomenon in a bit of software that was built to translate language, say the ability to summarize text.

Then I dig into that software and decide to change a specific portion of it, keeping all other aspects of the software and its runtime the same, and I notice it's no longer able to summarize text.

In that case I’ve discovered a causal relationship between the portion I changed and the phenomenon of text summarization. Even though the program was constructed, there are unknown aspects.

How is that not the process of science?

Sorry if this is just my question from earlier, rephrased, but I still don’t see how this isn’t a scientific method.


Intuition is a process of slight blind guesses in a system that was built/proposed by a similar organism, in a way that resembles previous systems. Once you get into things like advanced physics, advanced biology, etc., intuition evaporates. Remember those SR/GR things? How intuitive were they? I'd say current AI is pure intuition in this Q'-ZipQA-A' sense, because all it does is blindly guess the descent path.


Intuition is a form of pattern matching without reasoning, so kinda like an LLM.


It's wild how this is alwayyysss the argument. It's just "oh, so you think humans are special!" >:| and a gigantic "what-if".

It's a purely emotional provocation and a universe-sized leap, not an argument for LLMs having intelligence or sentience. Anything could be anything, wowww! This goes back to what the other person was saying: "I cannot reason about what is going on behind the curtain, therefore...".


Not arguing LLMs have sentience, but rather asking whether something that could be considered as "simplistic" as "statistics" could yield a more complex result.

Whether LLMs have intelligence depends on your definition of intelligence.

Could connections of artificial neurons, arranged in a certain way as a result of training on data, yield human-level intelligence?


Always remember that ML starts from training data, i.e., from a very, very large number of (Prompt, Answer) pairs.

Remember also that companies like OpenAI are tracking which prompts fail and adding them to their datasets. So their initial data is ever more just a record of questions and answers.

Given this, we should expect that the vast majority of the questions we ask of ChatGPT will be answered very, very well.

What has this to do with intelligence? Nothing at all.

Intelligence is not answering questions correctly. It's what you use when you don't have the answers and aren't even clear on the question.


This is not how ML works generally nor LLMs like ChatGPT specifically.

What you're describing sounds like RLHF, which changes the style of responses and impacts things like refusals but does not add to a model's intelligence (in fact it reduces model intelligence).

An LLM's intelligence comes from pretraining in which there are no prompts, or answers, only corpus and perplexity.


You can map the weights to an uncompressed (Q,A) space, and then map them back again to weight-space -- all without loss of information. The actual domain space they were compressed from is irrelevant; their values are equivalent.


Q & A are not part of pretraining.


All knowledge/predictions are encoded as a chain of probabilities that something is true, otherwise, what else is it? My brain calculates 0.8 * 0.3 * 0.5 * 0.6 in order to make a 3-pointer, but Michael Jordan's brain ... well his mitochondria does a triple back flip and inverts the voltage into a tachyon particle.


Particles interacting (causally) through a physical mechanism that gives rise to, say, "wholes" with novel causal properties is not a statistical process. So your premise contradicts your conclusion.

Statistics is an analysis of association, not of causation. The frequency of (Q, A) pairs follows a distribution that is not constrained, or caused by, or explained by, how Q and A are actually related.

For example, recently there was some scandal at Microsoft where, if you used "pro choice" in prompts, you got "demonic cartoons". Why? Presumably because "pro choice" are symbols that accompany such political cartoons in the data set.

So does Q = "pro choice" and A = "cartoons of hell" occur at notable frequency because hell has caused anything? Or because there's a unique semantic mechanism whereby "pro choice" means "hell", and so on?

NO.

It is absolutely insane to suggest that we have rigged all our text output so as to align one set of symbols (Q) alongside another (A) such that Q is the necessary explanation of A. I doubt this is even possible, since most Qs don't have unique As -- so there is actually *no function* to approximate.

In any case, your whole comment is an argument from ignorance as I complained in mine. What you don't know about life, about machines, about intelligence justifies no conclusions at all (esp., "everything in life could be").

And let's be clear: lots of people do know the answers to your questions; they aren't hard to answer. It's just not in any ad company's interest to lead their description of these systems by presenting good-faith research.

Everything printed in the media today is just a game of stock manipulation using the "prognosticator loophole", whereby the CEO of Nvidia can "prophesy the future" in which his hardware is "of course" essential -- without being held to account for his statements. So when that stock hits its ATH and crashes, no one can sue.

I think we should change this; remove this loophole and suddenly tech boards and propagandists will be much much more reserved.


What could be "statistics" is our intelligence learning from past events, either by natural selection over generations or by our brains during our lifetime. If a certain outcome A has occurred enough times for input Q, that has resulted in a structure that is the best achievable given the resources available to reach it.


Suppose you touch a fireplace once, do you touch it again? No.

OK, here's something much stranger. Suppose you see your friend touch the fireplace, he recoils in pain. Do you touch it? No.

Hmm... whence statistics? There is no frequency association here, in either case. And in the second, even no experience of the fireplace.

The entire history of science is supposed to be about the failure of statistics to produce explanations. It is a great sin that we have allowed pseudosciences to flourish in which this lesson isn't even understood; and worse, to allow statistical showmen with their magic lanterns to preach on the scientific method -- to the point where it seems, almost, that science as an ideal has been completely lost.

The entire point was to throw away entirely our reliance on frequency and association -- this is ancient superstition. And instead, to explain the world by necessary mechanisms born of causal properties which interact in complex ways that can never uniquely reveal themselves by direct measurement.


> The entire point was to throw away entirely our reliance on frequency and association -- this is ancient superstition. And instead, to explain the world by necessary mechanisms born of causal properties which interact in complex ways that can never uniquely reveal themselves by direct measurement.

Who said that? You make it sound like this was some important trend in the past that got derailed by the evil statisticians (spoiler: there never was such a trend big enough to have momentum).

Your rant against statistics is all nice and dandy, but when you have to translate that website from a foreign language into English automatically, when you ask ChatGPT to generate you some code for a project you work on, or when you are glad that Google Maps predicted your arrival time at the office correctly, you rely on the statistics you vilify in essential ways. You basically are a customer of statistics every day (unless you live under a rock, which I don't think you do).

Statistics is good because it works, and it works well in 90% of cases, which is enough. What you advocate for so zealously (whatever such a causally validated theory would be) currently doesn't.


Well, if you want something like the actual history... we have Francis Bacon getting us close to an abductive (i.e., explanatory) method, Descartes helped a bit -- then a great catastrophe befell science called Hume.

Since Hume it became popular either to somehow rig measurement to make it necessarily informative (Kant), or to claim that measurement has no necessarily informative relation to reality at all (in the end, Russell, Ayer et al.).

It took a while to dig out of that nightmarish hole that philosophers largely created, back into the cold light of experimental reality.

It wasn't statisticians who made the mess; it was first philosophers, and today, people learning statistical methods without learning statistics at all.

Thankfully philosophers started actually reading science, and then they realised they'd got it all wrong. So today, professional research philosophy is allied against the forces of nonsense.

As far as the success of causal explanations goes, you owe everything to them, including the very machine which runs ChatGPT. That we can make little trinkets on association alone pales in comparison to what we have done by knowing how the world works.


I get the Chomskyan objection re. statistical machine learning, I am partial to it.

But consider these LLMs and such as extremely crude simulations of biological neural networks. They aren't just any statistics; these are biomimetic computations. Then we can in principle "do science" here. We can study skyscrapers and bridges; we can study LLMs and say some scientific things about them. That is quite different than maybe what is going on in industry, but AFAIK there are lots of academic computer scientists who are trying to do just that, bring the science back to the study of these artifacts so that we can have some theorems and proofs, etc. That is - hopefully - more sophisticated a science than trying to poke and prod at a black box and call that empirical science.


The only relationship between artificial neural networks and biology is the vague principle of an activation threshold. In all other ways, the way biological neural networks are organized is ignored. The majority of ANN characteristics are instead driven by statistics, mathematical arguments, and simple tinkering.

For some ways in which these differ:

- real neurons are vastly complex cells which seem to perform significant non-trivial computations of their own, including memory, and can't be abstracted as a single activation function

- real neural networks include "broadcast neurons" that affect other neurons based on their geometric organization, not on direct synapse connections

- there are various neurotransmitters in the brain with different effects, not just a single type of "signal"

- several glands affect thought and computation in the brain that are not made of neurons

- real neural networks are not organized in a simple set of layers, they form much more complex graphs

And these are just points about structure. The way we actually operate these networks has nothing in common with how real neural networks work. Real neural networks have no separate training and inference phases: at all times, they both learn and produce actionable results. Also, the way in which we train ANNs, backpropagation with stochastic gradient descent, is entirely unrelated to how real neural networks learn (which we don't really understand in any significant way).

So no, I don't think it makes any sense to say that ANNs are a form of biomimetic computation. They are at best as similar to brains as a nylon coat is to an animal's coat.


(P.S. Just to head off a possible diction issue - biomimetics just means taking something from nature as inspiration for doing something, it doesn't mean the reverse which is to "try to understand / emulate nature completely well". E.g. solar panels arranged on a stalk to maximize light is acceptably biomimetic and there is no issue about whether solar panels are faithful enough to model chloroplasts.)

I'm coming from the context of theoretical models of computation, of which there are only so many general ones - Turing machines, lambda calculus, finite state machines, neural networks, Petri nets, a bunch of other historical ones, ... etc. etc. Consider just two, the Turing machine model, versus the most abstract possible neural network. We know that the two are formally computationally equivalent.

Abstractly, the distinguishing feature of theoretical neural networks is that they do computations through graphs. Our brains are graphs (and graphs with spatial constraints as well as many other constraints and things). The actually-existing LLMs are graphs.

Consider, C++ code is not only better modeled by the not-graph Turing machine model, it is also easily an instance of a Turing machine. These man-made computers are instances as well as modeled by von-Neumann architectures, which can be thought of as a real implementation of the Turing machine model of computation proper.

I think this conceptual relationship could be the same for biological brains. They are doing some kind of computable computation. They are not only best modeled by some kind of extremely sophisticated - but computable - neural network model of computation that nobody knows how to define yet (well, Yann LeCun has some powerpoint slides about that apparently). They are also an instance of that highly abstract, theoretical model. It's a consequence of the Church-Turing thesis which I generally buy (because of aforementioned equivalence, etc.): if one thinks the lambda calculus is a better model than neural network for the human brain, I'd like to see it! (It turns out there are cellular models of computation as well, called membrane models.) But that's the granularity I'm interested in.

In different words, the fact that many neural network models (rather, metamodels like "the category of all LLMs") can be bad models or rudimentary models is not a dealbreaker in my opinion, since that is analogous to focusing on implementation details. The goal of scientific research (along the neural network paradigm) would be to try sort that out further (in the form of theory and proofs, in opposition to further "statistical tinkering"). Hope that argument wasn't too handwavy.


If we define biomimetic so broadly that merely some vague inspiration from nature is enough, then I would say the Turing machine is also a biomimetic model. After all, Turing very explicitly modeled it after the activity of a mathematician working with a notebook. The read head represents the eyes of the mathematician scanning the notebook for symbols, the write head is their pencil, and the tape is the notebook itself.

Now, whether CPUs are an instance of a Turing machine or not is again quite debatable, but it's ultimately moot.

I think what matters more for deciding whether it makes sense to call a model biomimetic or not is whether it draws more than a passing inspiration from biology. Do practitioners keep referring back to biology to advance their design (not exclusively, but at least occasionally) or is it studied using other tools? Computers are obviously not biomimetic by this definition, as, beyond the original inspiration, no one has really looked at how mathematicians do their computations on paper to help build a better computer - the field evolved entirely detached from the model that inspired it.

With ANNs, admittedly, the situation is slightly murkier. The majority of advancements happen on mathematical grounds (e.g. choosing nonlinear activation functions to be able to approximate non-linear functions; various enhancements for faster or more stable floating point computations) or from broader computer science/engineering (faster GPUs, the single biggest factor in the advancement of the field).

However, there have been occasional forays back into biology, like the inspiration behind CNNs, and perhaps attention in Transformers. So, perhaps even by my own token, there is some (small) amount of biomimetic feedback in the design of ANNs.


I love that you said this.

>After all, Turing very explicitly modeled it after the activity of a mathematician working with a notebook. The read head represents the eyes of the mathematician scanning the notebook for symbols, the write head is their pencil, and the tape is the notebook itself.

My feeling on this is the complete opposite of yours. To me, this is a completely valid mode of discovery, and possibly even what led to the idea of the Turing machine. We are, after all, interested in mimicking/reproducing the way we think. So it's perfectly sensible that one would "think about how we think" to try and come up with a model of computation.

I don't care at all about this argument of whether to call something biomimetic or not. That's just semantics. What you associate with "biomimetic" is subject to interpretation, and one can only establish an objective criterion for it by asserting that one's own mental model is the only correct one.


> My feeling on this is the complete opposite of yours. To me, this is a completely valid mode of discovery, and possibly even what led to the idea of the Turing machine. We are, after all, interested in mimicking/reproducing the way we think. So it's perfectly sensible that one would "think about how we think" to try and come up with a model of computation.

I'm not sure if you thought I was being sarcastic, but what I was describing there is literally how Turing came up with the idea, he describes this in the paper where he introduces the concept of computable numbers [0]. I just summarized the non-math bits of his paper.

If you haven't read it, I highly recommend it, it's really easily digestible if you ignore the more mathematical parts. This particular argument appears in section 9.

[0] https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf


> In the end we are all particles

citation needed



Where in that article does it say that the self, that human consciousness, that a soul arises purely from the arrangement of particles in our bodies?


I'm sure you weren't asking for a citation in good faith but instead making a rhetorical point, but here you go https://en.wikipedia.org/wiki/Materialism.


In the part where it says that everything in the universe that is perceivable at human levels of energy and scale is made out of particles in the Standard Model (dark matter doesn't interact in a way where it could influence us, and dark energy only has effects on extraordinarily large scales).

All measurements and all experiments ever done with matter and fields confirm that it behaves according to the laws of quantum mechanics. Those laws leave absolutely no room for a self that is not an emergent phenomenon of some kind. They also don't leave room for something like a free will that allows "you" to control "your body" by your "will", which is what I assume you might mean by a soul. That is, they clearly show that me writing this reply could have been (in principle) foretold, or at least had a calculated probability.


Citation: particle physics


>There are no theories of intelligence which would label that intelligence.

Actually, compression-is-intelligence has been a relatively popular theory for a couple of decades. The Hutter Prize from 2006 is based on that premise.

The idea is that compression and abstraction are fundamentally the same operation. If you had a perfect compressor that could compress the digits of pi into the 10-line algorithm that created them, in a deep sense it has understood pi and could generate as many more digits as you want.
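
For a concrete example of the kind of "10-line algorithm" being alluded to, here is Gibbons' unbounded spigot algorithm, which yields the decimal digits of pi one at a time; in the compression-is-intelligence framing, this short program is a compressed representation of the entire infinite digit stream:

    def pi_digits():
        # Gibbons' unbounded spigot algorithm for the decimal digits of pi.
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n
                q, r, t, k, n, l = 10 * q, 10 * (r - n * t), t, k, (10 * (3 * q + r)) // t - 10 * n, l
            else:
                q, r, t, k, n, l = q * k, (2 * q + r) * l, t * l, k + 1, (q * (7 * k + 2) + r * l) // (t * l), l + 2

    digits = pi_digits()
    print("".join(str(next(digits)) for _ in range(10)))  # 3141592653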


Compression-as-intelligence is "popular" only with people who don't study intelligence. The problems with it are vastly too many to list.

Perhaps the most obvious is that it's a form of inductivism, as it supposes that you can build representations of the targets of measurement from the measurement domain itself.

This is just the same as drawing lines around patterns of stars, calling them "star signs" and then believing they are actual objects.

This has absolutely nothing to do with intelligence; and is simply a description of statistical AI. It is no coincidence that this "theory" is only even known in that community, let alone popular elsewhere.

You cannot derive theories (explanations, concepts, understanding, representations) by induction (compression) across measurements. There's lots and lots of work on this, and it is beyond reproach -- for a popular introduction see the first three chapters of David Deutsch's The Fabric of Reality.

"Compressionists" might say that representations of the target domain are "compressed" with respect to their measures insofar as they take up less space. This is quite a funnily ridiculous thing to say: of course they do. This has nothing to do with the process of compression... it is a feature only of there being an infinite number of ways to measure any given causal property.

The role "compression" plays here, insofar as it is relevant at all, is as a trivial and uninformative observation that "explanations are finite, and measures infinite" -- one is not a compression of the other.


I don't follow. What are "actual" objects? Everything we conceptualize is some abstraction of data. We then see the world in trained concepts.

A "door", for example, isn't a fundamental object of the universe. It's a collection of atoms or quarks or whatever you consider the fundamental objects of reality (or even a completely abstract object having some resemblance in space/time), but each part in itself means nothing. It is the collection of them in a certain configuration which we recognize as being a "door".

It is precisely the "compression" of a bunch of data points into a simpler concept.

The same goes for concepts like shapes, diagonals, alternation, etc. There are endlessly many patterns which we learn to distinguish.


Many things are extended in space and time; the number "2" is not. Nor is 010101010111010011...


I'm not sure what your point here is, unless you think it's impossible to encode space/time in a mathematical framework?


When you, e.g., take all the positions of stars (etc.) in the sky and compress them, you do not get Newton's universal theory of gravity -- indeed, it is impossible to induce this theory via compression. It is impossible to induce any theory of physics (and indeed, any explanation) via compression of measurement data.

The only sense in which physics is "compressed" with respect to measurement data is that it's conducted using finite sequences of mathematical symbols given a causal semantics.

The only "data" which can be compressed into, e.g., Newton's universal law requires you to already know that law in order to collect it. Almost all measurements of the sky need to be adjusted, or ignored, in light of already knowing the theory.


It's impossible to induce anything through compression alone, as it is reductive. We also need the ability to add and combine things.

The very nature of the inquiry of physics is to find the simplest mental conception which explains the greatest number of physical phenomena. To improve any system or model while retaining all its qualities, it is necessary to deconstruct it into simpler bits so that it can be reconstructed into something simpler.

Indeed, one can conceive of a physics model in which every observable phenomenon is a unique entity with the property of doing exactly what is being observed. But by attempting to find common properties between phenomena, we can reduce them to simpler explanations, and this can happen in iterative steps. In the primitive studies of physics, this was reducing the elements to "fire", "water", "steam", "earth", etc. Then we broke this model down further by saying all elements have something in common, in that they are made of atoms. And we then attempted to explain commonalities in how atoms behave by breaking things down further.


I remember getting a lot of flak for saying a purely statistical framework is not going to achieve human level intelligence, but I still firmly believe that.

I also believe the path forward is research in knowledge representation, and even now when I search for it, I can barely find anything interesting happening in the field. ML has gotten so much interest and hype because it’s produced fast practical results, but I think it’s going to reach a standstill without something fundamentally new.


I tend to agree, and it's weird, but there are probably lots of actual ML practitioners who have never even heard of the neat vs. scruffy debate. Naturally, most who have heard of it will feel that the issue is already completely resolved in their favor. On the whole, not a very open-minded climate.

Credit where it’s due for the wild success of fancy stats, but we should stay interested in hybrid systems with more emphasis on logic, symbols, graphs, interactions and whatever other data structures seem rich and expressive.

Call me old school, but frankly I'd prefer that a society-of-mind flavor of system ultimately be in charge of things like driving cars, running court proceedings, and optimizing cities or whole economies. Let it use fancy stats as components and subsystems, sure... but let it produce coherent arguments or critiques that can actually be understood, summarized, and debugged.


You make a very interesting point. Human understanding and logic can be rationally explained. A judge, for example, can give a very thorough account of exactly why they reached their verdict. I think that would be an excellent benchmark for AI.

This seems rather impossible when your understanding of the world is a connection of billions of messy and uncertain parameters. But perhaps this is the first step? Maybe we can take the neural nets trained by ML and create constructions on top of them.


I think this is effectively provable from extraordinarily plausible premises.

    1. We want to infer A from Q.
    2. Most A we don't know, or have no data for, or the data is *in the future*.
    3. Most Q we cannot conceptualise accurately,
        since we have no explanatory theory in which to phrase it or to provide measures of it.
    4. All statistical approaches require knowing frequencies of (Q, A) pairs (by def.).
    5. In the cases where there is a unique objective frequency of (Q, A), we often cannot know it (2, 3).
    6. In most cases there is no unique objective frequency
        (e.g., there is no single animal any given photograph corresponds to,
        nor any objective frequency of such association).
So, conclusion:

In most cases the statistical approach either necessarily fails (it's about future data; it's about non-objective associations; it's impossible to measure or obtain objective frequencies), OR, if it doesn't necessarily fail, it fails in practice (it is too expensive, or otherwise impossible, to obtain the authoritative QA frequency).

Now, of course, if your grift is generating nice cartoons or stealing cheap copy from ebooks, you can convince the audience of the magical power of associating text tokens. This, of course, should be ignored when addressing the bigger methodological questions.


I do agree here!

Bit of a tangent from the thread but what have been the most valuable advances in knowledge representation in the last 20 years? Any articles you could share would be lovely!


I'm no expert and I don't know anything, unfortunately. It is something I have spent countless hours walking around my room pondering over the last 3-4 years, though. I think I have some interesting ideas, and I would love to get a PhD studying it, if I ever get enough financial independence that I don't have to worry about money.


But does the fact that we don't really know how human intelligence works actually suggest that it is categorically smarter?


That's a great argument and way of reversing the argument-from-ignorance line.

That said, I think people who argue from ignorance suppose we don't know how AI works either. Since the admen selling it tell them that.

We know exactly and precisely how AI works; we can fully explain it. What we don't know are circumstantial parts of the explanation (e.g., which properties of the training data plus the algorithm gave rise to a specific weight w1 = 0.01).

This is like knowing why the thermometer reads 21 °C (the motion of the molecules in the water, etc.) -- but not knowing which molecules specifically bounced off it.

This confusion about "what we don't know" allows the prophetic tech-grifter class to prognosticate in their own interest: "we don't know how AI works, so it might work such that I'll be very rich, so invest with me!"

All this said, it's a great observation.


What I find odd is the human obsession with making AI as smart as a human.

In the early period of AI image generation, you would see AI generate images so uncanny that no human could make anything like them. It was genuinely unique and unlike anything human, worthy of being called art.

We "fixed" that bug so we could make the world's most expensive face swap app.


Nonsense, you can still make weird/creative stuff. It seems like you're speaking about the marketing materials that are pushed out.


CS7637 was a fun course! One can get pretty far (80% accuracy) by using XOR masks and comparing centroids.
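
A rough sketch of the flavor of that approach for the 2x2 "A is to B as C is to ?" case, assuming binarized numpy images (illustrative only, not the actual course solution):

    import numpy as np

    def solve_2x2(a, b, c, candidates):
        # All images are binary numpy arrays of the same shape.
        ab_change = np.logical_xor(a, b)             # what changed from A to B
        scores = []
        for cand in candidates:
            cd_change = np.logical_xor(c, cand)      # what would change from C to this answer
            # Fewer disagreeing pixels between the two change masks = better fit.
            scores.append(np.logical_xor(ab_change, cd_change).sum())
        return int(np.argmin(scores))

Comparing centroids (the mean coordinates of the "on" pixels) can then help break ties for transformations like translations and reflections.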


Could a GPT write those 20 lines of code?


maybe if those 20 lines of code are in a private github repo somewhere


A little surprised that neither the paper nor the discussion here cite Francois Chollet's 2019 paper on defining intelligence for machines [1], which talks explicitly about visual reasoning and proposed an RPM-like visual task as a benchmark. I found it very useful for thinking about what it means for an AI to be generally intelligent.

From what I remember (having not read it in a while), his idea is basically that general intelligence is the efficiency with which you can learn, after appropriately "penalizing" any baked-in priors or knowledge beyond what a typical animal is born with.

As an example, Magnus Carlsen's chess skill is a testament to his general intelligence because he became skilled at chess in one human lifetime, starting out with only basic priors about the world like objectness, agentness, and some others I can't remember. In contrast, DeepBlue is not as intelligent, because it started with many more priors specific to chess, and AlphaZero is also not as intelligent, because it acquired its chess skill much less efficiently, requiring the equivalent of thousands of human lifetimes.

I don't remember if the paper considers whether 3.5 billion years of "pre-training" counts in this regard, though.

[1] https://arxiv.org/abs/1911.01547


Isn't that a bit of a cop-out though? They're just redefining the word "intelligence" to be something else (here, the ability to learn new things). That's fine and all, but that doesn't answer the question we actually care about, which is the absolute magnitude of the ability. It doesn't matter how "long" an entity took to learn, or how "efficient" they were in learning - at the end of the day, AlphaZero will crush Magnus.


GPT-4-Vision is surprisingly a lot better than I expected for a variety of practical tasks. OpenAI released a demo notebook with practical use cases for tagging/captioning a few weeks ago: https://cookbook.openai.com/examples/tag_caption_images_with...

It will be much better once it gets function calling/structured data support, so you don't have to use the formatting hacks that notebook abuses to get a specific output format.

Just don't ask it to describe a real person in the image or it gets fussy.


Gpt-4 will describe people, just don’t ask it to be judgmental.

Prompt:

Describe all objective details you can glean from the person in this photo.

Result:

The person in the photo appears to be a woman with fair skin. She has long, straight, light blonde hair, and blue eyes. Her eyebrows are well-shaped and lighter in color, matching her hair. She is wearing mascara and possibly eyeliner, which highlight her eyes, and she has a natural, possibly light pink, lipstick on. She is smiling, showing her teeth, which are white and well-aligned.

The woman is wearing what seems to be a black outfit, although only a small portion is visible. She appears to be indoors, with a nondescript background.

There are no visible tattoos, piercings (other than possibly in her ears, which are not fully visible), or distinctive marks that can be seen in the photograph.


ChatGPT is happy to describe people if it has the option to give a flattering description. Our current chatbots' basic relation to humanity is flattery.


Check out the ARC benchmark. https://lab42.global/arc/

Iirc, humans get something like 80% on the test, and the best performing AI models are at 25%


For now, LLMs are a lot better when you express this kind of task in text, preferably as a 1D arrangement. See GPT-4's performance on the Abstraction and Reasoning Corpus: https://arxiv.org/abs/2305.18354

Also, this model trained to predict image tokens seemed to be able to perform Raven's Matrices type questions when expressed as an inpainting task in context. https://yutongbai.com/lvm.html


Nowadays VLMs are kind of Frankenstein models: the vision model is separate from the LLM, and the communication is only one-way, with the vision embeddings projected into textual embeddings and aligned. I think this method severely limits the visual capability of the model, and that it would be fixed by training a single decoder-only transformer on both textual and image tokens. That would allow some incredible new emergent behaviors: accurately translating images of logos to SVG, actually good ASCII art, more meaningful OpenSCAD scripts, and the ability to generate images with more meaningful composition and text (for now, models can only generate meaningful text if it's available in the prompt; this is true even with DALL-E 3, as the prompt is rewritten by ChatGPT).
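
For concreteness, a minimal sketch (in PyTorch, with made-up dimensions) of the one-way projection being described: a frozen vision encoder's patch embeddings are mapped by a small projector into the LLM's token-embedding space and prepended to the text embeddings. This illustrates the general pattern, not any particular model's actual code:

    import torch
    import torch.nn as nn

    class VisionToTextProjector(nn.Module):
        # Maps vision-encoder patch embeddings into the LLM's embedding space.
        def __init__(self, vision_dim=1024, text_dim=4096):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, text_dim),
                nn.GELU(),
                nn.Linear(text_dim, text_dim),
            )

        def forward(self, patch_embeddings):      # (batch, num_patches, vision_dim)
            return self.proj(patch_embeddings)    # (batch, num_patches, text_dim)

    # The projected "visual tokens" are concatenated in front of the text token
    # embeddings, so information flows one way: vision -> language.
    def build_llm_inputs(visual_tokens, text_token_embeddings):
        return torch.cat([visual_tokens, text_token_embeddings], dim=1)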


To answer the question "How far are we from intelligent visual deductive reasoning?": consider G. Spencer-Brown's calculus of indications. If you read his book, it shows how using the mark/cross makes visual deductions trivial (if well-formed in translation).

There is also work on this from Kauffman at the University of Illinois and from William Bricken.

Though this is a bit different than Raven's matrices.


> We observed that all the VLMs tend to be confident while being wrong. Interestingly, we observed that even when the entropy was high, models tried to provide a nonsensical rationale, instead of acknowledging their inability to perform the task

It looks like all current models suffer from an incurable case of the Dunning–Kruger effect.

All are at the peak of Mount Stupid.


LLMs are trained to sound confident.

But they can also only do negation through exhaustion, known unknowns, future unknowns, etc...

That is the pain of the Entscheidungsproblem.

Even Presburger arithmetic -- the natural numbers with addition and equality -- which is decidable, still has a doubly exponential lower bound on the time needed to decide its sentences. That is worse than factorial time, for those who've not dealt with it.

Add in multiplication and you are undecidable.

Even if you decided to use the DAG-like structure of transformers, causality is very, very hard.

https://arxiv.org/abs/1412.3076

LLMs only have cheap access to their model probabilities, which aren't ground truth.

So while asking for a pizza recipe could be called out as a potential joke if you add a topping that wasn't in its training set (through exhaustion), it can't know when it is wrong in the general case.

That was an intentional choice with statistical learning and why it was called PAC (probably approximately correct) learning.

That was actually a cause of a great rift with the Symbolic camp in the past.

PAC learning is practically computable in far more cases and even the people who work in automated theorem proving don't try to prove no-instances in the general case.

There are lots of useful things we can do in BPP (bounded-error probabilistic polynomial time) and with random walks.

But unless there are major advancements in math and logic, transformers will have limits.


How can a neural network evaluate "confidence"?

The parameters don't store any information about what inputs were seen in the training data (vs being interpolated) or how accurate the predictions were for those specific inputs.

And even if they did, the training data was usually gathered voraciously, without much preference for quality reasoning.


I don't know for sure, but here's a plausible mechanism for how:

Multiple sub-networks detect the same pattern in different ways, and confidence is the percent of those sub-networks that fire for a particular instance.

There's a ton of overlap and redundancy with so many weights, so there are lots of ways this could work


That's good. Also, maybe an architecture that runs the query through multiple times and then evaluates the similarity of the responses, then selects (or creates) the most-generated one, along with a confidence level based on how many of the individual responses were aligned.
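
A minimal sketch of that idea (here `generate` is a hypothetical stand-in for a sampled model call):

    from collections import Counter

    def self_consistency(generate, prompt, n=10):
        # Sample n answers and report the majority answer plus the agreement rate.
        answers = [generate(prompt) for _ in range(n)]
        (best, count), = Counter(answers).most_common(1)
        return best, count / n   # e.g. ("42", 0.7) -> 70% of the samples agreed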


Actually, you can get a very good proxy by looking at the probability distribution of the "answer" tokens. The key is that you have to be able to identify the "answer" tokens.

https://arxiv.org/abs/2402.10200
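
A rough sketch of what "looking at the distribution" could mean, assuming you already have per-token probability distributions (softmaxed logits) and have identified which positions hold the answer tokens:

    import numpy as np

    def answer_confidence(token_distributions, answer_positions):
        # token_distributions: array of shape (seq_len, vocab_size) of probabilities.
        # answer_positions: indices of the tokens that constitute the "answer".
        entropies = []
        for pos in answer_positions:
            p = token_distributions[pos]
            entropies.append(-np.sum(p * np.log(p + 1e-12)))   # Shannon entropy
        # Lower average entropy over the answer span = more peaked = more "confident".
        return float(np.mean(entropies))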


Phind gives me ChatGPT answers with relatively authoritative references to works on the web that (usually!) support the answer. Could it have a post-filter to fact check against the references?

I guess that is a slight variation of the sibling (@habitue's) answer; both are checks against external material.

I wonder if best resources could be catalogued as the corpus is processed, giving a document vector space to select resources for such 'sense' checking.


IIRC confidence in video is related to predicting what happens next vs what actually happens. If the two seem to correlate to the model it would give it a higher confidence ranking, which would then be used further for self-reinforced learning.


That's not how Dunning-Kruger works. There's no point at which incorrect people are more confident than correct people.

https://en.m.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effec...


When self-driving cars were first becoming a thing, I was one of the skeptics who would say things like, "Call me when the AI can avoid situations like https://i.imgur.com/cmR2PDm.jpeg ." At the time, my thinking was that the software would have to recognize both fixed and moving objects nearby, capture their position and velocity, infer or look up their inertia tensors, and run at least a basic dynamics simulation to anticipate and prevent accidents like the one that's about to happen in the photo.

However, I didn't consider the possibility that such a simulation might emerge from a model driven by nothing but visual input, given a large-enough data set and the right training. At this point my old argument seems like a losing one, even if present-day systems can't answer the "What happens next?" question reliably enough to trust in all driving situations. It won't exactly have to be perfect in order to outperform most humans.

As a matter of fact, I just checked, and the situation in the meme is already partially recognized by ChatGPT 4: https://i.imgur.com/wLSBSkJ.png , even if it misses the comedic implications of the truck heading for the overpass. Whether it was somehow trained to give a useful, actionable answer for this particular question, I don't know, but the writing's on the proverbial wall.

-------

Edit due to rate-limiting: note that I intentionally cropped out the WAIT FOR IT frame when I submitted the pic, so that it wouldn't see it as a meme loaded with comedy value. When I explicitly asked it what was funny about the image, ChatGPT4 eventually answered:

"When the truck hits the overpass, given that the convertible is following closely behind, the occupants could be in for quite a shock. The impact might cause the portable toilets to become dislodged or damaged, potentially resulting in the contents being spilled. For the people in the convertible, this could mean an unexpected and unpleasant shower, especially because the convertible's top is down. The humor perceived here is in the anticipation of the unexpected and the sudden reversal of fortune for the individuals in what appears to be a fancy car – it's a form of slapstick humor."

But I did have to provide a couple of strong hints, including telling it that the truck was not towing the car ( https://i.imgur.com/AlPRgEQ.png ). I was going to feed Claude the same hints, but it repeatedly failed to recognize the SMS code for registration.


Not to detract from your point, because I think you're right, but GPT-4 doesn't seem to mention the biggest thing a human would notice, which is that the portable toilets are about to be knocked off the truck by driving under the bridge.

still, impressive


Yeah, not sure why it missed it. But when prompted with the photo and "What's funny about this photo?" GPT-4 replied:

"The humor in this photo stems from the anticipation of an unfortunate event that seems likely to occur. A portable toilet is being transported on the back of a truck that appears to be entering a tunnel with a height limit, and a convertible car with its top down is closely following the truck. The implication is that if the portable toilet were to hit the tunnel entrance due to being too tall, its contents could potentially spill into the open convertible behind it, leading to a very messy and unpleasant situation. The text "WAIT FOR IT" plays on the tension of expecting something bad to happen, adding to the comedic effect."


How do we know the cars are moving? It doesn't seem clear to me, they could both be standing still.


It's a valid question, but I tried to account for it by asking "What is about to happen?" rather than just "What does this photo portray?"

If I had asked the latter question, it would have been acceptable for the model to assume the cars are parked, I think.


We're trying to emulate a human response here, aren't we? Most people would assume that "what is about to happen" implies that the scene isn't static and unchanging. The answer to every "what's happening in this photo" is technically "nothing, it's a snapshot in time, there is no change", but that's definitely not what we're after. It's perhaps a petulant tweenager/young-teen response: 'Ha, gotcha, nothing is happening in a photo, were you thinking of video!?'


On the other hand, it's quite likely that the cars actually were stopped when the photo was taken. The convertible is following the truck way too closely for them to be moving at a normal driving speed. I feel like most people have an incorrect mental model about how far apart cars are when driving. Maybe GPT-4 understands this in a way that most humans don't?


Yeah, I think it's very plausible that the truck just stopped before hitting the bridge and the car behind is waiting.


We also don't know that it won't fit. The bridge can look lower than it is due to the perspective.


Claude 3 Opus does a little better https://shot.3e.org/ss-20240310_141855.png but still doesn't get the interaction with the convertible.


That's interesting just because it's such a perfect complement to GPT4's answer. Both models recognize that an unsafe situation exists and that immediate action is needed to avoid something bad happening.

I agree that Claude's answer is better from a purely-analytical standpoint. The only problem is, the car can still save itself (barely) but it's too late for the truck.


Neither is getting the bit that makes it a successful meme, which is that we understand what the truck's cargo is, and why it's important that the following car is a convertible, and what we imagine is thus going to happen.


That just depends on where your focus is. If the safety issue is much more salient, you’re not as prone to recognize the humorous aspect of the situation. If you see the image on a meme page, you’re already primed for comedy.


It could just be a basic image-recognition failure due to the low-quality JPEG. I don't think Claude recognized the portable toilets, and I don't think GPT4 recognized the overpass.


If you ask it outright, "The truck in the image appears to be carrying portable toilets, also known as portable restrooms or porta-potties. These are commonly used at outdoor events, construction sites, and other locations where permanent restroom facilities are not available."


Using Google Lens with the tick marks on the image provides lots and lots of training examples that could work for that via simple pre-training and pattern matching.

In this specific image case there are even games on Baamboozle called "what is going to happen"

The answer is similar to the previous explanation.

I expect LLMs to be good at retrieval so it would be more interesting for images that weren't in the corpus.


But here you are asking a different question, hinting to Claude that it is about the truck driving under the bridge. For a fair comparison, please ask the same question.


When I ask the same question I actually got a much better answer! https://shot.3e.org/ss-20240310_145736.png


Wow!


Video data doesn't work well with these multimodal LLMs because images are still somewhat intensive for them to process. Both the GPT-4-Vision demos and Gemini take a few frames from each video (~1 frame per second).

Self-driving cars use much more efficient approaches for the real-time outputs needed.


Sure, but my point is that self-driving may be easier to achieve if we don't insist on doing it the hard way with hardwired logic running task-specific models, the way I originally assumed it would have to be done.


I thought about this in exactly the same way when there was a lot of talk about end-to-end learning in the early days. LLMs have made me feel unsure too, but I am still not convinced.


Now imagine an automated system deciding, in a checkmate situation, whether to have a bigger collision or a smaller collision but risk being covered in human waste.

By the way, that output is pretty freaky. I just can't imagine the amount of data needed to get to that level of accuracy.


In the Netflix series "Upload" (about minds living an afterlife in shared computer simulations; spoilers!), the real world has fully autonomous vehicles. One feature is "passenger protect": whether you bias the vehicle to protect the passengers or the people outside the vehicle. Within the show it's considered immoral for people to protect themselves over pedestrians, but people choose that option anyway.

A central character dies in a car crash, and another character says something like 'you were murdered, you died in a car crash', the implication being cars are so safe the only way to die in one is to have someone murder you.

It touches on some interesting points (though overall it's a little vapid and soapy, imo).


Yes, I love Upload. It's Black Mirror that takes itself less seriously. The novelty ran out by now and they need to do something fresh, but it's still a good time.


Meanwhile in reality, a truck got hit twice by automated vehicles because it was hitched to a tow truck and travelling in a direction the "AI" considered unrealistic. Proposed name change #4532: Accelerated Stupidity.


Also in reality, about 80 Americans will die today from being hit by a human-driven car. Ditto for tomorrow, and the next day, and the day after that.


Who are you defending right now, and do you think your whataboutism here is supporting that cause?


Other sites beckon.


Oooh, I stand accused of wrongthink. Too bad that's permitted here.


Holy shit. I was thinking about this use case the other day: whether a deep learning model could draw on the internet to identify objects in film screen captures and determine what they were or whether they were unique.

For example, I was watching the news and wondered if a mostly submerged vehicle was either a modern Jeep or an older Land Rover. Certainly AI/ML is the right tool for this task.



