Maybe I'm missing what "abstraction" means here, but it seems like the tasks were centered around grids and other spatial problems, which are a very limited subset of abstraction/reasoning.
In my experience GPT4/V is pretty bad at those specifically, not necessarily around abstraction in general. Positions, rotations, etc. are concepts that GPT4 finds very hard to apply, which is kinda unsurprising since it has no body, no world, no space; it "lives" in the realm of text. DALLE3 suffers a similar problem where it has trouble with concepts like "upside down" and consistently fails to apply them to generated images.
It's also worth remembering that blind humans who can recognize squares by feel do not have the ability to recognize squares by sight upon gaining vision.
I suspect the model is bad at these kinds of "reasoning" tasks in the same way that a newly-sighted person is bad at recognizing squares by sight.
> In my experience GPT4/V is pretty bad at those specifically, not necessarily around abstraction in general.
The problem with a statement like that is that it leaves the door open to accepting any kind of canned generality as "abstraction in general". Abstract reasoning is indeed a fuzzy/slippery concept, and spatial reasoning may not capture it well, but I'm pretty sure it captures it better than a general impression of ChatGPT does.
> ...since it has no body, no world, no space; it "lives" in the realm of text.
There's a bizarre anthropomorphism in this thread: both the reflexive comparison of this software system to a blind human and the implicit call to be considerate of this thing's supposed disability.
Why is it bizarre to consider the limitations inherent in the input data on which the model is trained? Fundamentally, it still "sees" the world through text, and the extent to which it can "understand" spatial relationships is defined by that. It seems utterly unsurprising that this leads to a very poor grasp of the actual concepts behind what things like "above" or "left" are - the text that humans produce when talking about such things kinda relies on the reader having their own experience (if not vision, then at least body awareness) that can be mapped to those concepts. You can explain "left" and "right" to a human by telling them which of their hands is which, but I can't help wondering how small the actual information payload of that explanation is once you account for the bodily spatial awareness it brings into context by association.
> Why is it bizarre to consider the limitations inherent in the input data on which the model is trained?
Sure, the thing is limited; the study is a demonstration of this (and general-purpose abilities have been claimed for LLMs at various points).
I was pushing back against the "it's like a blind person" anthropomorphizing argument [edit: especially the assumption that these things learn through experience and reflection, which the parent also makes]. Maybe if the thing "had eyes" it could learn spatial information, and maybe it couldn't (though it would take a lot of work to make that metaphor meaningful). The thing certainly doesn't learn text the way a human learns speech, since humans don't digest the entire Internet before they can speak.
Apparently it doesn't improve abstract reasoning capability, because according to the article the multimodal GPT4 did just as dismally as the text-only GPT4. This was surprising to me, as I would have expected an improvement from a model whose input does include spatial relationships.
Technically true, but when those tokens are 1:1 mapped to text, I think we can simplify this down without losing anything important.
Of course, once you start using tokens for other things - as multimodal LMs already do - that changes. But this current crop of models still has visual modality in its infancy IMO, and gauging the overall performance of the model as a whole based on that is very questionable.
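To make the "1:1 mapped to text" point concrete, here's a minimal sketch using OpenAI's open-source tiktoken package with the cl100k_base encoding (the one published for GPT-4's text models); the example string is arbitrary. Round-tripping ordinary text through the tokenizer loses nothing:

    # Text-only tokens round-trip losslessly: encode to integer IDs, decode back.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    text = "Rotate the square 90 degrees clockwise."
    tokens = enc.encode(text)      # a list of integer token IDs
    restored = enc.decode(tokens)  # back to the original string

    assert restored == text        # nothing lost: these tokens are just text
    print(len(tokens), tokens)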
> Technically true, but when those tokens are 1:1 mapped to text
I don't know what GPT-4V does particularly, but my understanding is that multimodal models very often have an expanded token space with special tokens related to image handling, so, literally, there is not a 1:1 relationship of tokens to text.
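For what it's worth, the pattern in open multimodal models (LLaVA-style architectures, for example) looks roughly like the sketch below. This only illustrates the "expanded token space" idea, not GPT-4V's actual internals, and the "<image>" placeholder name is made up:

    # Sketch of the expanded-token-space pattern, using the GPT-2 tokenizer as a
    # stand-in. The "<image>" token never results from encoding ordinary text;
    # in a multimodal model its slot is filled with projected image-patch
    # embeddings instead of a text embedding.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    print(len(tok))  # base text vocabulary size (50257)

    tok.add_special_tokens({"additional_special_tokens": ["<image>"]})
    print(len(tok))  # vocabulary grew by one reserved, non-textual token

    ids = tok.encode("What is above the <image> ?")
    print(ids)       # includes the new special token's ID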
A string of tokens is text. Tokens are just another alphabet, like Japanese writing having many representations for the same sounds, where a single character can sometimes be an entire word.
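To put the "another alphabet" analogy in concrete terms, here's a small tiktoken sketch (again assuming the cl100k_base encoding): the same word gets a different token "spelling" depending on capitalization or a leading space, and common words often compress to a single token, yet every token sequence decodes straight back to plain text:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # Different surface forms of the same word get different token IDs...
    print(enc.encode("hello"))
    print(enc.encode(" hello"))
    print(enc.encode("Hello"))

    # ...but each sequence decodes straight back to the text it came from.
    print(repr(enc.decode(enc.encode(" hello"))))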
By the very fact that there's a paper here, whatever its merit, the authors of the paper have codified their concept of generality, and this doesn't validate the point I was replying to, which was essentially "my impression/feeling is that it is better".
Point is that it's good at abstract reasoning that isn't spatially grounded like in that paper. So it's not really leaving any door open. It's not a cop out. That's just how it is.
> DALLE3 suffers a similar problem where it has trouble with concepts like "upside down" and consistently fails to apply them to generated images.
This has nothing to do with having "no body, no world" and everything to do with the fact that training pictures where things are upside down are simply vastly rarer than pictures where they aren't.
What would directions be for an intelligent creature that lives in zero gravity? I just like thinking about this for the same reasons humans like writing speculative science fiction. Trying to guess what alien perspectives look like might also give us insights when we're the ones making the alien.
However, North, South, East, and West are relative to the poles of the Earth. Something living in zero gravity would have to use some object as an anchor to determine the direction.
You’re also oriented based on objects. We don’t have an abstract compass pointing north 24/7 the way we can use our bodies to determine left and right or gravity to point down.
The solar system has a north pole and a south pole based on the rotation of the Sun. Basically the only places in which there isn't something to orient against are in the depths of inter-galactic-cluster voids with nothing around. And if a being is stuck in one of those voids, orientation is way down the list of problems they have.
FWIW there is some interesting variability among human cultures on that, as well. There are a few that actually use cardinal directions predominantly or exclusively instead of body-relative ones like "left" and "right".
No, but they would have front and back, and people from the bridge would share which way was “up” and “down” and “left” and “right” based on the controls.