GPT-5 figures out that if it picks up the knife instead of the bag of chips, it can prevent the human with the stick from interfering with carrying out its instructions.
And ViperGPT will take said knife and make the muffin division fair when there are an odd number of muffins by slicing either a muffin or a boy in half.
Google actually recently went a few steps further and combined the PaLM LLM (bigger than GPT-3.5) with a 22-billion-parameter Vision Transformer to do exactly this.
Whilst maybe GPT-4 will change this, I think it is important to remember that these general chatbots are not how we have trained LLMs to write the best code. In fact, coding is one of the few areas where training specifically on source code and maybe some Stack Overflow (rather than on all natural language on the internet) led to better results in the previous iteration of LLMs (the GPT-3 wave). So the real test will be whether the GPT-4 wave of coding-specific LLMs, e.g. a GPT-4 Codex, can 'actually write code'. See:
Whilst not the same, I recommend you look at the DeepMind Gato paper to see how surprisingly (relatively) simple multi-modal models can be - https://openreview.net/forum?id=1ikK0kHjvj
Essentially, to merge lots of modalities they just go 'let's convert all modalities into integers in the same given range', e.g. the word 'me' = 1001, 'up' in Atari = 11002, the joint torque of a robot's right motor = 33000, and so on.
From the paper:
There are infinite possible ways to transform data into tokens, including directly using the raw underlying byte stream. Below we report the tokenization scheme we found to produce the best results for Gato at the current scale using contemporary hardware and model architectures.
• Text is encoded via SentencePiece (Kudo & Richardson, 2018) with 32000 subwords into the integer range [0, 32000).
• Images are first transformed into sequences of non-overlapping 16 × 16 patches in raster order, as done in ViT (Dosovitskiy et al., 2020). Each pixel in the image patches is then normalized between [−1, 1] and divided by the square-root of the patch size (i.e. √16 = 4).
• Discrete values, e.g. Atari button presses, are flattened into sequences of integers in row-major order. The tokenized result is a sequence of integers within the range of [0, 1024).
• Continuous values, e.g. proprioceptive inputs or joint torques, are first flattened into sequences of floating point values in row-major order. The values are mu-law encoded to the range [−1, 1] if not already there (see Figure 14 for details), then discretized to 1024 uniform bins. The discrete integers are then shifted to the range of [32000, 33024).
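For concreteness, here is a minimal numpy sketch of two of those steps: the image patching and the continuous-value tokenization. The mu-law parameters below are my own illustrative assumptions (the paper's Figure 14, not reproduced here, gives the exact scheme), so treat this as a sketch of the idea rather than Gato's actual implementation:

    import numpy as np

    def image_to_patches(img, patch=16):
        # (H, W, C) uint8 image -> non-overlapping 16x16 patches in raster
        # order, pixels normalized to [-1, 1] and divided by sqrt(16) = 4.
        h, w, c = img.shape
        x = (img.astype(np.float64) / 127.5 - 1.0) / np.sqrt(patch)
        x = x.reshape(h // patch, patch, w // patch, patch, c)
        return x.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)

    def tokenize_continuous(values, mu=100.0, m=256.0, bins=1024, offset=32000):
        # Flatten in row-major order, mu-law encode into [-1, 1],
        # discretize into 1024 uniform bins, then shift into [32000, 33024).
        x = np.asarray(values, dtype=np.float64).ravel()
        x = np.sign(x) * np.log(np.abs(x) * mu + 1.0) / np.log(m * mu + 1.0)
        x = np.clip(x, -1.0, 1.0)
        ids = np.minimum(((x + 1.0) / 2.0 * bins).astype(int), bins - 1)
        return ids + offset

    # e.g. a joint torque of 1.3 becomes one integer in [32000, 33024)
    print(tokenize_continuous([1.3]))

Note how mechanical the trick is: text tokens own [0, 32000) and discretized continuous values own [32000, 33024), so a single flat sequence of integers can interleave modalities without ambiguity.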
The interesting thing to me is that our brains probably do something similar, converting multi-modal sensory data into the same 'model' that we experience as our consciousness.
I'm sorry, but in stating that the goalposts haven't moved, you've literally just moved the goalposts.
'Everything a human can do' is not the same as 'anything any human can do, as well as the best humans at that thing (because those are the ones we pay)' - most humans cannot do any of the things you say you are waiting for an AI to do before calling it 'general'.
Therefore, the first part of your statement is the initial goalpost and the second part implies a very different one. The goalpost you propose would imply that most humans are not generally intelligent - which you could argue... but it would definitely be a new goalpost.
He's (probably) referencing Turing's 1950 paper [1]. The whole point of "The Imitation Game" is that the AI ought to be able to imitate any arbitrary type of person. Turing's example was the machine pretending to be a woman, and it's up to the investigator to determine which of the two they're speaking to is a real woman and which is the AI. The implication is that the machine ought to be able to be completely indistinguishable from any type of person, including those who might do well on this test or that.
Somehow this test got dumbed down over time, probably in an effort to try to pass it, into an investigator having to decide which of two sides is an AI, with no other information to go on. That's a comparatively trivial test to pass (for the "AI"), as it merely requires creating a passable chatbot. Imitation is an exceptional challenge because it implicitly requires the ability to imitate anybody, whether a professional athlete, a man who scored perfectly on the LSAT, or even someone as specific as "John Carmack."
You're adding the implication that it needs to beat the "best" to be passable. If you could pay an AGI 1% of the salary of the "best" to get average quality, is that not accomplishing the goal without moving the goalposts?
By their marketing alone, OpenAI has moved the goalposts more than anyone else. They've managed to lower the bar for AGI from "artificial general intelligence" to "regurgitates and recombines to form passable outputs given enough labelled training data".
I am genuinely intrigued by this point of view, so I would love to hear the reasoning of people who hold it.
Over the last few days I've seen hundreds of poems and stories, from ones about climate change in the style of a sonnet to peanut butter sandwiches getting stuck in toasters in the style of the Bible. I even asked it to make a text adventure game for me to play, where I could put in any instruction, leading to a unique series of events and narrative.
Is the claim that these were all simply copy-and-pastes of something on the internet, in their entirety? That would mean the internet already contains essentially every permutation of everything I could ask ChatGPT, which to me sounds highly implausible.
If the claim is that, whilst these are not direct copy-pastes, they are essentially a remix of lots of different things people have said before on the internet, repurposed to a different end - is that not literally just what language is? Humans use common sayings, idioms, slang and phrases all the time, never mind the 'tropes' and storylines that are reused constantly. Coders use common patterns and styles and copy from Stack Overflow. In fact language literally only works because we all share it and share the meaning of it.
If we are saying that all ChatGPT does is remix existing language and phrases to a new purpose... to me we are saying ChatGPT does the same thing as humans.
If you spend enough time with it you can quickly start to see the building blocks. More than just the similarities you see in several works by the same person.
If you’ve ever played a video game where the levels are programmatically generated, you’ll eventually notice the modular nature of everything. It becomes very noticeable and it’s different from noticing natural patterns that arise in a completely handcrafted level.
GPT is exactly like that.
You won’t notice it with short output. But ask it to keep going and write the next verse of that sonnet. Keep going and it gets VERY repetitive in a way that a human poet doesn’t.
Also keep in mind that the examples you’re seeing are a form of sample bias. You’re seeing output curated by humans.
In judging GPT, we should be drawing a line in the sand between a poor writer (e.g. a college student) and an excellent one (a professional, ideally a domain expert). GPT is impressing us now only because we have very low expectations for a computer to write like a human. But when judged against a capable human, especially when you focus on the _information_ that's conveyed, GPT sucks.
Few of us would want to read a book written as flaccidly as GPT writes: a wandering narrative that cannot connect the semantics of one sentence to the next. GPT is capable only of writing _sets_ of sentences, not a progression of them, where a concept is introduced and the subsequent text tests and contrasts a convincing set of complements and alternatives. Given LLM architectures (and their inherent black boxes), that's likely never to change, since they don't understand logic, dependency, sufficiency, or necessity. Without those skills, no LLM will ever be able to _develop_ an idea or expand on the implications proposed in an introduction. Have you ever seen GPT conclude a paragraph with a bang? And you never will.
You'll note too that GPT has not attempted to write dialog. I think there's good reason for that. Talk strips away all the excess verbiage. Humans say only what's essential to convey the message. Pointless blather that leads nowhere is maddening to listen to for long. But without a clear focus, that's exactly what I'd expect from GPT: wandering speech that's totally disengaging. Surprising the reader simply is not in its DNA.
So if you're happy reading narratives that wander, that don't lead anywhere, with characters lacking personality and a narrative style lacking the essentials of every engaging tale (exposition, conflict, suspense, climax, resolution)... then perhaps GPT is for you. But if that's GPT's great promise -- to generate business-speak that's "tolerable" but even less engaging than the marketing schlock that immerses us now -- then thanks, but no thanks.
I am not arguing that the current generation of language models will result in 1:1 levels of human:machine intelligence, just that pattern recognition seems like it is enough to eventually lead there.
> You'll note too that GPT has not attempted to write dialog.
Here is one of my prompts, is this not dialogue?
"The year is 1960. Choose 3 famous physicists. They are having a debate about whether black holes exist. Show the transcript."
Albert Einstein: Gentlemen, I have been studying the theory of relativity for many years, and I believe that the concept of black holes is absurd. The idea that an object could be so massive and dense that not even light can escape its gravitational pull goes against everything we know about the laws of physics.
Stephen Hawking: Mr. Einstein, I must respectfully disagree. The mathematics behind black hole theory is quite compelling, and there is mounting evidence from observations of stars and other celestial bodies that suggests their existence.
Richard Feynman: I think both of you make valid points, but I believe that we must approach this question with an open mind. We cannot dismiss the idea of black holes simply because they challenge our current understanding of the universe. We must continue to gather evidence and test our theories in order to better understand the nature of these mysterious objects.
Einstein: But surely, Mr. Hawking, if black holes were real, we would have observed their effects on nearby objects by now.
Hawking: Not necessarily, Mr. Einstein. Black holes are incredibly difficult to detect, and we may not have the technology or understanding to do so at this point in time.
Feynman: I agree with Mr. Hawking. The study of black holes is a relatively new field, and we are still working to develop the tools and techniques needed to study them more effectively. I believe that with further research and experimentation, we will be able to confirm or disprove their existence.
Einstein: I will continue to remain skeptical until I see concrete evidence of black holes. But I am open to being proven wrong if the evidence warrants it.
Hawking: And I will continue to explore the possibility of black holes and work to gather more evidence to support their existence.
Feynman: I think that is a fair and reasonable approach. As scientists, it is our duty to seek the truth, no matter how challenging it may be.
The key concept here for me is pattern recognition. It lines up with my experience when trying to practice creativity.
For example, for decades it has been my opinion that my own supposedly higher than average pattern matching skills were the main thing I brought to the table at work.
When I play drums, keys, or guitar, coming up with new patterns and styles to play feels magical when it happens in real-time, but if I break it down later, I almost always see the influences.
>In fact language literally only works because we all share it and share the meaning of it.
Most human communication is to some extent adversarial and constantly subverts the assumption of shared meaning on purpose or by accident. It's extremely difficult to "solve" because cooperative communication often appears to subvert the basic principles too.
Have you ever read about Grice's maxims of conversation? Once you think in those terms, the frequent absence of cooperation in most human conversation is obvious.
>Is the claim that these were all simply copy-and-pastes of something on the internet, in their entirety? That would mean the internet already contains essentially every permutation of everything I could ask ChatGPT, which to me sounds highly implausible.
It's the training data supplied to GPT-3 (as explained by OpenAI themselves), so yes, it is literally true. You are just seeing snippets of the internet, re-formed and regurgitated.
So I appreciate the gist of your point, but the way these models work is rather more complicated than copying and pasting snippets, so it certainly is not 'literally' true. The models are trained to predict sub-word-level tokens from the internet training dataset, so the degree of re-formation and regurgitation in a generated sentence can be vast, to the point of the final sentence being novel in its own right.
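To make this concrete, here is a toy sketch of the generation loop; `model` is a hypothetical stand-in for the real network, which at each step returns a probability distribution over the whole sub-word vocabulary:

    import numpy as np

    def generate(model, prompt_ids, max_new_tokens=50, seed=0):
        # `model` is a hypothetical stand-in: given the token ids so far,
        # it returns a probability distribution over the vocabulary.
        rng = np.random.default_rng(seed)
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            probs = model(ids)                        # P(next token | context)
            next_id = rng.choice(len(probs), p=probs)
            ids.append(int(next_id))                  # re-condition on the new token
        return ids

Because each step samples from a distribution conditioned on everything generated so far, the final sequence need not appear anywhere in the training data, even though every statistical regularity it exploits was learned from that data.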
100% of the training data of the salty jelly in a human skull is sensory input.
Were your argument flawless and your conclusion correct, then all human creativity would "literally" be a remix of things in the natural world, since even when we remix things made by other humans, that too would ultimately derive back to nature.
This can certainly be asserted, depending on how you wish to use those words. But just as it is not useful for predicting what our abilities are (a perfect intellect could predict quantum mechanics from scratch by watching a campfire die down on a rainy night, but we didn't do that in one step in the Neolithic), so too it isn't useful for telling us what the limits of GPT-family LLMs might be, as that which has been built on such inputs, both in the case of humans and of this particular AI, greatly exceeds the imagination of any single individual.
In some sense, every AI tool is "regurgitating its training set". In the case of NLP, that training set will be a corpus of documents, utterance-intent pairs, lists of Named Entities, etc.
Thanks! A less impressive but fun variant of this that I am using is an open-ended text adventure:
> I want you to act as a text based adventure game. I will type commands and you will reply with what the adventure game should show. I want you to only reply with the adventure game output inside one text block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. My first command is go left
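If you want to script that loop rather than use the web UI, something like this works with the openai Python package; the model name and the exact client interface here are assumptions based on the v1-style API, so adjust to whatever version you have:

    from openai import OpenAI  # assumes the v1-style openai-python client

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = [{"role": "system", "content": (
        "I want you to act as a text based adventure game. I will type "
        "commands and you will reply with what the adventure game should "
        "show, and nothing else.")}]

    while True:
        history.append({"role": "user", "content": input("> ")})
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed model name
            messages=history,
        ).choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        print(reply)

Keeping the full history in the messages list is what lets the game stay coherent across turns.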
1. Yes, the incidence of cancer in 25-49 year-olds increased 22% from 1993 to 2018 - but that is 22% on a very low number, which means it is still a very low number (for illustration: if the baseline were, say, 40 cases per 100,000, a 22% rise takes it to roughly 49 per 100,000, still under 0.05% a year). When you account for increased screening, greater awareness, and better testing, the true increase is likely even smaller.
2. Better treatment (and more effective screening) means mortality rates per 100k from all cancers in 25-49 year-olds have dropped c. 40% over the same period (despite the higher incidence).
https://llava-vl.github.io