Note that for the purposes of this paper a “problem” just means a formally decidable problem or a formal language, and the proof is that by creatively arranging transformers you can make individual transformer runs behave like individual Boolean circuits. However, this is a long way from any practical application of transformers: for one thing, most problems we care about are not stated as formal languages, and we already have a vastly more efficient way to implement Boolean circuits.
If a "problem we care about" is not stated as a formal language, does it mean it does not exist in the hierarchy of formal languages? Or is it just as yet unclassified?
It means that there are two problems: one, formalizing the problem as stated while capturing all relevant details, and two, solving the resulting formal problem. Until you solve problem one, you can't use formal methods to say anything about the problem (a priori, it's not even clear that the problem is solvable).
Unfortunately, the task of formalizing an informal problem is itself an informal problem that we don't know how to formalize, so we can't say much about it. So overall, we can't say much about how hard the general problem "given a problem statement from a human, solve that problem" is, whether any particular system (including a human!) can solve it, or how long that might take and with what resources.
No, but it's pretty obvious, isn't it? If you have an informal problem statement, say "I want this button to be bigger", formalizing it can't be a formal process.
> "I want this button to be bigger", formalizing it can't be a formal process.
    while (!is_button_big_enough()) {
        button.scaleUp(1.1);
    }
This is one trivial way to do it, and seems like it would be formalizable. is_button_big_enough is simply an input to whatever process is responsible for judging such a thing, whether that be a design specification or perhaps input from a person.
You've translated my informal problem statement into a quasi-formal process, using your inherent natural language processing skills, and your knowledge of general human concepts like size. But you haven't explained the formal process you followed to go from my problem statement to this pseudocode.
And your pseudocode template only works for one particular kind of informal problem statement. If I instead have the problem "how much money do I need to buy this house and this chair?", or "does this bite fit in my mouth?", your general form will not work.
And what's more, you haven't actually produced a formally solvable problem definition, that we could analyze for complexity and computability, because you rely on two completely unspecified functions. Where is the formal definition of a button? Is it a physical push button or a UI control or a clothing button? What does it mean that it is bigger or smaller? When do we know it's big enough, and is that computable? And how do we scale it up? Do we increase its volume? Its surface area? One of its sides? Or maybe the radius? And how do we go about doing that? All of these, and many more, need to be explicitly defined in order to apply any kind of formal analysis to this problem. And there is no formal way to do so in a way that matches the intent of whoever posed the problem.
> And what's more, you haven't actually produced a formally solvable problem definition, that we could analyze for complexity and computability, because you rely on two completely unspecified functions. Where is the formal definition of a button?
Well, your statement was underspecified. You said "I want this button bigger". There are procedures to translate informal statements to formal ones, but one basic step is that underspecified referents are delegated to abstractions that encapsulate those details. So "this button" designates some kind of model of a button, and "I" refers to a subject outside the system, thereby implying some kind of interactive process to query the subject whether the model is satisfactory, e.g. a dialog prompt asking "Is this button big enough now?"
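To make that concrete, here is a minimal sketch of the kind of abstraction I mean. Button, its dimensions, and the 1.1 scale factor are hypothetical choices on my part, not something derivable from the original statement; the point is just that the vague "I" becomes an interactive query back to the person who posed the problem.

    class Button:
        def __init__(self, width, height):
            self.width, self.height = width, height

        def scale_up(self, factor):
            self.width *= factor
            self.height *= factor

    def is_button_big_enough(button):
        # Delegate the judgment to the subject outside the system.
        answer = input(f"Is a {button.width:.0f}x{button.height:.0f} button big enough now? [y/n] ")
        return answer.strip().lower().startswith("y")

    button = Button(width=80, height=24)
    while not is_button_big_enough(button):
        button.scale_up(1.1)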
You call these skills "inherent", but humans are not magical. We employ bug-riddled, poorly specified procedures for doing this kind of interpretive work, and LLMs have already started to do this too, and they'll only get better. Is asking a deterministic LLM to generate a formal specification or program to achieve some result a formal process? I don't think these lines are as clear as many think, not anymore.
I think we're mostly agreed actually. I'm not trying to claim that this is an unsolvable problem, just that it's a difficult problem that we don't have a solution for yet. And yes, LLMs are probably our best tool so far. And asking for clarifying questions is clearly a part of the process.
I will say that there is also a possibility the general form of the formal problem is in fact uncomputable. It seems possible to me it might be related to the halting problem. But, until we have a formal specification of it, we won't know, of course.
There are procedures for translating informal statements to formal ones. If I submit such informal statements to an LLM and ask it to generate a spec or program to achieve some result, that can be made repeatable. There are various arrangements to make this more robust, like having another LLM generate test cases to check the work of the other. Does this qualify?
It's... "knee-jerk obvious". But is it actually true? People seem to be interested in the concept in formal logic arguments, for example https://www.researchgate.net/publication/346658578_How_to_Fo... (which uses a formal process for part of the formalization), so maybe it's not as simple as it seems initially. I mean, if we're already talking about formal problems, it could use a stronger proof ;)
At best, this is a formal process for manipulating certain kinds of statements. But the general problem, "take a human's statement of a problem and translate it into a formal statement of a problem that, if solved, will address what the human was asking for" is far harder and more nebulous. Ultimately, it's exactly the problem that LLMs have been invented for, so it has been studied in that sense (and there is a broad literature in AI for NLP, algorithm finding, expert systems, etc). But no one would claim that they are even close to having a formal specification of this problem that they could analyze the complexity of.
I'm not saying that the statement "I want this button to be bigger" can't be formalized. I'm saying that there is no formal process you can follow to get from this problem to a formal problem that is equivalent. There isn't even a formal process you can use to check if a formal definition is equivalent to this problem.
Consider that if someone asked you to solve this problem for them with just this statement, either of the following could be a sketch of a more formal statement of what they actually want:
1. In a given web page, the css class used for a particular <button> element should be changed to make the button's height larger by 10%, without changing any other <button> element on the page, or any other dimension.
2. For a particular garment that you are given, the topmost button must be replaced with a different button that appears to have the same color and finish to a human eye, and that has the same 3D shape up to human observational precision, but that has a radius large enough to not slip through the opposing hole under certain forces that are commonly encountered, but not so large that it doesn't fit in the hole when pushed with certain forces that are comfortable for humans.
I think you would agree that (a) someone who intended you to solve either of these problems might reasonably describe them with the statement I suggested, and (b), that it would be very hard to devise a formal mathematical process to go from that statement to exactly one of these statements.
Ah, gotcha. I agree it would be difficult. I’m still not convinced it would be impossible though.
LLMs could even formalise what you want in the context, even now.
Or do you mean that you can’t formalise every statement when given incomplete information about the context of the statement, since then we have a single word pointing to multiple different contexts?
Ah, you are informally inquiring about a formal description concerning the informal nature of formalization of informal questions.
Joke aside, this is about the nature of the formalization process itself. If the process of formalizing informal problems were fully formalized, it would be possible to algorithmically compute the solution and even optimize it mathematically. However, since this is obviously impossible (e.g. vague human language), it suggests that the formalization process can't be fully formalized.
My 2 cents: Since LLMs (Large Language Models) operate as at least a subset of Turing machines (which recognize recursively enumerable languages), the chain-of-thought (CoT) approach could be equivalent to, or even more expressive than, that subset. In fact, CoT could well amount to a full Turing machine.
If we leave CoT aside for a moment, it's worth exploring the work discussed in the paper "Neural Networks and the Chomsky Hierarchy"[1], which analyzes how neural networks (including LLMs) map onto different levels of the Chomsky hierarchy, with a particular focus on their ability to recognize formal languages across varying complexity.
How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.
It's 'remarkable' because (a) academic careers are as much about hype as science, (b) arxiv doesn't have peer review process to quash this, (c) people take arxiv seriously.
>How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states
Only with infinite precision, which is highly unrealistic. Under realistic assumptions, fixed-depth transformers without chain-of-thought are very limited in what they can express: https://arxiv.org/abs/2207.00729 . Chain of thought increases the class of problems which fixed-depth transformers can solve: https://arxiv.org/abs/2310.07923
I'm waiting for the people of AI to discover syllogism and inference in its original PROLOG sense, which this CoT abomination basically tries to achieve. Interestingly, if all logical content were translated to rules, and then only the rules were fed into the LLM training set, what would the result be? And can the probabilistic magic be made into actually following reason, without all the dice?
Right, we’ve now gotten to the stage of this AI cycle where we start using the new tool to solve problems old tools could solve. Saying a transformer can solve any formally decidable problem if given enough tape isn’t saying much. It’s a cool proof, I don’t mean to deny that, but it doesn’t mean much practically, as we already have more efficient tools that can do the same.
What I don't get is... didn't people prove that in the 90s for any multi-layer neural network? Didn't people prove transformers are equivalent in the transformers paper?
Yes they did. A two layer network with enough units in the hidden layer can form any mapping to any desired accuracy.
And a two layer network with single-delay feedback from the hidden units to themselves can capture any dynamic behavior (to any desired accuracy).
Adding layers and more structured architectures creates the opportunity for more efficient training and inference, but doesn't enable any new potential behavior. (Except in the sense that reducing resource requirements can allow impractical problems to become practical.)
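For anyone who wants to see the first claim in action, here is a toy illustration of my own (just a quick numpy sketch, not from the 90s papers): a single hidden tanh layer with randomly fixed input weights, where only the output weights are fit by least squares, already approximates sin(x) closely on an interval, and the fit improves as you add hidden units.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 500)[:, None]
    y = np.sin(x)

    n_hidden = 200
    W = rng.normal(scale=2.0, size=(1, n_hidden))   # fixed random input-to-hidden weights
    b = rng.normal(scale=2.0, size=n_hidden)        # fixed random hidden biases
    H = np.tanh(x @ W + b)                          # hidden-layer activations

    # Fit only the hidden-to-output weights, by least squares (no backprop
    # needed for this illustration).
    V, *_ = np.linalg.lstsq(H, y, rcond=None)
    print("max abs error:", np.max(np.abs(H @ V - y)))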
Putting down a 50-buck bet that some very smart kid in the near future will come up with an entropy-meets-graphical-structures theorem which gives an estimate of how the loss of information is affected by the size and type of the underlying structure holding this information.
It took a while for people to actually start talking about LZW as a grammar algorithm, not a "dictionary"-based algorithm, which is then reasoned about in a more general sense again by https://en.wikipedia.org/wiki/Sequitur_algorithm.
This is not to say that LLMs are not cool; we put them to use every day. But the reasoning part is never going to be trustworthy without a 100% discrete system, which can infer the syllogistic chain with zero doubt and 100% traceable origin.
I was thinking about the GraphRAG paper and Prolog. I’d like to extract predicates. The source material will be inconsistent and contradictory and incomplete.
Using the clustering (community) model, an LLM can summarize the opinions as a set of predicates which don’t have to agree, plus some general weight of how much people agree or disagree with them.
The predicates won’t be suitable for symbolic logic because the language will be loose. However an embedding model may be able to connect different symbols together.
Then you could attempt multiple runs through the database of predicates because there will be different opinions.
Then one could attempt to reason using these loosely stitched predicates. I don’t know how good the outcome would be.
I imagine this would be better in an interactive decision making tool where a human is evaluating the suggestions for the next step.
This could be better for planning than problem solving.
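A rough sketch of what I have in mind, under the assumption that an LLM call (the llm_extract_predicates helper below is hypothetical) produces loose predicate strings per community, and that an off-the-shelf embedding model, e.g. sentence-transformers, is used to stitch near-duplicate predicates together:

    from sentence_transformers import SentenceTransformer

    def llm_extract_predicates(community_docs):
        """Hypothetical helper: ask an LLM to distill a community's documents
        into loosely worded predicate strings plus an agreement weight."""
        raise NotImplementedError

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def link_predicates(predicates, threshold=0.8):
        # Connect predicates whose embeddings are close enough, so that
        # "X lowers cost" and "X reduces expenses" can be treated as one symbol.
        vecs = model.encode(predicates, normalize_embeddings=True)
        sims = vecs @ vecs.T
        return [(predicates[i], predicates[j])
                for i in range(len(predicates))
                for j in range(i + 1, len(predicates))
                if sims[i][j] >= threshold]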
Hm... a RAG over a DB of logical rules actually may be interesting. But loosely stitched predicates you can easily put to work with some random dice when you decide on the inference.
So how about we start thinking of AI as a combination of the graphical probabilistic whatever, which compresses the information from the training set in a very lossy manner, and which is then hooked up, internally or externally, to a discrete logical core whenever CoT is needed. This construct can then benefit from both worlds.
I’m surprised that understanding how a thought unfolds is being considered not relevant to the answer. I have done a lot of problem solving in groups and alone. How thoughts develop seems fundamental to understanding the solutions.
The story regarding the banning of terms that can be used with a reasoning system is a big red flag to me.
This sort of knee jerk reaction displays immature management and an immature technology product.
A little late to reply, but perhaps you'll see this. Does it not strike you that lots of these articles on AI that get published are very childish? Not in the math sense, but in the reasoning sense. Besides, most of them are anything but interdisciplinary. I've almost never encountered prompt engineers who actually tried to delve into what GPTs do, and then these CoT guys don't know a thing about predicate logic, yet try to invent it anew.
On your comment regarding banning tokens/terms, we are on the same page. We can agree all of this is very immature, and many of the people too, including this lot of Chinese kids who seem to put out one paper per hour. You see, the original seq2seq paper is 8 pages, topic included. Can you imagine? But Sutskever was not a child back then; he was already deep into all of this. We can easily state/assume the LLM business is in its infancy. It may easily stay there for a century until everyone levels up.
But didn't we already know that NNs can solve any computable problem? The interesting thing is whether they can be trained to solve any (computable) problem.
Does that mean that when we reduce the precision of an NN, for example using bfloat16 instead of float32, we reduce the set of computational problems that can be solved?
How would that compare with a biological neural network with presumably near-infinite precision?
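Just to make the precision question concrete, here's a tiny PyTorch check (my own illustration, not from the paper): bfloat16 keeps float32's exponent range but only about 3 decimal digits of mantissa, so nearby values collapse to the same number.

    import torch

    x = torch.tensor([1.0001, 1.0002, 1.0003])
    print(x.to(torch.bfloat16).to(torch.float32))  # all three round to 1.0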
On the first day of our introduction to NNs we were asked to create all the logic gates using artificial neurons, and were then told "If you have all gates, you can do all computations".
I've got to admit, I'm sorta sticking to that at face value, because I don't know enough computer science to a) discern if that is true and b) know what "f: X -> Y only for closed domains" means.
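For what it's worth, the classroom exercise is small enough to show inline. This is my own toy version, not anyone's official implementation: a single threshold neuron with hand-picked weights computes NAND, and since NAND is universal, every other gate (and hence any Boolean circuit) is a composition of it.

    def neuron(inputs, weights, bias):
        # Classic threshold unit: fire iff the weighted sum plus bias is positive.
        return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

    def nand(a, b):
        return neuron([a, b], weights=[-2, -2], bias=3)

    def and_gate(a, b):
        # NAND is universal, e.g. AND(a, b) = NAND(NAND(a, b), NAND(a, b)).
        n = nand(a, b)
        return nand(n, n)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NAND:", nand(a, b), "AND:", and_gate(a, b))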
Only NNs of infinite size or precision. Under more realistic assumptions, transformers without chain of thought are actually limited in what they can solve: https://arxiv.org/abs/2207.00729
"What is the performance limit when scaling LLM inference? Sky's the limit.
We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient."
The universal approximation theorem is good to know because it says there's no theoretical upper bound to a function-approximating NN's accuracy. In practice it says nothing about what can be realistically achieved, though.
A key difference is that the way LMMs (Large Multimodal Models) generate output is far from random. These models can imitate or blend existing information, or imitate (and probably blend) known reasoning methods in the training data. The latter is a key distinguishing feature of the new OpenAI o1 models.
Thus, the signal-to-noise ratio of their output is generally way better than infinite monkeys.
Arguably, humans rely on similar modes of "thinking" most of the time as well.
Yeah. Monkeys. Monkeys that write useful C and Python code that needs a bit less revision every time there's a model update.
Can we just give the "stochastic parrot" and "monkeys with typewriters" schtick a rest? It made for novel commentary three or four years ago, but at this point, these posts themselves read like the work of parrots. They are no longer interesting, insightful, or (for that matter) true.
If you think about it, humans necessarily use abstractions, from the edge detectors in retina to concepts like democracy. But do we really understand? All abstractions leak, and nobody knows the whole stack. For all the poorly grasped abstractions we are using, we are also just parroting. How many times are we doing things because "that is how they are done" never wondering why?
Take ML itself, people are saying it's little more than alchemy (stir the pile). Are we just parroting approaches that have worked in practice without real understanding? Is it possible to have centralized understanding, even in principle, or is all understanding distributed among us? My conclusion is that we have a patchwork of partial understanding, stitched together functionally by abstractions. When I go to the doctor, I don't study medicine first, I trust the doctor. Trust takes the place of genuine understanding.
So humans, like AI, use distributed and functional understanding, we don't have genuine understanding as meant by philosophers like Searle in the Chinese Room. No single neuron in the brain understands anything, but together they do. Similarly, no single human understands genuinely, but society together manages to function. There is no homunculus, no centralized understander anywhere. We humans are also stochastic parrots of abstractions we don't really grok to the full extent.
Great points. We're pattern-matching shortcut machines, without a doubt. In most contexts, not even good ones.
> When I go to the doctor, I don't study medicine first, I trust the doctor. Trust takes the place of genuine understanding.
The ultimate abstraction! Trust is highly irrational by definition. But we do it all day every day, lest we be classified as psychologically unfit for society. Which is to say, mental health is predicated on a not-insignificant amount of rationalizations and self-deceptions. Hallucinations, even.
That's just it. We're not unique. We've always been animals running on instinct in reaction to our environment. Our instincts are more complex than other animals but they are not special and they are replicable.
The infinite monkey post was in response to this claim, which, like the universal approximation theorem, is useless in practice:
"We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient."
Like an LLM, you omit the context and browbeat people with the "truth" you want to propagate. Together with the many politically forbidden terms since 2020, let us now also ban "stochastic parrot" in order to have a goodbellyfeel newspeak.
There is also the problem of "stochastic parrot" being constantly used in a pejorative sense, as opposed to a neutral term that keeps us grounded and skeptical.
Of course, it is an overly broad stroke that doesn't quite capture all the nuance of the model but the alternative of "come on guys, just admit the model is thinking" is much worse and has much less to do with reality.
ChatGPT was released in November 2022. That's one year and 10 months ago. Their marketing started in the summer of the same year, still far off from 3-4 years.
But ChatGPT wasn't the first; OpenAI had a coding playground with GPT-2, and you could already code with it even before that, around 2020, so I'd say it has been 3-4 years
GPT-3 paper announcement got 200 comments on HN back in 2020.
It doesn't matter when marketing started, people were already discussing it in 2019-2020.
Stochastic parrot: The term was coined by Emily M. Bender[2][3] in the 2021 artificial intelligence research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? " by Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell.[4]
> Tired arguments from pro-IP / copyright sympathizers
You forgot "Tired ClosedAI joke from anti-IP / copyleft sympathizers".
Remember that the training data debate is orthogonal to the broader debate over copyright ownership and scope. The first people to start complaining about stolen training data were the Free Software people, who wanted a legal hook to compel OpenAI and GitHub to publish model weights sourced from GPL code. Freelance artists took that complaint and ran with it. And while this is technically an argument that rests on copyright for legitimacy, the people who actually own most of the copyrights - publishers - are strangely interested in these machines that steal vast amounts of their work.
These papers become increasingly difficult to properly comprehend.
Feed it to ChatGPT and ask for an explanation suited to your current level of understanding (5-year-old, high-school, undergrad, comp-sci grad student, and so on.)
No, really, I've tried it and it's okay for a crow's flight over these papers, but I'd never put my trust in random() to fetch me precisely what I'm looking for.
My daily usage of ChatGPT, Claude, etc. for nearly 2 years now shows one and the same thing: unless I provide enough of the right context for it to get the job done, the job is never done right. Ever. Accidentally maybe, but never ever. And this becomes particularly evident with larger documents.
The pure RAG-based approach is a no-go; you cannot be sure important stuff is not omitted. The "feed the document into the context" approach still, by definition, will not work correctly, thanks to all the bias accumulated in the LLM's layers.
So it is a way to approach papers if you really know what they contain and know the surrounding terminology. But it is really a no-go if you are reading about... complex analysis and know nothing about algebra of the 5th degree. Sorry, this is not gonna work, and will probably take longer in total time/energy on the reader's part.
>* Claiming it's predictive text engine that isn't useful for anything
This one is very common on HN and it's baffling. Even if it is predictive text, who the hell cares, if it achieves its goals? If an LLM is actually a bunch of dolphins typing on a keyboard made for dolphins, I couldn't care less, as long as it does what I need it to do. For people who continue to repeat this on HN: why? I just want to know, out of curiosity.
>* AI will never be able to [some pretty achievable task]
Also very common on HN.
You forgot the "AI will never be able to do what a human can do in the exact way a human does it so AI will never achieve x".
> Even if it's predictive text, who the hell cares if it achieves its goals?
Haha ... well in the literal sense it does achieve "its" goals, since it only had one goal which was to minimize its training loss. Mission accomplished!
OTOH, if you mean achieving the user's goals, then it rather depends on what those goals are. If the goal is to save you typing when coding, even if you need to check it all yourself anyway, then I guess mission accomplished there too!
Partly because "artificial intelligence" is a loaded phrase which brings implications of AGI along for the ride, partly because "intelligence" is not a well defined term, so an artificial version of it could be argued to be almost anything, and partly because even if you lean on the colloquial understanding of what "intelligence" is, ChatGPT (and its friends) still isn't it. It's a Chinese Room - or a stochastic parrot.
It's not incredulity, just pointing out the obvious. Searle placed very specific limitations on the operator of the Room. He rests his whole argument on the premise that the operator is illiterate in Chinese, or at least has no access to the semantics of the material stored in the Room. That's plainly not the case with ChatGPT, or it couldn't review its previous answers to find and fix its mistakes.
And you certainly would not get a different response, much less a better one, from the operator of a Chinese Room simply by adding "Think carefully step by step" to the request you hand him.
It's just a vacuous argument from square one, and it annoys me to an entirely-unreasonable extent every time someone brings it up. Add it to my "Stochastic Parrot" and "Infinite Monkeys" trigger phrases, I guess.
> ... He rests his whole argument on the premise that the operator is illiterate in Chinese, or at least has no access to the semantics of the material stored in the Room.
> That's plainly not the case with ChatGPT, or it couldn't review its previous answers to find and fix its mistakes.
Which is another way of saying, ChatGPT couldn't produce semantically correct output without understanding the input. Disagreeing with which is the whole point of the Chinese Room argument.
Why cannot the semantic understanding be implicitly encoded in the model? That is, why cannot the program I (as the Chinese Room automaton) am following be of sufficient complexity that my output appears to be that of an intelligent being with semantic understanding and the ability to review my answers? That, in my understanding, is where the genius of ChatGPT lies - it's a masterpiece of preprocessing and information encoding. I don't think it needs to be anything else to achieve the results it achieves.
A different example of this is the work of Yusuke Endoh, whom you may know for his famous quines. https://esoteric.codes/blog/the-128-language-quine-relay is to me one of the most astonishing feats of software engineering I've ever seen, and little short of magic - but at its heart it's 'just' very clever encoding. Each program in the chain understands nothing and yet encodes every subsequent program, including itself. Another example is DNA; how on Earth does a dumb molecule create a body plan? I'm sure there are lots of examples of systems that exhibit such apparently intelligent and subtly discriminative behaviour entirely automatically. Ant colonies!
> And you certainly would not get a different response, much less a better one, from the operator of a Chinese Room simply by adding "Think carefully step by step" to the request you hand him.
Again, why not? It has access to everything that has gone before; the next token is f(all the previous ones). As for asking it to "think carefully", would you feel differently if the magic phrase was "octopus lemon wheat door handle"? Because it doesn't matter what the words mean to a human - it's just responding to the symbols it's been fed; the fact that you type something meaningful to you just obscures that fact and lends subconscious credence to the idea that it understands you.
> It's just a vacuous argument from square one,
and it annoys me to an entirely-unreasonable extent every time someone brings it up. Add it to my "Stochastic Parrot" and "Infinite Monkeys" trigger phrases, I guess.
With no intent to annoy, I hope you at least understand where I'm coming from, and why I think those labels are not just apt, but useful ways to dispel the magical thinking that some (not you specifically) exhibit when discussing these things. We're engineers and scientists and although it's fine to dream, I think it's also fine to continue trying to shoot down the balloons that we send up, so we're not blinded by the miracle of flight.
> Why cannot the semantic understanding be implicitly encoded in the model?
That just turns the question into "OK, so what distinguishes the model from a machine capable of genuine understanding and reasoning, then?"
At some point you (and Searle) must explain what the difference is in engineering terms, not through analogy or by appeals to ensoulment or by redecorating the Chinese Room with furnishings it wasn't originally equipped with. Having moved the goalpost back to the far corner of the parking garage already, what's your next move?
It's easy to dismiss a "stochastic parrot" by saying that "The next token is a function of all of the previous ones," but welcome to our deterministic universe, I guess... deterministic, that is, apart from the randomness imparted by SGD or thermal noise or what-have-you. Again, how is this different from what human brains do? Von Neumann himself naturally assumed that stored-program machines would be modeled on networks of neuron-like structures (a factoid I just ran across while reading about McCulloch and Pitts), so it's not that surprising that we're finally catching up to his way of looking at it.
At the end of the day we're all just bags of meat trying to minimize our own loss functions. There's nothing special about what we're doing. The magical thinking you're referring to is being done by those who claim "AI isn't doing X" or "AI will never do X" without bothering to define X clearly.
> I don't think it needs to be anything else to achieve the results it achieves.
Exactly, and that's earth-shaking because of the potential it has to illuminate the connection between brains and minds. It's sad that the discussion inevitably devolves into analogies to monkeys and parrots.
> That just turns the question into "OK, so what distinguishes the model from a machine capable of genuine understanding and reasoning, then?"
And that's a great question which is not far away from asking for definitions of intelligence and consciousness, which of course I don't have, however I could venture some suggestions about what we have that LLMs don't, in no particular order:
- Self-direction: we are goal-oriented creatures that will think and act without any specific outside stimulus
- Intentionality: related to the above - we can set specific goals and then orient our efforts to achieve them, sometimes across decades
- Introspection: without guidance, we can choose to reconsider our thoughts and actions, and update our own 'models' by deliberately learning new facts and skills - we can recognise or be given to understand when we're wrong about something, and can take steps to fix that (or choose to double down on it)
- Long term episodic memory: we can recall specific facts and events with varying levels of precision, and correlate those memories with our current experiences to inform our actions
- Physicality: we are not just brains in skulls, but flooded with all manner of chemicals that we synthesise to drive our biological functions, and which affect our decision making processes; we are also embedded in the real physical world and receiving huge amounts of sensory data almost constantly
> At some point you (and Searle) must explain what the difference is in engineering terms, not through analogy or by appeals to ensoulment or by redecorating the Chinese Room with furnishings it wasn't originally equipped with. Having moved the goalpost back to the far corner of the parking garage already, what's your next move?
While I think that's a fair comment, I have to push back a bit and say that if I could give you a satisfying answer to that, then I may well be defining intelligence or consciousness and as far as I know there are no accepted definitions for those things. One theory I like is Douglas Hofstadter's strange loop - the idea of a mind thinking about thinking about thinking about itself, thus making introspection a primary pillar of 'higher mental functions'. I don't see any evidence of LLMs doing that, nor any need to invoke it.
> It's easy to dismiss a "stochastic parrot" by saying that "The next token is a function of all of the previous ones," but welcome to our deterministic universe, I guess... deterministic, that is, apart from the randomness imparted by SGD or thermal noise or what-have-you. Again, how is this different from what human brains do?
...and now we're onto the existence or not of free will... Perhaps it's the difference between automatic actions and conscious choices? My feeling is that LLMs deliberately or accidentally model a key component of our minds, the faculty of pattern matching and recall, and I can well imagine that in some future time we will integrate an LLM into a wider framework that includes other abilities that I listed above, such as long term memory, and then we may yet see AGI. Side note that I'm very happy to accept the idea that each of us encodes our own parrot.
> Von Neumann himself naturally assumed that stored-program machines would be modeled on networks of neuron-like structures (a factoid I just ran across while reading about McCullough and Pitts), so it's not that surprising that we're finally catching up to his way of looking at it.
Well OK but very smart people in the past thought all kinds of things that didn't pan out, so I'm not really sure that helps us much.
> At the end of the day we're all just bags of meat trying to minimize our own loss functions. There's nothing special about what we're doing. The magical thinking you're referring to is being done by those who claim "AI isn't doing X" or "AI will never do X" without bothering to define X clearly.
I don't see how that's magical thinking, it's more like... hard-nosed determinism? I'm interested in the bare minimum necessary to explain the phenomena on display, and expressing those phenomena in straightforward terms to keep the discussion grounded. "AI isn't doing X" is a response to those saying that AI is doing X, so it's as much on those people to define what X is; in any case I rather prefer "AI is only doing Y", where Y is a more boring and easily definable thing that nonetheless explains what we're seeing.
> Exactly, and that's earth-shaking because of the potential it has to illuminate the connection between brains and minds.
Ah! Now there we agree entirely. Actually I think a far more consequential question than "what do LLMs have that makes them so good?" is "what don't we have that we thought we did?".... but perhaps that's because I'm an introspecting meat bag and therefore selfishly fascinated by how and why meat bags introspect.
One question, if anyone knows the details: does this prove that there exists a single LLM that can approximate any function to arbitrary precision given enough CoT, or does it prove that for every function, there exists a Transformer that fits those criteria?
That is, does this prove that a single LLM can solve any problem, or that for any problem, we can find an LLM that solves it?
If it's possible to find an LLM for any given problem, then find an LLM for the problem "find an LLM for the problem and then evaluate it" and then evaluate it, and then you have an LLM that can solve any problem.
It's the "Universal Turing Machine" for LLMs.
I wonder what's the LLM equivalent of the halting problem?
A closer analogy is the Hutter Search (http://hutter1.net/ai/pfastprg.pdf), as it is also an algorithm that can solve any problem. And this construction is probably too inefficient to use in practice, just like the Hutter Search.
Theoretical results exist that try to quantify the number of CoT tokens needed to reach different levels of computational expressibility:
https://arxiv.org/pdf/2310.07923
TL;DR: Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.
For a field that constantly harps on parallelism and compute efficiency, this requirement seems prohibitive.
We really need to get away from constant depth architectures.
> Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.
So, as stated, this is impossible since it violates the Time Hierarchy Theorem.
The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens. Which is... not a particularly impressive bound? Any non-trivial fixed neural network can, for instance, compute the NAND of two inputs. And any polynomial computable function can be computed with a polynomial number of NAND gates.
To be clear I think the tweet is a bit exaggerated (and the word ‘performance’ there doesn’t take into account efficiency, for example) but I don’t have the time to read the full paper (just skimmed the abstract and conclusion). I quoted the tweet by an author for people to discuss since it’s still a fairly remarkable result.
This is an accepted ICLR paper by authors from Stanford, Toyota and Google. That's not a guarantee for anything, of course, but they likely know basic algorithms and the second law.
You can certainly argue against their claims, but you need to put in the legwork.
> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.
That seems like a bit of a leap here to make this seem more impressive than it is (IMO). You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.
Wake me up when an LLM figures out stable fusion or room temperature superconductors.
I think you're misrepresenting the study. It builds upon previous work that examines the computation power of the transformer architecture from a circuit-complexity perspective. Previous work showed that the class of problems that a "naive" Transformer architecture could compute was within TC0 [1, 2] and as a consequence it was fundamentally impossible for transformers to solve certain classes of mathematical problems. This study actually provides a more realistic bound of AC0 (by analyzing the finite-precision case) which rules out even more problems, including such 'simple' ones as modular parity.
We also had previous work that hinted that part of the reason why chain-of-thought works from a theoretical perspective is that it literally allows the model to perform types of computations it could not under the more limited setting (in the same way jumping from FSMs to pushdown automata allows you to solve new types of problems) [3].
Generally, literature on the computational power of the SAME neural architecture can differ on their conclusions based on their premises. Assuming finite precision will give a more restrictive result, and assuming arbitrary precision can give you Turing completeness.
From a quick skim this seems like it's making finite precision assumptions? Which doesn't actually tighten previous bounds, it just makes different starting assumptions.
You can't really be blamed though, the language in the paper does seem to state what you originally said. Might be a matter of taste but I don't think it's quite accurate.
The prior work they referenced actually did consider the finite precision case, and explained why they didn't think it was useful to prove the result under those premises.
In this work they simply argued from their own perspective why finite precision made more sense.
The whole sub-field is kinda messy and I get quoted differing results all the time.
Edit: Also, your original point stands, obviously. Sorry for nitpicking on your post, but I also just thought people should know more about the nuances of this stuff.
Density has continued to increase, but so have prices.
The 'law' was tied to the price to density ratio, and it's been almost a decade now since it died.
if you take reproduction into account and ignore all the related externalities you can definitely double your count of transistors (humans) every two years.
> You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.
Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?
Say humans can solve quantum gravity in 100 years of thinking by 10,000 really smart people. If one AGI is equal to 1 really smart person. Scale enough compute for 1 million AGI and we can solve quantum gravity in a year.
The major assumption here is that transformers can indeed solve every problem humans can.
> Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?
But it can't. There isn't enough planet.
> The major assumption here is that transformers can indeed solve every problem humans can.
No, the major assumptions are (a) that ChatGPT can, and (b) that we can reduce the resource requirements by many orders of magnitude. The former assumption is highly-dubious, and the latter is plainly false.
Transformers are capable of representing any algorithm, if they're allowed to be large enough and run large enough. That doesn't give them any special algorithm-finding ability, and finding the correct algorithms is the hard part of the problem!
Are we talking about "an AGI", or are we talking about overfitting large transformer models with human-written corpora and scaling up the result?
"An AGI"? I have no idea what that algorithm might look like. I do know that we can cover the majority of cases with not too much effort, so it all depends on the characteristics of that long tail.
> Combining Wu’s method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems by just using a CPU-only laptop with a time limit of 5 minutes per problem.
AlphaGeometry had an entire supercomputer cluster, and dozens of hours. GOFAI approaches have a laptop and five minutes. Scale that inconceivable inefficiency up to AGI, and the total power output of the sun may not be enough.
It's always a hindsight declaration though. Currently we can only say that Intel has reused the same architecture several times already, cranking up the voltage until it breaks, because they seem yet to find the next design leap, while AMD has been toying around with 3D placement but their latest design is woefully unimpressive. We do not know when the next compute leap will happen until it happens.
> Scale enough compute for 1 million AGI and we can solve quantum gravity in a year.
That is wrong; it misses the point. We learn from the environment, we don't secrete quantum gravity from our pure brains. It's an RL setting of exploration and exploitation, a search process in the space of ideas based on validation in reality. An LLM alone is like a human locked away in a cell, with no access to test ideas.
If you take child Einstein and put him on a remote island, and come back 30 years later, do you think he would impress you with his deep insights? It's not the brain alone that made Einstein so smart. It's also his environment that had a major contribution.
if you told child Einstein that light travels at a constant speed in all inertial frames and taught him algebra, then yes, he would come up with special relativity.
in general, an AGI might want to perform experiments to guide its exploration, but it's possible that the hypotheses that it would want to check have already been probed/constrained sufficiently. which is to say, a theoretical physicist might still stumble upon the right theory without further experiments.
Labeling observations with something better than a list of column-label strings at the top would make it possible to mine for insights in, or produce, a universal theory that covers what has been observed instead of the presumed limits of theory.
CSVW is CSV on the Web as Linked Data.
With 7 metadata header rows at the top, a CSV could be converted to CSVW, with URIs for units like metre or meter or feet.
If a ScholarlyArticle publisher does not indicate that a given CSV or better :Dataset that is :partOf an article is a :premiseTo the presented argument, a human grad student or an LLM needs to identify the links or textual citations to the dataset CSV(s).
Easy: Identify all of the pandas.read_csv() calls in a notebook,
Expensive: Find the citation in a PDF, search for the text in "quotation marks" and try and guess which search result contains the dataset premise to an article;
Or, identify each premise in the article, pull the primary datasets, and run an unbiased automl report to identify linear and nonlinear variance relations and test the data dredged causal chart before or after manually reading an abstract.
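For the "easy" case, a minimal sketch of what that scan could look like, assuming plain .ipynb files and read_csv calls with literal arguments (dynamically built paths would need AST-level analysis instead):

    import json, re

    def read_csv_calls(notebook_path):
        # Pull every pandas.read_csv(...) call out of a notebook's code cells.
        nb = json.load(open(notebook_path, encoding="utf-8"))
        pattern = re.compile(r"(?:pd|pandas)\.read_csv\(([^)]*)\)")
        calls = []
        for cell in nb.get("cells", []):
            if cell.get("cell_type") == "code":
                calls += pattern.findall("".join(cell.get("source", [])))
        return calls  # raw argument strings, e.g. '"data/measurements.csv"'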
The assumption is that the AGI can solve any problem humans can - including learning from the environment, if that is what is needed.
But I think you're missing the point of my post. I don't want to devolve this topic into yet another argument centered around "but AI can't be AGI or can't do what humans can do because so and so".
I often see this misconception that compute alone will lead us to surpass human level. No doubt it is inspired by the "scaling laws" we heard so much about. People forget that imitation is not sufficient to surpass human level.
Sort of like a quantum superposition state? So here is an idea: use quantum computing to produce all possible inferences, and some not-yet-invented algorithm to collapse them to the final result.
Has it been publicly benchmarked yet whether this approach:

    Hello LLM, please solve this task: <task>

can be improved by performing this afterwards?

    for iteration in range(10):
        Hello LLM, please solve this task: <task>
        Here is a possible solution: <last_reply>
        Please look at it and see if you can improve it.
        Then tell me your improved solution.
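For anyone who wants to try it, here is a minimal runnable version of that loop, assuming the OpenAI Python client (>=1.0) and an API key in the environment; the model name and the 10 iterations are arbitrary choices on my part, not a benchmarked recipe:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(prompt):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model, swap in whatever you use
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    task = "Write a function that checks whether a string is a palindrome."
    reply = ask(f"Hello LLM, please solve this task: {task}")
    for iteration in range(10):
        reply = ask(
            f"Hello LLM, please solve this task: {task}\n"
            f"Here is a possible solution: {reply}\n"
            "Please look at it and see if you can improve it. "
            "Then tell me your improved solution."
        )
    print(reply)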
I think o1 is more like "pretend you're doing a job interview, think step by step and show your working".
I tried something similar to the suggested iterative loop on a blog post I'd authored but wanted help copy editing; the first few passes were good enough, but then it got very confused and decided the blog post wasn't actually a blog post to be edited, and instead that what I really wanted to know was the implications of Florida something something Republican Party.
Benchmark would be neat, because all I have is an anecdote.
It's not clear to me what you're saying; isn't the whole deal here that by performing RL on the CoT (given sufficient size and compute) it would converge to the right program?
1) The theoretical notion that a fixed-depth transformer + COT can solve arbitrary problems involving sequential computation is rather like similar theoretical notions of a Turing machine as a universal computer, or of an ANN with a hidden layer able to represent arbitrary functions... it may be true, but at the same time not useful
2) The Turing machine, just as the LLM+COT, is only as useful as the program it is running. If the LLM+COT is incapable of runtime learning and is just trying to mimic some reasoning heuristics, then that is going to limit its function, even if theoretically such an "architecture" could do more if only it were running a universal AGI program
Using RL to encourage the LLM to predict continuations according to some set of reasoning heuristics is what it is. It's not going to make the model follow any specific reasoning logic, but is presumably hoped to generate a variety of continuations that the COT "search" will be able to utilize to arrive at a better response than it otherwise would have done. More of an incremental improvement (as reflected in the benchmark scores it achieves) than "converging to the right program".
Sometimes reading hackernews makes me want to slam my head on the table repeatedly. Given sufficient size and compute is one of the most load bearing phrases I've ever seen.
But it is load bearing. I mean, I personally can't stop being amazed at how with each year that passes, things that were unimaginable with all the world's technology a decade ago are becoming straightforward to run on a reasonably priced laptop. And at this stage, I wouldn't bet even $100 against any particular computational problem being solved in some FAANG datacenter by the end of the decade.
Technology advances, but it doesn't invent itself.
CPUs didn't magically get faster by people scaling them up - they got faster by evolving the design to support things like multi-level caches, out-of-order execution and branch prediction.
Perhaps time fixes everything, but scale alone does not. It'll take time for people to design new ANN architectures capable of supporting AGI.
> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.
A reply in this twitter thread links to a detailed blog post titled "Universal computation by attention: Running cellular automata and other programs on Claude 3 Opus." https://x.com/ctjlewis/status/1786948443472339247
Can any of these tools do anything that GitHub Copilot cannot do? (Apart from using other models?) I tried Continue.dev and cursor.ai, but it was not immediately obvious to me. Maybe I am missing something workflow specific?
No. "inherently serial" refers to problems that are specified serially and can't be spend up by parallel processing. The sum of a set of N numbers is an example of a problem that is not inherently serial. You can use parallel reduction to perform the computation in O(log(N)) time on an idealized parallel computer but it takes O(N) time on an idealized serial computer.
And, it turns out, exactly which problems really are inherently serial is a somewhat challenging problem.
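A toy sketch of the parallel-reduction point, just for illustration: each round below adds disjoint pairs, so on an idealized parallel machine all additions in a round happen at once and the whole sum takes a logarithmic number of rounds.

    def parallel_sum_rounds(xs):
        # The list comprehension runs sequentially here, but every addition in a
        # round is independent and could run in parallel on an idealized machine.
        xs = list(xs)
        rounds = 0
        while len(xs) > 1:
            reduced = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
            if len(xs) % 2:              # an odd element carries over unchanged
                reduced.append(xs[-1])
            xs = reduced
            rounds += 1
        return xs[0], rounds

    print(parallel_sum_rounds(range(1, 9)))  # (36, 3): 8 numbers -> 3 rounds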
They didn't say floats, and the sum of a set of floats is not uniquely defined as a float for the reason you stated, at least not without specifying a rounding mode. Most people use "round to whatever my naïve code happens to do", which has many correct answers. To add up a set of floats with only the usual 0.5 ULP imprecision - yes, that isn't trivial.
Using hardware floating point types is not suitable if mathematical correctness matters, and is largely a deprecated practice. Check out Python's fractions module, for example, for exact arithmetic[0].
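A small illustration of the difference, for the curious (math.fsum is thrown in as the middle ground: still a float result, but correctly rounded):

    from fractions import Fraction
    import math

    floats = [0.1] * 10
    print(sum(floats))        # 0.9999999999999999 - naive float accumulation
    print(math.fsum(floats))  # 1.0 - correctly rounded float sum
    print(sum(Fraction(1, 10) for _ in range(10)))  # 1 - exact rational arithmetic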
You can model parallel computation by an arbitrary finite product of Turing machines. And then, yes, you can simulate that product on a single Turing machine. I think that's the sort of thing you have in mind?
But I'm not aware of what "inherently serial" means. The right idea likely involves talking about complexity classes. E.g. how efficiently does a single Turing machine simulate a product of Turing machines? An inherently serial computation would then be something like a problem where the simulation is significantly slower than running the machines in parallel.
Yeah, it's talking about a new feature for LLMs where the output of an LLM is fed back in as input, again and again and again, and this produces way more accurate output.
No worries! With the magic bananas and ink you've acquired, those monkeys will surely produce output with a signal-to-noise ratio rivaling the best LLMs.
I’m sure your startup will achieve the coveted Apeicorn status soon!