Human performance is 85% [1]. o3 high gets 87.5%. This means we have an algorith...

phillipcarter · 2024-12-20T18:33:43 1734719623

As excited as I am by this, I still feel like this is still just a small approximation of a small chunk of human reasoning ability at large. o3 (and whatever comes next) feels to me like it will head down the path of being a reasoning coprocessor for various tasks.

But, still, this is incredibly impressive.

qt31415926 · 2024-12-20T19:04:02 1734721442

Which parts of reasoning do you think is missing? I do feel like it covers a lot of 'reasoning' ground despite its on the surface simplicity

john_minsk · 2024-12-21T04:31:27 1734755487

My personal 5 cents is that reasoning will be there when LLM gives you some kind of outcome and then when questioned about it can explain every bit of result it produced.

For example, if we asked an LLM to produce an image of a "human woman photorealistic" it produces result. After that you should be able to ask it "tell me about its background" and it should be able to explain "Since user didn't specify background in the query I randomly decided to draw her standing in front of a fantasy background of Amsterdam iconic houses. Usually Amsterdam houses are 3 stories tall, attached to each other and 10 meters wide. Amsterdam houses usually have cranes on the top floor, which help to bring goods to the top floor since doors are too narrow for any object wider than 1m. The woman stands in front of the houses approximately 25 meters in front of them. She is 1,59m tall, which gives us correct perspective. It is 11:16am of August 22nd which I used to calculate correct position of the sun and align all shadows according to projected lighting conditions. The color of her skin is set at RGB:xxxxxx randomly" etc.

And it is not too much to ask LLMs for it. LLMs have access to all the information above as they read all the internet. So there is definitely a description of Amsterdam architecture, what a human body looks like or how to correctly estimate time of day based on shadows (and vise versa). The only thing missing is logic that connects all this information and which is applied correctly to generate final image.

I like to think about LLMs as a fancy genius compressing engines. They took all the information in the internet, compressed it and are able to cleverly query this information for end user. It is a tremendously valuable thing, but if intelligence emerges out of it - not sure. Digital information doesn't necessarily contain everything needed to understand how it was generated and why.

typhon04 · 2024-12-22T08:25:22 1734855922

I see two approaches for explaining the outcome: 1. Reasoning back on the result and justifying it. 2. Explainability - somehow justifying by looking at which neurons have been called. The first could lead to lying. E.g. think of a high schooler explaining copied homework. While the second one does indeed access the paths influencing the decision, but is a hard task due to the inherent way neural networks work.

concordDance · 2024-12-21T09:11:25 1734772285

> if we asked an LLM to produce an image of a "human woman photorealistic" it produces result

Large language models don't do that. You'd want an image model.

Or did you mean "multi-model AI system" rather than "LLM"?

owenpalmer · 2024-12-21T10:03:02 1734775382

It might be possible for a language model to paint a photorealistic picture though.

0points · 2024-12-21T14:54:51 1734792891

It is not.

You are confusing LLM:s with Generative AI.

owenpalmer · 2024-12-22T05:17:52 1734844672

No, I'm not confusing it. I realize that LLMs sometimes connect with diffusion models to produce images. I'm talking about language models actually describing pixel data of the image.

amelius · 2024-12-21T13:21:53 1734787313

Can an LLM use tools like humans do? Could it use an image model as a tool to query the image?

0points · 2024-12-21T14:53:49 1734792829

No, a LLM is a Large Language Model.

It can language.

amelius · 2024-12-21T16:04:05 1734797045

You could teach it to emit patterns that (through other code) invoke tools, and loop the results back to the LLM.

phillipcarter · 2024-12-20T21:34:28 1734730468

I think it's hard to enumerate the unknown, but I'd personally love to see how models like this perform on things like word problems where you introduce red herrings. Right now, LLMs at large tend to struggle mightily to understand when some of the given information is not only irrelevant, but may explicitly serve to distract from the real problem.

zmgsabst · 2024-12-21T04:19:29 1734754769

That’s not inability to reason though, that’s having a social context.

Humans also don’t tend to operate in a rigorously logical mode and understand that math word problems are an exception where the language may be adversarial: they’re trained for that special context in school. If you tell the LLM that social context, eg that language may be deceptive, their “mistakes” disappear.

What you’re actually measuring is the LLM defaults to assuming you misspoke trying to include relevant information rather than that you were trying to trick it — which is the social context you’d expect when trained on general chat interactions.

Establishing context in psychology is hard.

KaoruAoiShiho · 2024-12-20T22:06:08 1734732368

o1 already fixed the red herrings...

Xmd5a · 2024-12-21T06:53:54 1734764034

LLMs are still bound to a prompting session. They can't form long term memories, can't ponder on it and can't develop experience. They have no cognitive architecture.

'Agents' (i.e. workflows intermingling code and calls to LLMs) are still a thing (as shown by the fact there is a post by anthropic on this subject on the front page right now) and they are very hard to build.

Consequence of that for instance: it's not possible to have a LLM explore exhaustively a topic.

mjhagen · 2024-12-21T13:32:38 1734787958

LLMs don’t, but who said AGI should come from LLMs alone. When I ask ChatGPT about something “we” worked on months ago, it “remembers” and can continue on the conversation with that history in mind.

I’d say, humans are also bound to promoting sessions in that way.

Xmd5a · 2024-12-21T15:18:31 1734794311

Last time I used ChatGPT 'memory' feature it got full very quickly. It remembered my name, my dog's name and a couple tobacco casing recipes he came up with. OpenAI doesn't seem to be using embeddings and a vector database, just text snippets it injects in every conversation. Because RAG is too brittle ? The same problem arises when composing LLM calls. Efficient and robust workflows are those whose prompts and/or DAG were obtained via optimization techniques. Hence DSPy.

Consider the following use case: keeping a swimming pool water clean. I can have a long running conversation with a LLM to guide me in getting it right. However I can't have a LLM handle the problem autonomously. I'd like to have it notify me on its own "hey, it's been 2 days, any improvement? Do you mind sharing a few pictures of the pool as well as the ph/chlorine test results ?". Nothing mind-boggingly complex. Nothing that couldn't be achieved using current LLMs. But still something I'd have to implement myself and which turns out to be more complex to achieve than expected. This is the kind of improvement I'd like to see big AI companies going after rather than research-grade ultra smart AIs.

mistermann · 2024-12-21T15:46:19 1734795979

Optimal phenomenological reasoning is going to be a tough nut to crack.

Luckily we don't know the problem exists, so in a cultural/phenomenological sense it is already cracked.

tim333 · 2024-12-21T13:29:36 1734787776

Current AI is good at text but not very good at 3d physical stuff like fixing your plumbing.

amelius · 2024-12-21T13:22:53 1734787373

Does it include the use of tools to accomplish a task?

Does it include the invention of tools?

Agentus · 2024-12-21T02:15:08 1734747308

kinda interesting, every single CS person (especially phds) when talking about reasoning are unable to concisely quantify, enumerate, qualify, or define reasoning.

people with (high) intelligence talking and building (artificial) intelligence but never able to convincingly explain aspects of intelligence. just often talk ambiguously and circularly around it.

what are we humans getting ourselves into inventing skynet :wink.

its been an ongoing pet project to tackle reasoning, but i cant answer your question with regards to llms.

YeGoblynQueenne · 2024-12-21T03:03:42 1734750222

>> Kinda interesting, every single CS person (especially phds) when talking about reasoning are unable to concisely quantify, enumerate, qualify, or define reasoning.

Kinda interesting that mathematicians also can't do the same for mathematics.

And yet.

logicchains · 2024-12-21T13:18:23 1734787103

Mathematicians absolutely can, it's called foundations, and people actively study what mathematics can be expressed in different foundations. Most mathematicians don't care about it though for the same reason most programmers don't care about Haskell.

YeGoblynQueenne · 2024-12-21T14:18:40 1734790720

I don't care about Haskell either, but we know what reasoning is [1]. It's been studied extensively in mathematics, computer science, psychology, cognitive science and AI, and in philosophy going back literally thousands of years with grandpapa Aristotle and his syllogisms. Formal reasoning, informal reasoning, non-monotonic reasoning, etc etc. Not only do we know what reasoning is, we know how to do it with computers just fine, too [2]. That's basically the first 50 years of AI, that folks like His Nobelist Eminence Geoffrey Hinton will tell you was all a Bad Idea and a total failure.

Still somehow the question keeps coming up- "what is reasoning". I'll be honest and say that I imagine it's mainly folks who skipped CS 101 because they were busy tweaking their neural nets who go around the web like Diogenes with his lantern, howling "Reasoning! I'm looking for a definition of Reasoning! What is Reasoning!".

I have never heard the people at the top echelons of AI and Deep learning - LeCun, Schmidhuber, Bengio, Hinton, Ng, Hutter, etc etc- say things like that: "what's reasoning". The reason I suppose is that they know exactly what that is, because it was the one thing they could never do with their neural nets, that classical AI could do between sips of coffee at breakfast [3]. Those guys know exactly what their systems are missing and, to their credit, have never made no bones about that.

_________________

[1] e.g. see my profile for a quick summary.

[2] See all of Russeel & Norvig, as a for instance.

[3] Schmidhuber's doctoral thesis was an implementation of genetic algorithms in Prolog, even.

Agentus · 2024-12-21T15:39:46 1734795586

i have a question for you, in which ive asked many philosophy professors but none could answer satisfactorily. since you seem to have a penchant for reasoning perhaps you might have a good answer. (i hope i remember the full extent of the question properly, i might hit you up with some follow questions)

it pertains to the source of the inference power of deductive inference. do you think all deductive reasoning originated inductively? like when some one discovers a rule or fact that seemingly has contextual predictive power, obviously that can be confirmed inductively by observations, but did that deductive reflex of the mind coagulate by inductive experiences. maybe not all deductive derivative rules but the original deductive rules.

YeGoblynQueenne · 2024-12-21T17:09:32 1734800972

I'm sorry but I have no idea how to answer your question, which is indeed philosophical. You see, I'm not a philosopher, but a scientist. Science seeks to pose questions, and answer them; philosophy seeks to pose questions, and question them. Me, I like answers more than questions so I don't care about philosophy much.

Agentus · 2024-12-21T17:24:44 1734801884

well yeah its partially philosphical, i guess my haphazard use of language like “all” makes it more philosophical than intended.

but im getting at a few things. one of those things is neurological. how do deductive inference constructs manifest in neurons and is it really inadvertently an inductive process that that creates deductive neural functions.

other aspect of the question i guess is more philosophical. like why does deductive inference work at all, i think clues to a potential answer to that can be seen in the mechanics of generalization of antecedents predicting(or correlating with) certain generalized consequences consistently. the brain coagulates generalized coinciding concepts by reinforcement and it recognizes or differentiates inclusive instances or excluding instances of a generalization by recognition properties that seem to gatekeep identities accordingly. its hard to explain succinctly what i mean by the latter, but im planning on writing an academic paper on that.

YeGoblynQueenne · 2024-12-21T18:14:41 1734804881

I'm sorry, I don't have the background to opine on any of the subjects you discuss. Good luck with your paper!

mistermann · 2024-12-21T15:51:54 1734796314

>Those guys know exactly what their systems are missing

If they did not actually, would they (and you) necessarily be able to know?

Many people claim the ability to prove a negative, but no one will post their method.

YeGoblynQueenne · 2024-12-21T17:07:10 1734800830

To clarify, what neural nets are missing is a capability present in classical, logic-based and symbolic systems. That's the ability that we commonly call "reasoning". No need to prove any negatives. We just point to what classical systems are doing and ask whether a deep net can do that.

mistermann · 2024-12-24T13:14:53 1735046093

Do Humans have this ability called "reasoning"?

Agentus · 2024-12-21T03:14:43 1734750883

well lets just say i think i can explain reasoning better than anyone ive encountered. i have my own hypothesized theory on what it is and how it manifests in neural networks.

i doubt your mathmatician example is equivalent.

examples that are fresh on the mind that further my point. ive heard yann lecun baffled by llms instantiation/emergence of reasoning, along with other ai researchers. eric Schmidt thinks the agentic reasoning is the current frontier and people should be focusing on that. was listening to the start of an ai machine learning interview a week ago with some cs phd asked to explain reasoning and the best he could muster up is you know it when you see it…. not to mention the guy responding to the grandparent that gave a cop out answer ( all the most respect to him).

YeGoblynQueenne · 2024-12-21T14:30:14 1734791414

>> well lets just say i think i can explain reasoning better than anyone ive encountered. i have my own hypothesized theory on what it is and how it manifests in neural networks.

I'm going to bet you haven't encountered the right people then. Maybe your social circle is limited to folks like the person who presented a slide about A* to a dumb-struck roomfull of Deep Learning researchers, in the last NeurIps?

https://x.com/rao2z/status/1867000627274059949

Agentus · 2024-12-21T15:15:54 1734794154

possibly, my university doesn’t really do ai research beyond using it as a tool to engineer things. im looking to transfer to a different university.

but no, my take on reasoning is really a somewhat generalized reframing of the definition of reasoning (which you might find on the stanford encylopedia of philosophy) thats reframed partially in axiomatic building blocks of neural network components/terminology. im not claiming to have discovered reasoning, just redefine it in a way thats compatible and sensible to neural networks (ish).

YeGoblynQueenne · 2024-12-21T17:16:28 1734801388

Well you're free to define and redefine anything and as you like, but be aware that every time you move the target closer to your shot you are setting yourself up for some pretty strong confirmation bias.

Agentus · 2024-12-21T17:37:49 1734802669

yeah thats why i need help from the machine interpretability crowd to make sure my hypothesized reframing of reasoning has sufficient empirical basis and isn’t adrift in lalaland.

necovek · 2024-12-21T07:21:09 1734765669

Care to enlighten us with your explanation of what "reasoning" is?

Agentus · 2024-12-21T14:08:30 1734790110

terribly sorry to be such a tease, but im looking to publish a paper on it, and still need to delve deeper into machine interpretability to make sure its empirically properly couched. if u can help with that perhaps we can continue this convo in private.

azeirah · 2024-12-20T23:52:22 1734738742

I'd like to see this o3 thing play 5d chess with multiverse time travel or baba is you.

The only effect smarter models will have is that intelligent people will have to use less of their brain to do their work. As has always been the case, the medium is the message, and climate change is one of the most difficult and worst problems of our time.

If this gets software people to quit en-masse and start working in energy, biology, ecology and preservation? Then it has succeeded.

concordDance · 2024-12-21T09:16:38 1734772598

> climate change is one of the most difficult and worst problems of our time.

Slightly surprised to see this view here.

I can think of half a dozen more serious problems off hand (e.g. population aging, institutional scar tissue, dysgenics, nuclear proliferation, pandemic risks, AI itself) along most axes I can think of (raw $ cost, QALYs, even X-risk).

TranquilMarmot · 2024-12-22T18:42:51 1734892971

None of those problems really matter if we don't have a planet to live on

concordDance · 2024-12-26T09:09:34 1735204174

You've been greviously mislead if you think climate change could plausibly make the world uninhabbitable in the next couple of centuries given current trajectories. I advise going to the primary sources and emailing a climate scientist at your local university for some references.

TranquilMarmot · 2024-12-26T18:13:55 1735236835

> going to the primary sources and emailing a climate scientist at your local university for some references

I assume you've done this, otherwise you wouldn't be telling me to? Bold of you to assume my ignorance on this subject. You sound like you've fallen for corporate grifters who care more about short-term profit and gains over long-term sustainability (or you are one of said grifters, in which case why are you wasting your time on HN, shouldn't you be out there grinding?!)

Severe weather events are going to get more common and more devastating over the next couple of decades. They'll come for you and people you care about, just as they come for me and people I care about. It doesn't matter what you think you know about it.

concordDance · 2024-12-27T08:41:56 1735288916

I've read some climate papers but haven't done the email thing (I should, but have not).

The IPCC summaries are a good read too.

Do you genuinely think severe weather events are going to be even amongst the top ten killers this century? If so, I do strongly advise emailing local uni climate scientist. (What's the worst that can happen? Heck, they might confirm your views!)

(In other circumstances I might go through the whole "what have you observed that has given you this belief?" thing, but in this case there is a simple and reliable check in the form of a 5 minute email)

... actually, I can do so on your behalf... would you like me to? The specific questions I would be asking unless told otherwise would be:

1. Probability of human extinction in the next century due to climate change. 2. Probability of more than 10% of human deaths being due to extreme weather. 3. Places to find good unbiased summaries of the likely effects of climate change.

Any others?

azeirah · 2025-01-05T01:48:44 1736041724

Please do! I would love for you to do this.

Would you be so kind to ask

1. Do you think a tornado has real probability of forming in north-western Europe, where historically there has never been one before? And what do you think are the chances of it being destructive in ways before unseen? (Think Netherlands, Belgium, Germany, ...)

2. How are the attractors (chaos theory) changing? Is it correct to say that, no, our weather prediction models are not going to be more accurate, all we can say is that weather is going to _change_ in all extremes? More intense storms. Colder winters. Hotter summers. Drier droughts.

3. What institution predicted the floods in Spain? Did anyone? Or was this completely unprecedented and a complete surprise?

I think these are my primary questions for now.

TranquilMarmot · 2024-12-28T00:31:59 1735345919

I don't think that humans will go extinct from climate change, but it will drastically change where we can comfortably live and will uproot our ability to make meaningful cultural and scientific progress.

In your comment above you mention: > e.g. population aging, institutional scar tissue, dysgenics, nuclear proliferation, pandemic risks, AI itself

These are all intertwined with each other and with climate change. People are less likely to have kids if they don't think those kids will have a comfortable future. Nuclear war is more likely if countries are competing for less and less resources as we deplete the planet and need to increase food production. Habitat loss from deforestation leads to animals comingling where they normally wouldn't, leading to increased risk of disease spillover into humans.

You claim that somebody saying "climate change is one of the most difficult and worst problems of our time" is a take you're surprised to see here on HN, but I'm more surprised that you don't list it in what you consider important problems.

scotty79 · 2024-12-20T18:38:59 1734719939

Still it's comparing average human level performance with best AI performance. Examples of things o3 failed at are insanely easy for humans.

cchance · 2024-12-20T20:54:57 1734728097

You'd be surprised what the AVERAGE human fails to do that you think is easy, my mom can't fucking send an email without downloading a virus, i have a coworker that believes beyond a shadow of a doubt the world is flat.

The Average human is a lot dumber than people on hackernews and reddit seem to realize, shit the people on mturk are likely smarter than the AVERAGE person

mirkodrummer · 2024-12-21T02:25:06 1734747906

Not being able to send an email or believing the world is flat it’s not a sign of intelligence, I’d rather say it’s more about culture or being more or less scholarized. Your mom or coworker still can do stuff instinctively that is outperforming every algorithm out there and still unexplained how we do it. We still have no idea what intelligence is

staticman2 · 2024-12-20T21:24:08 1734729848

Yet the average human can drive a car a lot better than ChatGPT can, which shows that the way you frame "intelligence" dictates your conclusion about who is "intelligent".

p1esk · 2024-12-20T21:42:57 1734730977

Pretty sure a waymo car drives better than an average SF driver.

Mordisquitos · 2024-12-21T11:53:44 1734782024

And how well would a Waymo car do in this challenge with the ARC-AGI datasets?

manquer · 2024-12-21T09:31:57 1734773517

Waymo cannot handle poor weather at all, average human can.

Being able to perform better than humans in specific constrained problem space is how every automation system has been developed.

While self driving systems are impressive, they don’t drive with anywhere close to skills of the average driver

tim333 · 2024-12-21T13:46:18 1734788778

Waymo blog with video of them driving in poor weather https://waymo.com/blog/2019/08/waymo-and-weather

manquer · 2024-12-21T14:27:41 1734791261

And nikola famously made a video of a truck using one which had no engine, we don’t take a company word for anything until we can verify.

This is not offered to public, they are actively expanding in only cities like LA , Miami or Phoenix now where weather is good through the year.

The tech for bad weather is nowhere close to ready for public. Average human on other hand is driving in bad weather every day

tim333 · 2024-12-21T15:14:48 1734794088

"Extreme Weather" tech "will be available to riders in the near future" https://www.cnet.com/roadshow/news/waymos-latest-robotaxi-is...

daveguy · 2024-12-21T17:32:43 1734802363

I'm sure the source of that CNET article came with a forward looking statements disclaimer.

coldcode · 2024-12-21T13:03:58 1734786238

There's a reason why Waymo isn't offered in Buffalo.

fragmede · 2024-12-21T13:28:07 1734787687

Is that reason because Buffalo is the 81st most populated city in the United States, or 123rd by population density, and Waymo currently only serves approximately 3 cities in North America?

We already let computers control cars because they're better than humans at it when the weather is inclement. It's called ABS.

shadowerm · 2024-12-22T12:18:45 1734869925

I would guess you haven't spent much time driving in the winter in the Northeast.

There is an inherent danger to driving in snow and ice. It is a PR nightmare waiting to happen because there is no way around accidents if the cars are on the road all the time in rust belt snow.

fragmede · 2024-12-22T19:33:13 1734895993

I get the feeling that the years I spent in Boston with a car including during the winter and driving to Ithaca somehow aren't enough, but whether or not I have is irrelevant. Still, I'll repeat the advice I was given before you have to drive in snow, go practice driving in the snow (in eg a parking lot) before needing to do so, esp during a storm. Waymo's been spotted driving in Buffalo doing testing, so it seems someone gave them similar advice. https://www.wgrz.com/article/tech/waymo-self-driving-car-pro...

There's always an inherent risk to driving, even in sunny Phoenix, Az. Winter dangers like black ice further multiply that risk but humans still manage to drive in winter. Taking a picture/video of a snowed over road and judging the width and inventing lanes based on the width taking into account snowbanks doesn't take an ML algorithm. Lidar can see black ice while human eyes can not, giving cars equipped with lidar (wether driven by a human or a computer) an advantage over those without it, and Waymo cars currently have lidar.

I'm sure there are new challenges for Waymo to solve before deploying the service in Buffalo, but it's not this unforeseen gotcha parent comment implies.

As far as the possible PR nightmare, you'd never do self driving cars in the first place if you let that fear control you because, you you pointed out, driving on the roads is inherently dangerous with too many unforeseen complications.

tracerbulletx · 2024-12-20T21:50:54 1734731454

If you take an electrical sensory input signal sequence, and transform it to a electrical muscle output signal sequence you've got a brain. ChatGPT isn't going to drive a car because it's trained on verbal tokens, and it's not optimized for the type of latency you need for physical interaction.

And the brain doesn't use the same network to do verbal reasoning as real time coordination either.

But that work is moving along fine. All of these models and lessons are going to be combined into AGI. It is happening. There isn't really that much in the way.

HarHarVeryFunny · 2024-12-21T15:27:27 1734794847

Maybe, but no doubt these "dumb" people can still get dressed in the morning, navigate a trip to the mall, do the dishes, etc, etc.

It's always been the case that the things that are easiest for humans are hardest for computers, and vice versa. Humans are good at general intelligence - tackling semi-novel problems all day long, while computers are good at narrow problems they can be trained on such as chess or math.

The majority of the benchmarks currently used to evaluate these AI models are narrow skills that the models have been trained to handle well. What'll be much more useful will be when they are capable of the generality of "dumb" tasks that a human can do.

0points · 2024-12-21T14:57:29 1734793049

Your examples are just examples of lack of information. That's not a measure for intelligence.

As a contrary point, most people think they are smarter than they really are.

FrustratedMonky · 2024-12-20T19:04:32 1734721472

There are things Chimps do easily that humans fail at, and vice/versa of course.

There are blind spots, doesn't take away from 'general'.

Matumio · 2024-12-21T13:01:52 1734786112

We can't agree whether Portia spiders are intelligent or just have very advanced instincts. How will we ever agree about what human intelligence is, or how to separate it from cultural knowledge? If that even makes sense.

FrustratedMonky · 2024-12-21T13:57:41 1734789461

I guess my point is more, if we can't decide about Portia Spiders or Chimps, then how can we be so certain about AI. So offering up Portia and Chimps as counter examples.

noobermin · 2024-12-21T12:53:00 1734785580

The downvotes should tell you, this is a decided "hype" result. Don't poo poo it, that's not allowed on AI slop posts on HN.

FrustratedMonky · 2024-12-21T13:10:50 1734786650

Yeah, I didn't realize Chimp studies, or neuroscience were out of vogue. Even in tech, people form strong 'beliefs' around what they think is happening.

cryptoegorophy · 2024-12-20T18:42:30 1734720150

What’s interesting is it might be very close to human intelligence than some “alien” intelligence, because after all it is a LLM and trained on human made text, which kind of represents human intelligence.

hammock · 2024-12-20T18:47:57 1734720477

In that vein, perhaps the delta between o3 @ 87.5% and Human @ 85% represents a deficit in the ability of text to communicate human reasoning.

In other words, it's possible humans can reason better than o3, but cannot articulate that reasoning as well through text - only in our heads, or through some alternative medium.

unsupp0rted · 2024-12-20T21:38:10 1734730690

It's possible humans reason better through text than not through text, so these models, having been trained on text, should be able to out-reason any person who's not currently sitting down to write.

85392_school · 2024-12-20T18:53:57 1734720837

I wonder how much of an effect amount of time to answer has on human performance.

yunwal · 2024-12-20T19:30:29 1734723029

Yeah, this is sort of meaningless without some idea of cost or consequences of a wrong answer. One of the nice things about working with a competent human is being able to tell them "all of our jobs are on the line" and knowing with certainty that they'll come to a good answer.

hamburga · 2024-12-21T02:30:31 1734748231

Agreed. I think what really makes them alien is everything else about them besides intelligence. Namely, no emotional/physiological grounding in empathy, shame, pride, and love (on the positive side) or hatred (negative side).

6gvONxR4sf7o · 2024-12-20T19:41:32 1734723692

Human performance is much closer to 100% on this, depending on your human. It's easy to miss the dot in the corner of the headline graph in TFA that says "STEM grad."

tim333 · 2024-12-21T13:35:37 1734788137

A fair comparison might be average human. The average human isn't a STEM grad. It seems STEM grad approximately equals an IQ of 130. https://www.accommodationforstudents.com/student-blog/the-su...

From a post elsewhere the scores on ARC-AGI-PUB are approx average human 64%, o3 87%. https://news.ycombinator.com/item?id=42474659

Though also elsewhere, o3 seems very expensive to operate. You could probably hire a PhD researcher for cheaper.

jeremyjh · 2024-12-21T15:49:20 1734796160

Why would an average human be more fair than a trained human? The model is trained.

ALittleLight · 2024-12-20T18:38:06 1734719886

It's not saturated. 85% is average human performance, not "best human" performance. There is still room for the model to go up to 100% on this eval.

lastdong · 2024-12-21T05:33:42 1734759222

Curious about how many tests were performed. Did it consistently manage to successfully solve many of these types of problems?

antirez · 2024-12-20T18:58:05 1734721085

NNs are not algorithms.

notfish · 2024-12-20T19:02:38 1734721358

An algorithm is “a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer”

How does a giant pile of linear algebra not meet that definition?

antirez · 2024-12-20T19:07:25 1734721645

It's not made of "steps", it's an almost continuous function to its inputs. And a function is not an algorithm: it is not an object made of conditions, jumps, terminations, ... Obviously it has computation capabilities and is Turing-complete, but is the opposite of an algorithm.

janalsncm · 2024-12-20T22:05:13 1734732313

If it wasn’t made of steps then Turing machines wouldn’t be able to execute them.

Further, this is probably running an algorithm on top of an NN. Some kind of tree search.

I get what you’re saying though. You’re trying to draw a distinction between statistical methods and symbolic methods. Someday we will have an algorithm which uses statistical methods that can match human performance on most cognitive tasks, and it won’t look or act like a brain. In some sense that’s disappointing. We can build supersonic jets without fully understanding how birds fly.

antirez · 2024-12-20T22:44:54 1734734694

Let's see that Turing machines can approximate the execution of NN :) That's why there are issues related to numerical precision, but the contrary is also true indeed, NNs can discover and use similar techniques used by traditional algorithms. However: the two remain two different methods to do computations, and probably it's not just by chance that many things we can't do algorithmically, we can do with NNs, what I mean is that this is not just related to the fact that NNs discover complex algorithms via gradient descent, but also that the computational model of NNs is more adapt to solving certain tasks. So the inference algorithm of NNs (doing multiplications and other batch transformations) is just needed for standard computers to approximate the NN computational model. You can do this analogically, and nobody would claim much (maybe?) it's running an algorithm. Or that brains themselves are algorithms.

necovek · 2024-12-21T07:43:47 1734767027

Computers can execute precise computations, it's just not efficient (and it's very much slow).

NNs are exactly what "computers" are good for and we've been using since their inception: doing lots of computations quickly.

"Analog neural networks" (brains) work much differently from what are "neural networks" in computing, and we have no understanding of their operation to claim they are or aren't algorithmic. But computing NNs are simply implementations of an algorithm.

Edit: upon further rereading, it seems you equate "neural networks" with brain-like operation. But brain was an inspiration for NNs, they are not an "approximation" of it.

antirez · 2024-12-21T08:00:39 1734768039

But the inference itself is orthogonal to the computation the NN is going. Obviously the inference (and training) are algorithms.

tsimionescu · 2024-12-21T17:00:16 1734800416

NN inference is an algorithm for computing an approximation of a function with a huge number of parameters. The NN itself is of course just a data structure. But there is nothing whatsoever about the NN process that is non-algorithmic.

It's the exact same thing as using a binary tree to discover the lowest number in some set of numbers, conceptually: you have a data structure that you evaluate using a particular algorithm. The combination of the algorithm and the construction of the data structure arrive at the desired outcome.

antirez · 2024-12-21T17:21:29 1734801689

That's not the point, I think: you can implement the brain in BASIC, in theory, this does not means that the brain is per-se a BASIC program. I'll provide a more theoretical framework for reasoning about this: if the way to solve certain problems by an NN (the learned weights) can't be translated in some normal program that DOES NOT resemble the activation of an NN, then the NNs are not algorithms, but a different computational model.

tsimionescu · 2024-12-22T10:20:05 1734862805

This may be what they were getting it, but it is still wrong. An NN is a computable function. So, NN inference is an algorithm for computing the function the NN represents. If we have an NN that represents a function f, with f(text) = most likely next character a human would write, then running the inference for that NN is an algorithm for finding out which character it's most likely a human would write next.

It's true that this is not an "enlightening" algorithm, it doesn't help us understand why or how that is the most likely next character. But this doesn't mean it's not an algorithm.

zeroonetwothree · 2024-12-21T06:40:43 1734763243

We don’t have evidence that a TM can simulate a brain. But we know for a fact that it can execute a NN.

raegis · 2024-12-20T20:58:09 1734728289

> It's not made of "steps", it's an almost continuous function to its inputs.

Can you define "almost continuous function"? Or explain what you mean by this, and how it is used in the A.I. stuff?

taneq · 2024-12-21T01:12:41 1734743561

Well, it's a bunch of steps, but they're smaller. /s

CooCooCaCha · 2024-12-21T20:23:31 1734812611

Each layer of the network is like a step, and each token prediction is a repeat of those layers with the previous output fed back into it. So you have steps and a memory.

necovek · 2024-12-21T07:35:36 1734766536

I would say you are right that function is not an algorithm, but it is an implementation of an algorithm.

Is that your point?

If so, I've long learned to accept imprecise language as long as the message can be reasonably extracted from it.

mvkel · 2024-12-21T03:24:35 1734751475

> continuous

So, steps?

necovek · 2024-12-21T07:29:05 1734766145

"Continuous" would imply infinitely small steps, and as such, would certainly be used as a differentiator (differential? ;) between larger discrete stepped approach.

In essence, infinite calculus provides a link between "steps" and continuous, but those are different things indeed.

benlivengood · 2024-12-20T19:21:45 1734722505

Deterministic (ieee 754 floats), terminates on all inputs, correctness (produces loss < X on N training/test inputs)

At most you can argue that there isn't a useful bounded loss on every possible input, but it turns out that humans don't achieve useful bounded loss on identifying arbitrary sets of pixels as a cat or whatever, either. Most problems NNs are aimed at are qualitative or probabilistic where provable bounds are less useful than Nth-percentile performance on real-world data.

drdeca · 2024-12-20T21:39:54 1734730794

How do you define "algorithm"? I suspect it is a definition I would find somewhat unusual. Not to say that I strictly disagree, but only because to my mind "neural net" suggests something a bit more concrete than "algorithm", so I might instead say that an artificial neural net is an implementation of an algorithm, rather than or something like that.

But, to my mind, something of the form "Train a neural network with an architecture generally like [blah], with a training method+data like [bleh], and save the result. Then, when inputs are received, run them through the NN in such-and-such way." would constitute an algorithm.

necovek · 2024-12-21T07:33:27 1734766407

NN is a very wide term applied in different contexts.

When a NN is trained, it produces a set of parameters that basically define an algorithm to do inference with: it's a very big one though.

We also call that a NN (the joy of natural language).

KeplerBoy · 2024-12-20T20:28:09 1734726489

Running inference on a model certainly is a algorithm.

dyauspitr · 2024-12-20T23:58:12 1734739092

I’ll believe it when the AI can earn money on its own. I obviously don’t mean someone paying a subscription to use the AI I mean, letting the AI lose on the Internet with only the goal of making money and putting it into a bank account.

creer · 2024-12-21T19:07:35 1734808055

You don't think there are already plenty of attempts out there?

When someone is "disinterested enough" to publish though, note the obvious way to launch a new fund or advisor with a good track record: crank out a pile of them, run them one or two years, discard the many losers and publish the one or two top winners. I.E. first you should be suspicious of why it's being published, then of how selected that result is.

hamburga · 2024-12-21T02:38:39 1734748719

Do trading bots count?

1659447091 · 2024-12-21T07:00:48 1734764448

No, the AI would have to start from zero and reason it's way to making itself money online, such as the humans who were first in their online field of interest (e-commerce, scams, ads etc from the 80's and 90's) when there was no guidance, only general human intelligence that could reason their way into money making opportunities and reason their way into making it work.

concordDance · 2024-12-21T09:19:09 1734772749

I don't think humans ever do that. They research/read and ask other humans.

1659447091 · 2024-12-21T20:46:57 1734814017

Which AI already has stored in spades, even more so since people in the 80's 90's weren't working with the information available today. The AI is free to research and read all the information stored from other humans as well, just like the humans who reasoned their way into money making opportunities--only with vastly more information now, talk about an advantage. But is it intelligent enough do so without a human giving direct/step-by-step instructions; the way humans figure it out?

hypoxia · 2024-12-20T20:24:29 1734726269

It actually beats the human average by a wide margin:

- 64.2% for humans vs. 82.8%+ for o3.

...

Private Eval:

- 85%: threshold for winning the prize [1]

Semi-Private Eval:

- 87.5%: o3 (unlimited compute) [2]

- 75.7%: o3 (limited compute) [2]

Public Eval:

- 91.5%: o3 (unlimited compute) [2]

- 82.8%: o3 (limited compute) [2]

- 64.2%: human average (Mechanical Turk) [1] [3]

Public Training:

- 76.2%: human average (Mechanical Turk) [1] [3]

...

References:

[1] https://arcprize.org/guide

[2] https://arcprize.org/blog/oai-o3-pub-breakthrough

[3] https://arxiv.org/abs/2409.01374

usaar333 · 2024-12-20T20:50:28 1734727828

Super human isn't beating rando mech turk.

Their post has stem grad at nearly 100%

tripletao · 2024-12-20T22:32:58 1734733978

This is correct. It's easy to get arbitrarily bad results on Mechanical Turk, since without any quality control people will just click as fast as they can to get paid (or bot it and get paid even faster).

So in practice, there's always some kind of quality control. Stricter quality control will improve your results, and the right amount of quality control is subjective. This makes any assessment of human quality meaningless without explanation of how those humans were selected and incentivized. Chollet is careful to provide that, but many posters here are not.

In any case, the ensemble of task-specific, low-compute Kaggle solutions is reportedly also super-Turk, at 81%. I don't think anyone would call that AGI, since it's not general; but if the "(tuned)" in the figure means o3 was tuned specifically for these tasks, that's not obviously general either.

dmead · 2024-12-21T10:06:52 1734775612

This is so strange. people think that an llm trained on programming questions and docs can do mundane tasks like this means intelligent? Come on.

It really calls into question two things.

1. You don't know what you're talking about about.

2. You have a perverse incentive to believe this such that you will preach it to others and elevate some job salary range or stock.

Either way, not a good look.

javaunsafe2019 · 2024-12-21T13:42:44 1734788564