> If arguably the person with the highest IQ currently living, is impressed but still not fully satisfied that a computer doesn’t give Nobel prize winning mathematical reasoning
No offense, but every part of this characterization is really unserious! He says "the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." That's very different from what you're suggesting.
The way you're talking, it sounds like it'd be actually impossible for him to meaningfully say anything negative about AI. Presumably, if he was directly critical of it, it would only be because his standards as the world's smartest genius must simply be too high!
In reality, he's very optimistic about it in the future but doesn't find it useful now except for basic coding and bibliography formatting. It's fascinating to see how this very concrete and easily understood sentiment is routinely warped by the Tao-as-IQ-genius mythos.
> ‘Able to make the same creative mathematical leaps as Terence Tao’ seems like a pretty high bar to be setting for AI.
There's no need to try to infer this kind of high bar, because what he says is actually very specific and concrete: "Here the result was mildly disappointing ... Essentially the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." Crucially the blog post in question was part of his input to ChatGPT.
Otherwise, he's been clear that while he anticipates a future where it is more useful, at present he only uses AI/ChatGPT for bibliography formatting and for writing out simple "Hello World" style code. (He is, after all, a mathematician and not a coder.) I've seen various claims online that he's using ChatGPT all the time to help with his research and, beyond the coding usage, that just seems to not be true.
(However, it's fair to say that "able to help Terence Tao with research" is actually a high bar.)
Daniel Litt, an algebraic geometer, said on Twitter: "Pretty impressed by o1-preview! Still not having much luck asking it to do any interesting math but it seems much more reliable with simple things; I can actually imagine it being a net time-saver at this point with some non-mathematical tasks."
How can you verify a proof, though? Pure math isn't really about computations, and it can be very hard to spot the subtle errors an LLM might introduce into a proof, especially since LLMs seem better at sounding convincing than at being right.
To code proofs in Lean, you have to understand the proof very well. It doesn't seem very practical for someone learning the material for the first time.
The examples in this book are extraordinarily simple, and cover material that many proof assistants were designed to be extremely good at expressing. I wouldn't be surprised if an LLM could automate the exercises in this book completely.
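Just for a sense of scale, this is the flavor of exercise we're talking about: statements a proof assistant like Lean can state and check in a line or two. (These particular examples are mine, not taken from the book.)

```lean
-- Illustrative examples only (not from the book): the kind of statement
-- proof assistants are designed to express cleanly. Lean 4 syntax.
example (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.right, h.left⟩

example (p q r : Prop) (hpq : p → q) (hqr : q → r) : p → r :=
  fun hp => hqr (hpq hp)
```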
Writing nontrivial proofs in a theorem prover is a different beast. In my experience (as someone who writes mechanized mathematical proofs for a living) you need to not only know the proof very well beforehand, but you also need to know the design considerations for all of the steps you are going to use beforehand, and you also need to think about all of the ways your proof is going to be used beforehand. Getting these wrong frequently means redoing a ton of work, because design errors in proof systems are subtle and can remain latent for a long time.
> think about all of the ways your proof is going to be used beforehand
What do you mean by that? I don't know much about theorem provers, but my POV would be that a proof is used to verify a statement. What other uses are there one should consider?
The issue is-- there are lots of ways to write down a statement.
One common example is whether you internalize or externalize a property of a data structure: e.g. represent it with a dependent type, or as a property about a non-dependent type. This comes with design tradeoffs: some lemmas might expect internalized representations only, and some rewrites might only be usable (e.g. without horrifying dependent type errors) with externalized representations. For math in particular, which involves rich hierarchies of data structures, your choice about internalization can impact which structures from your mathematical library you can use, or the level of fragile type coercion magic that needs to happen behind the scenes.
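A toy sketch of that tradeoff, with invented names rather than anything from an actual library:

```lean
-- Two ways to say "a list of length n" (toy types, not Mathlib's).

-- Internalized: the length is part of the type (a dependent type).
inductive Vec (α : Type) : Nat → Type where
  | nil : Vec α 0
  | cons {n : Nat} (a : α) (v : Vec α n) : Vec α (n + 1)

-- Externalized: an ordinary list plus a separate proposition about it.
structure SizedList (α : Type) (n : Nat) where
  data : List α
  len  : data.length = n

-- Even converting between the two styles needs a little glue proof; in a
-- real library this glue is where coercions and fragile dependent-type
-- errors tend to accumulate.
def toSized {α : Type} : {n : Nat} → Vec α n → SizedList α n
  | _, .nil      => ⟨[], rfl⟩
  | _, .cons a v =>
      let ⟨l, h⟩ := toSized v
      ⟨a :: l, by rw [List.length_cons, h]⟩
```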
The premise is to have the LLM put up something that might be true, then have lean tell you whether it is true. If you trust lean, you don't need to understand the proof yourself to trust it.
The issue is that a hypothetical answer from an LLM is not even remotely easy to directly put into Lean. You might ask the LLM to give you an answer together with a Lean formalization, but the issue is that this kind of 'autoformalization' is at present not at all reliable.
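To make that concrete with a toy example of my own (not from any actual LLM transcript): even a trivial-sounding informal statement has implicit side conditions that a faithful Lean statement must spell out, and an autoformalization that misses them produces something that type-checks but says the wrong thing.

```lean
-- "Subtract one, then add one, and you get the same number back" sounds
-- obviously true, but over Lean's Nat it is false at n = 0, because Nat
-- subtraction truncates (0 - 1 = 0). The faithful version carries the
-- side condition the informal statement leaves implicit. (`omega` is the
-- linear-arithmetic tactic shipped with recent Lean 4.)
example : ∀ n : Nat, 1 ≤ n → n - 1 + 1 = n := by
  intro n h
  omega
```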
Tao says that isn't the case for all of it, and that on the massive collaborative projects he's done, many nonmathematicians worked on sections of them. He says someone who understands it well needs to do the initial proof sketch and the key parts, but that lots of parts of the proof can be worked on by nonmathematicians.
If Tao says he's interested in something being coded in lean, there are literal teams of people who will throw themselves at him. Those projects are very well organized from the top down by people who know what they're doing, it's no surprise that they are able to create some space for people who don't understand the whole scope.
This is also the case for other top-profile mathematicians like Peter Scholze. Good luck to someone who wants to put ChatGPT answers to random hypotheticals into Lean to see if they're right; I don't think they'll have so easy a time of it.
Good luck! That can be pretty hard to do when you're at the learning stage, and I would think doubly so given the LLM style where everything 'looks' very convincing.
To what extent is the training and structure of AlphaProof tailored specifically to IMO-type problems, which typically have short solutions using combinations of a small handful of specific techniques?
(It's not my main point, but it's always worth remembering - even aside from any AI context - that many top mathematicians can't do IMO-type problems, and many top IMO medalists turn out to be unable to solve actual problems in research mathematics. IMO problems are generally regarded as somewhat niche.)
The last statement is largely correct (though I don't know about IMO medalists being unable to solve actual problems; most mathematicians can't solve most open problems either). But I kind of disagree with the assessment of IMO problems: the search space is huge; if it were as you say, it would be easy to search.
No, I don't mean that the search space is small. I just mean that there are special techniques which are highly relevant for IMO-type problems. It'd be interesting to know how important that knowledge was for the design and training of AlphaProof.
In other words, how does AlphaProof fare on mathematical problems which aren't in the IMO style? (Such problems comprise most of mathematics.)
> ... in the case of physics one could imagine working through very high quality course materials together with Feynman ... with recent progress in generative AI, this learning experience feels tractable.
Actually, this seems to be absurdly beyond any of the recent progress in generative AI. This sounds like the kind of thing people say when their only deep knowledge is in the field of AI engineering.
It's optimistic, but given the OP is one of the best-informed technical generative AI researchers, and has been passing on that knowledge to the rest of us for a decade +, I don't think we can just dismiss it as unfounded hype :)
My point is that he's a world expert on the engineering of AI systems. That shouldn't be mistaken for expertise, or even general knowledge, about anything else.
It's a good principle to bear in mind for people from any profession, but top AI engineers in particular seem to have an unusually significant habit of not being able to recognize where their expertise ends and expertise from another field (such as, say, education) begins. They also seem very prone to unfounded hype - which isn't to say they're not also good researchers.
Maybe Karpathy happens to be better on this than his peers, I wouldn't know.
Would any hypothetical training data corpus even be sufficient to emulate Feynman? Could any AI have a sufficient grasp of the material being taught, have enough surety to avoid errors, mimic Feynman's writing+teaching style, and accomplish this feat in a reasonable budget and timeframe?
The example is obvious marketing hyperbole, of course, but it's just not going to happen beyond a superficial level unless we somehow create some kind of time-travelling panopticon. It's marred by lack of data (Feynman died in 1988), bad data (hagiographies of Feynman, this instance included), flawed assumptions (would Feynman even be an appropriate teaching assistant for everyone?), etc.
I wonder if AI fans keep doing this thing in hopes that the "wow factor" of having the greats being emulated by AI (Feynman, Bill Gates, Socrates, etc.) will paper over their fundamental insecurities about their investment in AI. Like, c'mon, this kind of thing is a bit silly https://www.youtube.com/watch?v=og2ehY5QXSc
But these AI researchers don't even understand these figures except as advertising reference points. The Socratic dialogue in the "sparks of AGI" paper https://arxiv.org/abs/2303.12712 has nothing whatsoever to do with Socrates or the way he argued.
Fourteen authors and not a single one seemed to realize there's any possible difference between a Socratic dialogue and a standard hack conversation where one person is named "Socrates."
> Prompt: Can you compare the two outputs above as if you were a teacher? [to GPT-4, the "two outputs" being GPT-4's and ChatGPT's attempts at a Socratic dialogue]
Okay, that's kinda funny lol.
It's a bit worrying how much the AI industry seems to be focusing on the superficial appearance of success (grandiose marketing claims, AI art that looks fine on first glance, AI mimicking peoples' appearances and speech patterns, etc.). I'm just your random layperson in the comment section, but it really seems like the field needed to be stuck in academia for a decade or two more. It hadn't quite finished baking yet.
As far as I can see there are pretty much zero incentives in the AI research arena for being careful or intellectually rigorous, or being at all cautious in proclaiming success (or imminent success), with industry incentives having well invaded elite academia (Stanford, Berkeley, MIT, etc) as well. And culturally speaking, the top researchers seem to uniformly overestimate, by orders of magnitude, their own intelligence or perceptiveness. Looking in from the outside, it's a very curious field.
> there are pretty much zero incentives in ____ for being careful or intellectually rigorous
I would venture most industries with foundations in other research fields are likely the same. Oil & Gas, Pharma, manufacturing, WW2, going to the moon... the world is full of examples where people put progress or profits above safety.
> I would venture most industries with foundations in other research fields are likely the same.
"Industries" is a key word though. Academic research, though hardly without its own major problems, doesn't have the same set of corrupting incentives. Although the lines are blurred, one kind of research shouldn't be confused with another. I do think it's exactly right to think of AI researchers the same way we think of R&D people in oil & gas, not the same way we think of algebraic topologists.
Andrej Karpathy (the one behind the OP project) has been in both academia and industry. He's far more than a researcher: he also teaches and builds products.
"Richard Feynman and Socrates were primarily known for their contributions to science and philosophy, respectively. Feynman was a renowned theoretical physicist, and Socrates was a foundational philosopher.
Bill Gates, on the other hand, is primarily known as a businessman and co-founder of Microsoft, a leading software corporation. While he also has made contributions to technology and philanthropy, his primary domain is different from the scientific and philosophical realms of Feynman and Socrates."
Thank you for this AI slop. It's the right answer, but the reasoning is incoherent. It could just as reasonably have said:
"The one that doesn't fit in is Socrates.
Richard Feynman and Bill Gates are primarily known for their contributions to science and philanthropy, respectively. Feynman was a renowned theoretical physicist, and Gates is a world-famous philanthropist.
Socrates, on the other hand, is primarily known for foundational contributions to philosophy. His primary domain is thus distinct from the scientific and philanthropic realms of Feynman and Gates."
I mean, the guy isn't saying that it's 100% going to happen. He's saying that the problem feels like it might be doable at all. As Andrej has a background in physics, the phrase 'feels tractable' would mean that he thinks a path might exist, possibly, but that only a lot of work will reveal whether it does.
This seems rather generous given that he was just a physics major. There's lots of physics majors who understand very little about physics and, crucially, nothing about physics education.
I was talking about how physicists will understate how difficult things can be.
'Feels tractable' is physics-speak for: a possibility exists to get there though it may require more teachers than there exist on the earth and more compute time per student than there are transistors in the solar system for the next 1000 years.
Anti-gravity would be 'tractable' too as we can see there must exist some form of it via Hubble expansion and then it's only a matter of time until a physicist figures it out. Making it commercially viable is left to icky engineers.
Things that a physicist doesn't think are 'tractable' would be time travel and completing your thesis.
To be very very clear: I am somewhat poking fun at physicists. Due to the golden age of physics, the whole field is kinda warped a bit about what they think is a doable thing/tractable. They think that a path may exist, and suddenly the problem is no longer really all that much of a problem, or rather 'real' problems are in figuring out if you can even do something at all. 'Tractable' problems are just all the 'stamp collecting' that goes into actually doing it.
It's legitimate to call it a background in physics, but given the particular level of background and the context of this particular issue, its relevance is indistinguishable from zero.
Has he ever demonstrated any particular insight or understanding of physics or - more importantly - of physics education? As far as I've been able to find, the answer is no. Not that there's anything wrong with that. At worst it just makes him a typical physics major.
One of the things Karpathy is most famous for, perhaps the thing depending on who you ask, is his instructional material on Deep Learning and Neural Networks, from which at least hundreds of thousands of people have benefited.
That's far more tangible than whatever "background" it is you're looking for. He's a good teacher. He stands out for that and that's not an easy thing to do.
Of all the things background doesn't mean much in, being a good educator is at the top of the list. Most educators, including those who've been at it for years, are mediocre at best.
The people who educate at the highest level (college/university professors) are not even remotely hired based on their ability to educate, so this really isn't a surprise.
Genuinely and I mean no offense, your expectations just feel rather comical. People like Sal Khan and Luis von Ahn would be laughed out of the room looking for your "background".
Sure, Sal is an educator now but he quit being a financial analyst to pursue Khan Academy full time.
The real problem here is that you don't believe what Karpathy has in mind is tractable and not that there's some elusive background he needs to have. His background is as good as any to take this on.
I think you've misread this conversation. I was responding to someone who suggested that Karpathy's "background in physics" indicated some insight into whether this venture, particularly as regards physics education, will effectively give guidance by subject matter experts like Feynman.
If they had cited some other background, like his courses on AI, I would have responded differently.
> I think this movement around 'debunking AI' is entirely the fault of marketing and company CEOs inflating the potential around generative AI to such ridiculous amounts.
I don't think you should let the AI research community off the hook so easily. Some of the most obnoxious and influential voices are researchers who are taken as credible authorities. I'm thinking for instance of the disgraceful "Sparks of AGI" paper from Sebastien Bubeck's research team at Microsoft Research, also certain "godfathers", etc.
Ha! That's the hype there. They made predictions, like any other model out there. Their models are better, but not close to what we get out of X-ray crystallography, which is a painstaking process.
Protein folding is nowhere near a solved problem. For about a third of proteins, the predictions aren't accurate enough.
I can also make a prediction for every single known protein, it's trivial to do. My own predictions would be uniformly wrong. The question is how accurate AlphaFold's predictions are - of course, this question was almost totally avoided by the news reports and DeepMind press releases. It is accurate enough (and accurate often enough) to be a useful tool but by no means accurate enough to say they "literally solved protein folding."
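For what it's worth, the deposited models do at least come with a per-residue confidence score (pLDDT, which AlphaFold DB stores in the B-factor column of its PDB files), so "how accurate is this particular prediction" is a question you can interrogate structure by structure. A rough sketch in Python; the file name is hypothetical:

```python
# Rough sketch: summarize per-residue pLDDT (0-100) from an AlphaFold DB
# model in PDB format, where it is stored in the B-factor column.
def plddt_per_residue(pdb_path: str) -> dict[tuple[str, int], float]:
    """Map (chain id, residue number) -> pLDDT, read from ATOM records."""
    scores: dict[tuple[str, int], float] = {}
    with open(pdb_path) as f:
        for line in f:
            if not line.startswith("ATOM"):
                continue
            chain = line[21]                 # chain identifier
            resnum = int(line[22:26])        # residue sequence number
            plddt = float(line[60:66])       # B-factor field = pLDDT here
            scores[(chain, resnum)] = plddt  # same value for every atom of a residue
    return scores

if __name__ == "__main__":
    scores = plddt_per_residue("AF-P12345-F1-model_v4.pdb")  # hypothetical file name
    confident = sum(1 for s in scores.values() if s >= 70)
    print(f"{confident}/{len(scores)} residues with pLDDT >= 70 (the usual 'confident' cutoff)")
```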
"There is no reason for any individual to have a computer in his home."
- Ken Olsen
I was there. I remember people saying it, but this f*cking site wants to downvote me into negative for remembering it, probably because they are thinking of business computers. Everyone was on board with that. I'm talking specifically of home computers.
I wouldn't worry, I think what you're saying is pretty well known. We've all heard it before from every tech evangelist out there. Everyone knows that sometimes tech naysayers turn out to be wrong; the point is that it's pretty useless information.
That the AI naysayers will be similarly remembered was the point of my "useless information" post. I was surprised to be downvoted, probably because those doing it have an axe to grind and poor reading comprehension skills. Surprising for Hacker News, but I've noticed every site is getting dumber at an alarming rate, so why should Hacker News be the exception? If you're going to downvote me because you disagree, at least say why. It's in the HN guidelines.
I can vouch for this. There was a lot of rhetoric about how a normal family wouldn't really need a computer in their home. This point was usually raised to support an argument that the personal computer market would be quite small.