
What do you think: how long until we can give conjectures to a specially trained LLM and have it come up with a proof?



Finding a proof is a search in a large space, akin to searching from abstractions to concretions. LLMs don’t do anything like this, and so you’re looking at the planning problem again. It’s not clear to me how framing this particular problem as a language problem is helpful in any case.


Perhaps an LLM could produce a plausible-looking proof that is completely and utterly incorrect.

As a party trick.


Wouldn't that be the point of pairing it with Lean? You wouldn't get false positives.


Doesn’t mean you’d get true positives either. Garbage in, garbage out.


IIUC, any sensible way of "pairing up" these things will mean that anything you get out will be true. But the search might take millennia, and the outcome might be nothing (equivalently, "the LLM's conjecture is false").


This is equivalent to saying it's not going to work.


Playing chess and Go is also a search in a large tree of moves leading to particular game states.


But AlphaGo etc. don't use any kind of language-based AI, so LLMs (which this thread was about) are no good.


The next step seems to be applying past advances in reinforcement learning to modern transformer-based models.


Which multiple teams are working on: OpenAI (Q*), and Meta just released a reinforcement learning framework.


Could you point me towards Meta's reinforcement learning framework? I'd like to see how it stacks up against the OpenAI gym.



Thank you!


The final state in chess is a single* state which, yes, branches out (searching backwards) to N checkmate configurations and then N*M one-move-from-checkmate positions, and so on. (*Technically it's won/lost/drawn.)

The equivalent final state in theorem proving is unique to each theorem, so such a system would need to handle an additional layer of generalization.


Is this how some of the more advanced chess engines work, or even the not-so-advanced ones? That is, is there a point at which the engine stops searching the forward move tree to greater depth and instead starts searching backwards from a handful of plausible (move-limit-bound) checkmate states, looking for an intersection with a shallow forward search state?


Kind of, but it's calculated offline and then just accessed during the game: https://www.chessprogramming.org/Endgame_Tablebases
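For the curious, here is a toy sketch of the retrograde idea behind those tables. The positions, moves, and values below are invented (not real chess), and real tablebase generators work backwards over predecessor lists with far more bookkeeping; but the shape is the same: terminal positions are labeled first, then wins/losses propagate until nothing changes, and at play time the engine just looks the current position up.

    # Toy retrograde/backward-induction sketch on an invented game graph.
    moves = {            # position -> positions reachable in one move
        "a": ["b", "c"],
        "b": ["d"],
        "c": ["d", "e"],
        "d": [],         # terminal: side to move is checkmated
        "e": [],         # terminal: stalemate
    }
    table = {"d": "loss", "e": "draw"}   # value for the side to move

    changed = True
    while changed:
        changed = False
        for pos, succs in moves.items():
            if pos in table:
                continue
            vals = [table.get(s) for s in succs]
            if "loss" in vals:                          # some move leaves the opponent lost
                table[pos], changed = "win", True
            elif all(v in ("win", "draw") for v in vals):
                table[pos], changed = ("draw" if "draw" in vals else "loss"), True

    print(table)  # {'d': 'loss', 'e': 'draw', 'b': 'win', 'c': 'win', 'a': 'loss'}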


Language models are currently the best AI systems that solve general reasoning problems, including mathematical ones. So it seems more than obvious to model the problem of finding proofs as a language problem.


I contributed a few trivial proofs to this project, and I tried enlisting GPT-4, Copilot, Moogle, and Morph Prover to help out (I did not try LLMStep or ReProver).

Out of these:

- GPT-4 answered maybe one or two questions about syntax

- Copilot autocompleted maybe 0.5%

- Moogle was helpful much more often, but it still often pointed to the wrong theorem in the right file; in most of those cases I could have just used go-to-definition to reach the right file and scrolled through it

Note that these results are actually very good! And Terence Tao seems to have a positive opinion as well. But we're very far away from automatically proving conjectures IMO.

I will say that Lean has great metaprogramming facilities to accept any type of AI innovation that might emerge. Currently I find the most helpful tactics to be `aesop?`, which is a classical search tactic that tries a bunch of substitutions and simplifications, and `exact?`, for when you know there's a trivial proof but aren't sure of its name. But the UX for AI can be as simple as, for example, typing the `gptf` or `llmstep` tactic, and that is fully hooked into the Lean state (goals, hypotheses, etc.). So there's a lot of opportunity for incremental progress in improving existing workflows (which consist of dispatching trivial manipulations with search/simplification tactics).
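For anyone who hasn't seen these tactics, here is a minimal sketch (Lean 4 with Mathlib; the exact suggestions printed will depend on your Mathlib version):

    import Mathlib.Tactic

    -- `exact?` searches the library for a lemma that closes the goal and suggests it.
    example (a b : ℕ) : a + b = b + a := by
      exact?   -- should suggest something like `exact Nat.add_comm a b`

    -- `aesop?` runs a search over simp lemmas and case splits and reports the script it found.
    example (p q : Prop) (hp : p) (hq : q) : q ∧ p := by
      aesop?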


Recent work on combining LLMs with theorem provers, with promising initial results:

https://paperswithcode.com/paper/linc-a-neurosymbolic-approa...

> Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing in subtle and unpredictable ways. In this work, we investigate the validity of instead reformulating such tasks as modular neurosymbolic programming, which we call LINC: Logical Inference via Neurosymbolic Computation. In LINC, the LLM acts as a semantic parser, translating premises and conclusions from natural language to expressions in first-order logic. These expressions are then offloaded to an external theorem prover, which symbolically performs deductive inference. Leveraging this approach, we observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate. On ProofWriter, augmenting the comparatively small open-source StarCoder+ (15.5B parameters) with LINC even outperforms GPT-3.5 and GPT-4 with Chain-of-Thought (CoT) prompting by an absolute 38% and 10%, respectively. When used with GPT-4, LINC scores 26% higher than CoT on ProofWriter while performing comparatively on FOLIO. Further analysis reveals that although both methods on average succeed roughly equally often on this dataset, they exhibit distinct and complementary failure modes. We thus provide promising evidence for how logical reasoning over natural language can be tackled through jointly leveraging LLMs alongside symbolic provers. All corresponding code is publicly available at https://github.com/benlipkin/linc
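To make the division of labor concrete, here is a toy sketch of that pipeline. This is not the LINC code: the LLM's natural-language-to-FOL translation step is hard-coded as the classic Socrates example, and Z3 stands in for the external prover.

    # The LLM's job in LINC is the natural-language -> FOL translation; here that
    # output is hard-coded, and Z3 plays the role of the symbolic prover.
    from z3 import (DeclareSort, Const, Function, BoolSort, ForAll, Implies,
                    Solver, Not, unsat)

    Obj = DeclareSort("Obj")
    Man = Function("Man", Obj, BoolSort())
    Mortal = Function("Mortal", Obj, BoolSort())
    socrates = Const("socrates", Obj)
    x = Const("x", Obj)

    premises = [ForAll([x], Implies(Man(x), Mortal(x))),   # "All men are mortal"
                Man(socrates)]                             # "Socrates is a man"
    conclusion = Mortal(socrates)                          # "Socrates is mortal"

    s = Solver()
    s.add(*premises)
    s.add(Not(conclusion))   # entailed iff premises plus negated conclusion are unsatisfiable
    print("entailed" if s.check() == unsat else "not entailed")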


But "theorem provers" don't prove anything; they just check existing proofs for validity. That's a much less demanding task than coming up with a proof in the first place.


Open conjectures are usually solved by inventing new mathematical frameworks that embed the context of the existing question. Inventing new definitions is trivial, but there are no rules that constrain the space of moves. And inventing _interesting_ new definitions is extremely hard. I wouldn't bet on LLMs ever doing this.


What would be nice is an LLM to aid in translation between mathematics and a theorem proving language like Lean. That might fuel a revolution in how mathematics is done.


Check out this paper:

https://leandojo.org/

People have already trained models to suggest tactics, then linked them up to ChatGPT to interactively solve proofs.

In this scenario, ChatGPT asks the model for tactic suggestions, applies them to the proof, and uses the feedback from Lean to proceed with the next step.
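The loop itself is simple enough to sketch. The helper names below are hypothetical (not the actual LeanDojo or ChatGPT API), but the shape is as described:

    # Hypothetical sketch of the suggest/apply/feedback loop; `llm` and `prover`
    # are stand-ins, not real APIs.
    def prove(theorem_statement, llm, prover, max_steps=50):
        state = prover.start(theorem_statement)       # initial goals from Lean
        for _ in range(max_steps):
            if state.no_goals_left():
                return state.proof_script()           # every step was checked by Lean
            tactic = llm.suggest(state.goals())       # e.g. "simp", "ring", "exact h"
            state = prover.apply(state, tactic)       # Lean applies it and returns new goals
        return None                                   # give up after max_steps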

FYI, the programmatic interface to Lean was written by an OpenAI employee who was on the Lean team a few years ago.

Also, check out Lean's roadmap. They aspire to position Lean as a target for LLMs because it has been designed for verification from the ground up.

As math and compsci nerds contribute to mathlib, all of those proofs are also building up a huge corpus that will likely be leveraged for both verification and optimization.

If AI can make verification a lot easier, then we’re likely going to see verification change programming similarly to the way it changed electronics.


> it has been designed for verification from the ground up.

It has a lot of places where you assume things work in a certain way and a lot of ways to assume it (and you can prove `false` if any of those assumptions aren't correct). I'd say it's a lot more geared towards formalizing math than verification.

There are some other things too, e.g., the `Expression -> C-code` "compiler" isn't verified and neither is the C compiler. Those could be fixed with some work (which nobody is working on currently, as far as I can tell). But the first issue seems pervasive. I don't think Lean will become a standard tool for software verification or significantly advance the discipline. Of course, I'd love to be wrong.


If the foundations are workable (and they seem to be, though IIRC there is a quotienting construct that's quite ad hoc and that people have complained about), the fact that you can additionally "assume" other things is fairly irrelevant; these are no different from extra axioms.


It's not useful to prove that "your program does what you want, provided the following 10000 functions implemented in C do what we assumed they do". Almost certainly at least one of them doesn't.

There isn't even a straightforward (let alone verified) way to list all assumptions of this type for some proof (you can print the axioms that a definition depends on, but that's not the same thing). It's possible to prove `false` in Lean using functions you'd normally use to make pretty much any useful program.
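For reference, the axiom-printing facility mentioned above looks like this (core Lean, no Mathlib needed); the point stands that it only reports axioms, not other trust points such as `@[implemented_by]` replacements or FFI bindings:

    #print axioms Nat.add_comm   -- 'Nat.add_comm' does not depend on any axioms
    #print axioms Classical.em   -- depends on axioms: propext, Classical.choice, Quot.sound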


You’re right and thank you for correcting this.

I was playing fast and loose with the term verification and was thinking about it in a broader context.


For trivial stuff, they might be able to even now (but those can probably be solved algorithmically as well). For more complex stuff, they are not even scratching the surface, I would think — LLMs can't really do long, complex chains of thought/inference, which are essential for coming up with proofs, or even just for solving a sudoku (which they can't do — no, writing a program (which was likely part of the training set) and executing it doesn't count).


It will never happen. Some open math problems are equivalent to the halting problem, and we know that finite computers can't solve the halting problem.


That doesn't sound right, we already have automated theorem provers, as imperfect as they may be.


At the same level of success as a mathematician? Never.

At the same level of success as your average high school graduate? Probably already can.


I assume an average high school graduate can solve a sudoku. LLMs are very interesting but anything “thinking”-related is not their strong suit.


Solving sudoku is not really a proof.

Like, it's a witness that a solution exists, but I don't think that's really what most people mean by writing a proof.

LLMs are really good at cutting and pasting from the internet. That's the level I would expect from an average high school student.


Solving sudoku is a pure exercise in proving what the solution is.

But most sudoku tend to be unsatisfying in that you're forced to use trial and error. The more formal term would be "proof by cases": once your fun proving has reached its limits, you've shown that a particular cell must contain either 5 or 7. If you show that that cell containing a 5 will lead to a contradiction, you've created an impeccable proof that the cell contains a 7.
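That "5 or 7" step is exactly a case split plus a contradiction; in Lean it would look something like this (a toy sketch, not a sudoku formalization):

    example (n : Nat) (h : n = 5 ∨ n = 7) (h5 : n ≠ 5) : n = 7 := by
      cases h with
      | inl h => exact absurd h h5   -- the "5" branch contradicts what we already showed
      | inr h => exact h             -- so the cell must contain 7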

The distinction you've stated would only come into play if the number of possible solutions to a puzzle differs from one; if there are no solutions, you have inconsistent premises and your proofs will be valid but not sound. If there are several solutions, then guessing and being right will, as you describe, create a proof that a solution exists, but it won't prove that that solution is unique or characterize what valid solutions are like, both of which are common mathematical goals.


My uneducated hunch is that we’re just a few years away from “proof search” being a solved problem. But that’s only a part of mathematical research.


I think you are much too optimistic.

Proof verification, though still hard to automate, seems at least tractable. I'm not a mathematician by any means, but something tells me automated proof search, in the general case, would require solving the halting problem, which means it is impossible even in principle.


In the general case, proof search is obviously impossible, for the reason you state. But the case need not be general. A proof search machine that takes in a theorem and outputs `"proof: ..." | "antiProof: ..." | "idk"` would be quite powerful indeed.


    def decider(_): print("idk")
Done ;)

---

More seriously, the benchmark would be something along the lines of "and it is better at finding proofs than humans are", like how chess engines haven't solved chess, but they are so much better than humans that they always win in a competition between the two.

There's no good reason to think that humans are capable of finding proofs that are hard to find algorithmically.


Clearly the prevalence of idk's in the output is the metric for determining power.

As for the human-vs-computer, I think the most powerful solutions will be those that combine human and computer prowess, similar to how a Human+Stockfish beats Stockfish^. To that end I've been working on Lean search tools that keep a human firmly in the loop. Terence has actually called out a tool I helped make as being quite helpful in his work, which is rather neat.

^ I heard this to be true at some point, but cannot find any reference on it at the moment.


> similar to how a Human+Stockfish beats Stockfish^

> ^ I heard this to be true at some point, but cannot find any reference on it at the moment.

Known as centaur chess (amongst other things). It was certainly the case at one point. I strongly suspect that it isn't still the case, but it's hard to say with absolute certainty because it never developed a serious competitive community like engine and non-engine chess have. I can't find anyone listing any results post-2017 [1]. It was barely the case in 2017, and Stockfish has improved a lot since then [2].

[1] https://gwern.net/note/note#advanced-chess-obituary

[2] E.g. here's someone running stockfish 14 (current from July 2 2021 - April 18 2022) against stockfish 8 (current from November 1 2016 to February 1 2018). Stockfish 14 won 61 games and drew 3: https://www.reddit.com/r/chess/comments/oov714/64_game_match... (PS: They don't document time controls and I'm not sure how scientific this test was, but it matches with what I would expect, so eh)

Not to discourage you from your project with Lean. At the very least I would expect there to be a transitional period, likely of decades, where computer+human is better than computer.


In the 2022 Correspondence Chess Championship, humans hurt the performance of computers by themselves. I posted about this here: https://twitter.com/LechMazur/status/1721008768899555677


>There's no good reason to think that humans are capable of finding proofs that are hard to find algorithmically

That's really all I was trying to say; not sure why I'm getting downvoted.


> "antiProof: ..."

The standard terminology would be "counterexample".


[deleted]


Given that algorithm design is very much a field of mathematics, it's in principle (and probably in practice) not too difficult to adapt the infrastructure you mentioned to support a workflow where you say "I want an algorithm that on input x computes y" (more formally, of course), and the tool outputs exactly that for you.


It doesn't require anything like that.

Math proofs are of NP complexity. If you had access to a nondeterministic Turing machine, you could enumerate all possible proofs of a given length and check them all in poly time.
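A toy sketch of that guess-and-check structure, with SAT standing in for proof checking (the verifier runs in polynomial time; the deterministic loop below is the exponential price you pay for not actually having a nondeterministic machine):

    from itertools import product

    def verify(clauses, assignment):
        """Polynomial-time check: does the assignment satisfy every clause?"""
        return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in clauses)

    def search(clauses, num_vars):
        """Deterministic stand-in for the nondeterministic 'guess': try everything."""
        for bits in product([False, True], repeat=num_vars):
            assignment = dict(enumerate(bits, start=1))   # variable index -> truth value
            if verify(clauses, assignment):
                return assignment
        return None

    # (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
    print(search([[1, -2], [2, 3], [-1, -3]], num_vars=3))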

That does not say anything about LLMs though. Personally, I believe they could be quite helpful to mathematicians, in a way similar to Copilot for software programming.


> Math proofs are of NP complexity.

This is only true for very specific kinds of proofs, and doesn't apply to stuff that uses CiC like Lean 4. This is because many proof steps proceed by conversion, which can require running programs of extremely high complexity (much higher than exponential) in order to determine whether two terms are equal. If this weren't the case, it would be much much easier to prove things like termination of proof verification in CiC (which is considered a very difficult problem that requires appeal to large cardinal axioms!). There are formalisms where verifying each proof steps is much lower complexity, but these can be proven to have exponentially (or greater) longer proofs on some problems (whether these cases are relevant in practice is often debated, but I do think the amount of real mathematics that's been formalized in CiC-based provers suggests that the extra power can be useful).


This is all very interesting but it seems that we're just taking different views on what is the instance size. If it is the length of the theorem statement in some suitable encoding and the goal is to find a proof, of any possible length, then yeah, this is way too hard.

I'm taking the view that the (max) length of the proof can be taken as a parameter for the complexity because anything too long would not have any chance of being found by a human. It may also not be trusted by mathematicians anyway... do you know if the hardware is bug free, the compiler is 100% correct and no cosmic particle corrupted some part of your exponential length proof? It's a tough sell.


I'm talking about the proof length, not the length of the proposition. In CiC, you can have proofs like (for example) "eq_refl : eq <some short term that takes a very long time to compute> <some other short term that takes a very long time to compute>" (where both terms can be guaranteed to terminate before you actually execute them, so this isn't semidecidable or anything, just really slow). In other words, there exist short correct proofs that are not polynomial time to verify (where "verify" is essentially the same as type checking). So "find a proof of length n for this proposition" is not a problem in NP. You can't just say "ah but proofs that long aren't meaningful in our universe" when the proof can be short.


Could you give me a reference? This is not something I'm familiar with.

Can you claim that this equivalence proof is not in NP without requiring this specific encoding? I would be very surprised to learn that there is no encoding in which such proofs can be checked efficiently.


Reference for complexity: https://cs.stackexchange.com/questions/147849/what-is-the-ru....

> Can you claim that this equivalence proof is not in NP without requiring this specific encoding? I would be very surprised to learn that there is no encoding in which such proofs can be checked efficiently.

There are encodings where such proofs can be checked efficiently, such as the one that enumerates every step. Necessarily, however, transcribing the proof to such an encoding takes more than polynomial time to perform the conversion, so the problems aren't equivalent from a P vs. NP perspective (to see why this is important, note that there is a conversion from any 3SAT formula to DNF with a worst-case exponential size blowup, where the converted formula can be solved in LINEAR time! So the restriction to polynomial time convertible encodings is really really important in proofs of reductions in NP, and can't be glossed over to try to say two problems are the same in the context of NP-completeness).
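A quick illustration of that blowup (with n two-literal clauses over distinct variables, distributing the conjunction over the disjunctions yields 2^n DNF terms):

    from itertools import product

    def cnf_to_dnf(clauses):
        """Distribute: pick one literal from every clause to form each DNF term."""
        return [list(choice) for choice in product(*clauses)]

    clauses = [[f"x{i}", f"y{i}"] for i in range(1, 11)]   # 10 clauses of 2 literals each
    print(len(cnf_to_dnf(clauses)))                        # 2**10 = 1024 terms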


Yes, executing a calculation as part of a proof after separately proving that it is the "right" calculation to solve the problem is quite a common way of proceeding. Even common things like algebraic simplifications of all kinds (including solving equations, etc.) can be seen as instances of this, but this kind of thing has notably come up in the solution of famous problems like the 4-color problem, or the Kepler conjecture - both of which have been computer-verified.


> you could enumerate all possible proofs of a given length

That does not help us with proof search as we do not know the target length.


A proof longer than the size of the universe is also pretty useless and probably not something we need to worry about.

Like, I guess you are saying we couldn't really use such a machine to determine whether a certain conjecture just has a very long proof/disproof or is actually undecidable. Which, sure, but I think that is the sort of problem most mathematicians would love to have.

The real reason nondeterministic Turing machines can't help us is that they don't actually exist.


Of course it would; you would enumerate lengths too. If the lengths need to be larger than polynomially bounded then we can be sure it would never be found by a human anyway.


> If the lengths need to be larger than polynomially bounded then we can be sure it would never be found by a human anyway.

We cannot, as a human might be able to intuitively devise a proof that is quite large in the target language but easily constructible in a separate one.


Then in that target language, found by a clever human, you could do the same type of enumeration...

My whole point is that humans simply cannot process/create by themselves any truly long proof (we can obviously create a process for that). Therefore enumeration puts everything achievable by humans in the NP complexity. Feel free to disagree; this is not so much a theorem as a thesis.


> Therefore enumeration puts everything achievable by humans in the NP complexity.

This is a severe misunderstanding of NP. Hindley-Milner type inference is worst-case doubly exponential, for example, and we solve that all the time. We just happen to rarely hit the hard instances. I think the statement you're trying to make is something more along the lines of https://en.wikipedia.org/wiki/Skolem%27s_paradox, which is definitely fascinating but doesn't have much to do with P vs NP. Notably, this paradox does not apply to higher-order logics like CiC with more than one universe, which lack countable models (which is why your intuitions about provability may not apply there).


No misunderstanding about NP here, for sure. As I said, this is about as much of a thesis as Church-Turing is about what can be computed.

I have no clue about CiC, lean and whatnot. It was never my field and I don't doubt there can be some very weird things you can do with some fancy logic models that are rarely discussed by non logicians.

What I'm claiming is that anything a human could prove without a computer could be done by a nondeterministic machine in poly time, under a very reasonable assumption of proof length. I'm baking in the assumption of proof length, so I can claim something about NP...

Let me put it this way: if you can write a paper with your proof and I can check it with only my brain without a computer helping me, then a poly time algorithm could also check that proof. Unless you believe something exotic about how the brain computes, how would it not be the case?


> Let me put it this way: if you can write a paper with your proof and I can check it with only my brain without a computer helping me, then a poly time algorithm could also check that proof. Unless you believe something exotic about how the brain computes, how would it not be the case?

What does "without a computer" have to do with anything, and what do you mean by saying "in polynomial time" when restricted to a specific instance of the problem? To talk about solutions being checkable in polynomial time in the size of the input (which is what NP actually means, more or less, since every problem in NP can be encoded as a 3SAT formula with at most polynomial blowup), you have to actually specify some (infinite) class of problems first, otherwise "in polynomial time" doesn't really mean anything. Then, you have to specify a polynomial time verification algorithm for every instance in the class.

The problem is that "proofs of length N" in CiC doesn't have a polynomial-time verification algorithm (for checking proof correctness) in the size of the proof (and as I mentioned elsewhere, attempts to translate to an encoding that does can result in exponential or much worse blowup in the proof size, which does not allow you to put them in the same complexity class in this context). Moreover, it's very unclear how you'd even begin to carve out the complexity class of "proofs that can be verified quickly" because that pretty much requires you to write an algorithm to determine the complexity of the proof, which includes being able to determine the complexity of arbitrary functions in CiC (which, as you can imagine, is not easy in such an expressive system! AFAIK, the totality proof requires axioms beyond ZFC, which means that the proof is not constructive and cannot provide any specific upper bound on the number of steps taken by an expression).

This means that there is no polynomial function f(n) such that a nondeterministic Turing Machine could always terminate after f(n) steps and (correctly) report "yes" when there is a proof with n or fewer steps and "no" otherwise, and it is rather unlikely that there is a decision procedure to restrict the proofs to "proofs that run in polynomial time in the size of the proof input" so you could practically apply your claim just to that subset of proofs (though I don't know whether the absence of such a decision procedure has actually been proven, but generally speaking problems like this are very rarely decidable for interesting programming languages like CiC, even if they are total). It does not mean that a human, or human with a computer, couldn't come up with a clever short proof in O(n) tokens that takes longer than f(n) steps to check on some particular instance of the problem for some particular f... because why would it?

I think what you're trying to say (to make your argument hopefully what you had in mind) is that if humans can verify a proof in "a reasonable amount of time", then a nondeterministic Turing Machine could produce the proof in "a reasonable amount of time." Which might well be true, but the point is that in CiC, how long the proof takes to verify has very little to do with the size of the proof, so this claim has nothing to do with proof verification being in P (and therefore, has nothing to do with proof search being in NP, which it isn't for CiC). More to the point, this is pretty much true of any problem of interest--if it takes more than a reasonable amount of time in practice to verify whether a solution is correct (where verification can include stuff like "I ran polynomial-time UNSAT on this instance of the problem" in a universe where P=NP), it isn't practically solvable, regardless of what fancy methods we used to arrive at the solution and regardless of the length of the solution. So I don't find this particularly insightful and don't think it says anything deep about humanity, mathematics, or the universe.

Anyway, I'm mostly just pointing out that even if P=NP, you can't really leverage that to discover whether a short proof exists in CiC in polynomial time. More concretely, there isn't an algorithm to convert an arbitrary CiC formula to a 3SAT formula without greater than polynomial size blowup (we know this because we can run non-elementary algorithms in CiC!). So even if we had a polynomial time 3SAT algorithm, which would solve a great many things, it wouldn't solve proof synthesis, which is another way to phrase "proof synthesis in CiC isn't in NP."


First of all, thank you for a thorough response. I'll need to take time to read it (and the refs in your other reply) with the care it deserves.

Basically I'm talking about the subset of proofs that could be done with pen and paper and pass a careful refereeing process. And you got that interpretation in your reply. You say that it is not insightful and that is of course an opinion you are entitled to.

To me it is an interesting observation that NP is a complexity class powerful enough to basically obsolete us. The context of this conversation/thread is using Lean to formalize a proof and whether, in the near future, integrating AI models would give us novel tools for discovering new mathematics. We routinely solve NP-hard problems in practice for many instances, so I think drawing this parallel here was relevant.

I do agree with you that there would still be out of reach proofs even with an efficient algo for NP, but as some other person replied, it would be a nice problem to have...


> Basically I'm talking about the subset of proofs that could be done with pen and paper and pass a careful refereeing process. And you got that interpretation in your reply. You say that it is not insightful and that is of course an opinion you are entitled to.

Well, that part was me being subjective, but the objective part is that this subset of proofs doesn't necessarily have a verification algorithm "in NP." A very short proof that took time doubly exponential (in some sense) in the size of the proof could still feasibly be solved with pen and paper, because the proof was short even though the verification time was very long (or the verification was long due to a technicality--I describe an example below), depending on the exponent and the proof size. While at the same time, a short proof that took time polynomial (in some sense) in the size of the proof could take longer than the age of the universe to verify and be out of reach for both humans and computers, depending again on the degree of the polynomial and the proof size (and, since again proof verification in CiC is not reducible to 3SAT, proof search that incorporates non-polynomial functions plausibly might not even benefit from any speedup if P = NP!). I say "in some sense" because it's unclear exactly what function f to fix, since pretty much any decidable, total function you can think of grows more slowly than "max steps needed to typecheck a term of length n in CiC" (outside of other meta-level functions that similarly don't provide practical bounds on running time).

(To give a concrete example of where human verification might be much faster than computer verification of the same step, human mathematicians can easily use "tricks" when they know two results will be the same, without actually proving them equivalent, modifying the proof, or imparting extra knowledge to a verifier. For example, in Coq, by default you use Peano naturals, which are convenient for proofs but have very poor complexity--which means exponentiation actually takes exponential time to compute with them. Humans verifying a proof step will recognize this and just do exponentiation the normal way regardless of which inductive definition of naturals is being used, while for a computer to do so the person writing the proof has to deliberately switch out Peano naturals for digital naturals. It is entirely possible that the proof size when using non-Peano nats will be larger than the proof size using Peano nats, since induction on non-Peano nats is much more tedious, while the verification time is exponentially lower using non-Peano nats! So this is an example where proof search for a short proof and proof search for a fast to verify proof may be in conflict. It's also an example of somewhere where even though it's not explicitly stated in the proof, and even though different formulations of nats can't exactly be proved "fully equivalent" in CiC, humans verifying a proof understand that "in spirit" it doesn't really matter whether you use Peano nats or not.

Indeed, if you work with formal verification a lot, you will find you spend significant amounts of time optimizing your proofs, and learning tricks like whether to make an input to an inductive definition indexed or parametric, which are things that human mathematicians don't care about at all but which are very significant to the machine verifier).

In other words, I don't think being practical to verify can be described completely by either the complexity of verification or the size of the proof, and I don't think it's that useful to conflate this with a problem being in NP since (1) CiC proof synthesis isn't actually in NP, or even in the polynomial hierarchy, and (2) being in NP has a much more precise technical definition that only somewhat overlaps with "practical to verify."


There's always another encoding. All you prove is that there doesn't exist a proof in the encoding you selected shorter than the length you specified; it does nothing at all to disprove the existence of a short proof in general.


Right, and this is also the current status of handmade mathematics. All we know is that we did not find a proof yet with everything that has been tried. This typically means that a problem is harder the more stuff has been thrown at it and failed.

A nondeterministic Turing machine would absolutely crunch through math theorems; I really don't know why there is so much pushback against what I am stating. Basically, if you had such a machine, there essentially would no longer be any room left for proving stuff the manual way; you would get completely outclassed by it.

The real consequence for me, though, is that this is strong evidence (call it faith) against P=NP.


> A nondeterministic Turing machine would absolutely crunch through math theorems

Can you formalize that statement in any useful way? It seems you believe proof validation to be in NP, when it most certainly is not. Accordingly an N-TM would be of relatively little power against the problem.


It's rather difficult to provide a good formalization but let me give it a shot.

Suppose that mathematicians write papers with pen and paper in a subset of natural language without ambiguity (you wish!). What they write as proofs can be checked thoroughly for correctness by some other human (a referee); otherwise it will not be published (again, we wish!).

Now, for a paper with N tokens (or chars, or whatever measure of length), how much computation can the referee reasonably dedicate to the task of checking that proof? Unless you believe something special about the human brain, I'd claim that poly time in N is a reasonable assumption. This means that checking is poly(N) for human-checkable proofs of length N.

This is my argument that nondeterministic Turing machines would absolutely crunch through everything that we could do in the current model of mathematical research.

The class of instances is that of theorems for which a human could write a handmade proof and have it verified by another human. Recall that we are discussing formalization and what AI could achieve in this space, so I think formulating what we can currently achieve is a nice contrast with what could come in the future.


Neither RH nor P vs NP will succumb.





