I knew Gödel's (first) incompleteness theorem pretty well already, but I really enjoyed how much that video managed to get across without getting too bogged down.
One thing I'd like to see in a companion video would be an explanation of why Turing machines represent computation. That video (like many others) skims over the why, and only talks about what they can/can't do after we've already decided to use them.
Turing's 1936 "On Computable Numbers" paper gives a nice philosophical justification for his model, which (from my reading) boils down to the following:
- Mathematicians can do their work within a finite volume of space (their brain, or a room, or the whole Earth if we want to be conservative). Also, they could (in principle) do their work using only written communication, on standard pieces of paper, each of which also has a finite volume.
- Finite volumes can only have finitely-many distinguishable states (having infinitely many states would require them to be infinitesimally similar, and hence indistinguishable from each other within a finite amount of time)
- Hence we can label every distinguishable state of a brain with a number; and likewise for every distinguishable piece of paper (at least in principle)
- Since these are both finite, any mathematician's behaviour could (in principle) be completely described by a table detailing how one state (brain + paper) leads to another
- Given such a table, the actual content of the states (e.g. the wavefunctions of particular protons, the electrical potential of particular synapses, the placement of ink molecules, etc.) is irrelevant for the behaviour; only the transitions from one numbered state to another matter
- Hence we could (in principle) build a machine with the same number of states as one of these tables, and the same transitions between the states, and it would exactly reproduce the behaviour of the mathematician
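To make that concrete, here's a minimal sketch in Python (my own toy encoding, not Turing's notation) of such a transition table, and of the purely mechanical process of following it. Note that the loop never asks what the states 'mean'; it only looks entries up:

    # A table for a machine that adds one to a binary number, with the head
    # starting on the rightmost digit. Each entry maps (state, symbol read)
    # to (new state, symbol to write, head movement).
    TABLE = {
        ('inc', '1'): ('inc', '0', -1),   # carrying: 1 becomes 0, move left
        ('inc', '0'): ('halt', '1', 0),   # absorb the carry and stop
        ('inc', '_'): ('halt', '1', 0),   # ran off the left edge: write the carry
    }

    def run(table, tape, state='inc'):
        cells = dict(enumerate(tape))     # sparse tape: position -> symbol
        pos = len(tape) - 1               # head on the rightmost symbol
        while state != 'halt':
            state, symbol, move = table[(state, cells.get(pos, '_'))]
            cells[pos] = symbol
            pos += move
        return ''.join(cells[i] for i in sorted(cells))

    print(run(TABLE, '1011'))  # prints '1100', i.e. 11 + 1 = 12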
This is the philosophical justification for why a (Turing) machine can calculate anything a human can (in fact, the same argument shows that a Turing machine can reproduce the behaviour of any physical system).
However, this is still a rather hand-wavey "in principle" thought experiment about unimaginably huge numbers. Turing managed to take it further.
For simplicity, we'll assume all the papers are arranged in a sequential "tape"; we'll call the distinguishable states of the papers "symbols", and those of the mathematician/machine "states":
- One thing a mathematician can do is read a tape with one of these tables written on it, followed by a sequence of numbers representing the symbols of another tape, and emulate what the described machine would do when given the described tape (i.e. they could keep track of the current state and tape position, and look up the transitions in the table, to see what would happen)
- Since a mathematician can emulate any given table (in principle), so can a machine. This would be a "universal machine", able to emulate any other. (The video talks about such a machine, in the proof that the halting problem is undecidable)
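The `run` function in my sketch above already hints at this: it treats the table as plain data. Pushing one step further, the table itself can be written onto a tape as a string of symbols, and the emulator just decodes it first (the comma/semicolon encoding here is my own invention, purely for illustration):

    # Continuing the earlier sketch: the same 'run' function, but with the
    # table supplied as a string of symbols, as it would appear on a tape.
    def parse_table(description):
        table = {}
        for row in description.split(';'):
            state, symbol, new_state, write, move = row.split(',')
            table[(state, symbol)] = (new_state, write, int(move))
        return table

    DESCRIPTION = "inc,1,inc,0,-1;inc,0,halt,1,0;inc,_,halt,1,0"
    print(run(parse_table(DESCRIPTION), '1011'))  # '1100' again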
So the question becomes: how unfathomably complicated would such a universal machine have to be?
- These transition tables and tapes can be very big, and may contain very large numbers, but we can write them down using only a small alphabet of symbols, e.g. "start table", "new row", "the digit 7", etc.
- Reading a sequence of such symbols, and emulating the described machine, can get tricky. Turing described a universal machine "U", but he did so in a rather indirect way, which also turned out to have some mistakes and glaring inefficiencies. Davies later worked through these and ended up with an explicit machine using only 147 states and 295 symbols.
Hence we can use a machine with only a few hundred states to exactly reproduce the behaviour of any mathematician (or any physical system), as long as it's given an appropriate description (i.e. "software"). Later work has found universal Turing machines with only 4 states and 6 symbols.
One reason Turing's justification for his model is important, rather than just proposing the model and seeing what happens (like in the video), is that Alonzo Church had already proposed a model of computation (called Lambda Calculus), but didn't have such a justification.
Gödel himself dismissed Lambda Calculus, proposing his own system (General Recursive Functions) as a better alternative. When Church proved they were equivalent, Gödel took that as reason to dismiss his own system too! Yet Turing's argument did convince Gödel that a fundamental limit on computation had been found. Turing proved his machines are also equivalent to Lambda Calculus, and hence General Recursive Functions; so all of these proposed models turned out to be 'correct', but it was only Turing who could explain why.
Personally, I consider this reduction of physical behaviour to transition tables, and then to machines, to be the main reason to care about Turing machines. Proving the undecidability of the halting problem was also a great achievement of Turing's 1936 paper (as shown in that video), but that can also be explained (I would argue more easily) using other systems like Lambda Calculus.
Without Turing's justification of his model, undecidability comes across as simply a flaw. Sure, Turing machines may be (distantly) related to our laptops and phones, but if we found a better model without this 'undecidability bug' we could make better laptops and phones! Turing's argument shows that there is no better model (just equivalents, like Lambda Calculus).
> This is the philosophical justification for why a (Turing) machine can calculate anything a human can (in fact, the same argument shows that a Turing machine can reproduce the behaviour of any physical system).
I don't think this assertion follows. I don't think an argument like this can work without delving further into reasonable description of a "physical system".
If you just use the mathematics we employ to describe physical systems unreservedly, it is possible to construct "physical systems" that exhibit non-computable behavior. For instance, you can have computable and continuous initial conditions to the wave equation that produce a non-computable solution. See: https://en.wikipedia.org/wiki/Computability_in_Analysis_and_...
I think it's important to emphasize that Turing stated his arguments in regards to "effective procedure" (which I see you mention in a different post). I don't think the substitution of "effective procedure" with "physical system" is justified.
> For instance, you can have computable and continuous initial conditions to the wave equation that produce a non-computable solution.
Thanks, that's a really nice example which I hadn't come across before (or at least not spent too much time studying). I may have to refine the language I use in future; although a cursory look seems to be compatible with my own understanding (my mental model is roughly: "if we found a halting oracle, we would have no way to tell for sure")
> I think it's important to emphasize that Turing stated his arguments in regards to "effective procedure" (which I see you mention in a different post). I don't think the substitution of "effective procedure" with "physical system" is justified.
Yes, Turing did not say as much (at least in his 1936 paper). He was essentially abstracting over 'whatever it is that a person might be doing', in an incredibly general way. Others have since taken this idea and applied it more broadly.
Another useful caveat is that Turing machines are framed as (partial) functions over the Natural numbers. It's quite a leap from there to a "physical system". An obvious example is that no matter how cleverly we program a Turing machine, it cannot wash the dishes; although it can simulate the washing of dishes to arbitrary precision, and it could wash dishes by controlling actuators if we decided to attach some (but even that would run into problems of time constraints; e.g. if calculating the first instruction took so long that the water had evaporated).
The problem with your assumption is that you're assuming there are a finite number of states. There might be an uncountably infinite number of states, for instance if the states were the reals between 0 and 1.
Which assumption are you referring to? If it's about the states in a finite region, note that I specifically limited this to distinguishable states.
Whether or not a finite region can have an infinite number of states (countable or otherwise) is irrelevant; we can only distinguish finitely many in finite time. Two states being indistinguishable means they'll give rise to the same output behaviour (e.g. from a mathematician in a room, carrying out some procedure).
Where in there did Turing prove computation can "exactly reproduce the behavior to...(any physical system)"? Because we humans seem to be able to describe and write down the behavior of any physical system? I'm not sure that jump is airtight.
I'm not sure we can say this part, even though we might believe it to be true. Maybe now we can, since we have no deeper theories than quantum field theories: we can posit that all of physical reality reduces to QFTs and GR, and simulate all of QFT computationally, either with quantum computers (in real time) or classically (not in real time, plus needing a random number generator). But then we'd still have computational limits on which observables we can measure and how big a computation we can run (think of the size of a computer being limited by black-hole density, for one crude example; although I guess a black hole could be used as a quantum computer). So then I'd say the speed of light is the other main limit: we've already lost the information needed to run certain calculations to the depths of space.
And to focus back in on the random number generator requirement for classical to simulate quantum... nothing fundamentally random can be simulated classically in principle. Yet randomness is needed to describe the most basic layer of reality. I can't get true randomness from classical no matter what.
> Where in there did Turing prove computation can "exactly reproduce the behavior to...(any physical system)"? Because we humans seem to be able to describe and write down the behavior of any physical system?
Turing's scenario essentially applies to a person in a room, with an unlimited amount of paper (this could be a movable tape extending through the walls, to prevent the room filling up with paper!).
However, the same argument works if we replace the person with any other physical system (e.g. some mechanical device, or some physics experiment), and replace the room with any finite region of space (e.g. the Milky Way galaxy).
I don't remember Turing himself making this argument (let alone in his 1936 paper), but it's known as the "physical Church-Turing thesis" https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis#V... (the Church-Turing thesis is the argument that all 'effective methods of calculation' can be carried out by a Turing machine)
As for your points about quantum physics, etc. that's certainly true, and it's why the Church-Turing thesis isn't a theorem (and can't be, since we can never know if any particular model of reality is correct). However, it's certainly a valid physical law, akin to nothing moving faster than light, etc. i.e. it's empirically true, theoretically sound, and we have no better explanation for the moment.
Perhaps its origin in mathematics, where it is a second-class citizen compared to provable theorems, prevented the Church-Turing thesis from getting the recognition it deserves, e.g. from physicists.
Well I might add the quantum complexity-theory version of the Church–Turing thesis.
I believe that quantum computation is a richer, more powerful theory. In essence, BQP is a superset of the BPP in your link.
I think this happens with all sorts of mathematical objects. Non-commutative algebra is much richer than commutative algebra. There are also more interesting non-commutative things than just deformations of classical objects. It's the classical commutative case that is the weird limit.
Yeah, there are all sorts of layers we can add, especially regarding complexity (i.e. resource usage limits, rather than simple yes/no questions of decidability).
It's interesting to consider P = NP as a physical law, although that's certainly more tenuous than the Church-Turing thesis, etc.
Can you please flesh out the following? I don’t quite follow it.
> having infinitely many states would require them to be infinitesimally similar, and hence indistinguishable from each other within a finite amount of time
Sure. Let's take a finite region, like a room. We don't want to get too bogged down in the details of human psychology, anatomy, etc. so we'll stick to something very low-level like particle physics (which, in principle, can describe a person in exact detail). One state of this region is that it's completely empty (maybe quantum effects rule that out, but we can still include it if we're being conservative ;) ). Another state could have, say, a single electron, at some position we'll call (0, 0, 0). Another could have a single electron at position (0, 0, 1). Another has an electron at (25, 5, 3) and a proton at position (0, 0, 0). And so on.
Let's consider the states which just contain a single electron, somewhere in the room (I'm not bothering with quantum effects, but we would do the same thing just in Hilbert space instead of 3D space). The region is finite, so the coordinates of this electron are bounded: if the boundaries are at, say, 0 and 100 in each axis, then putting the electron at position (101, 0, 0) would be the same state as the empty one we started with (since the electron is outside the region we care about, and there's nothing else).
Now let's say we do some measurement of this electron's position, which is only accurate to whole number coordinates. That gives us 100^3 = 1,000,000 distinguishable states with a single electron. We might imagine all sorts of different states 'in between', but they can't affect that measurement due to its limited resolution.
If we increase the resolution of our measurement by 10, we get 10^3 times as many distinguishable states; another factor of 10 gives 10^3 times as many states again; and so on. However, regardless of how finely we can resolve the electron's position, we can only distinguish between finitely-many states. Any differences finer than that limit are irrelevant to the measurement, and hence to any behaviour that depends on that measurement.
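As a rough back-of-the-envelope, using the made-up numbers from my example above:

    # Distinguishable single-electron states at increasing measurement
    # resolution: the count explodes, but it is finite at every resolution.
    for factor in (1, 10, 100, 1000):
        per_axis = 100 * factor            # resolvable positions per axis
        print(factor, per_axis ** 3)       # 10^6, 10^9, 10^12, 10^15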
If we could resolve between infinitely-many positions for the electron, that would require distinguishing between states which are infinitesimally-close together, i.e. on an infinitely-fine grid; this would require distinguishing between two numbers whose decimals only differ after infinitely many digits. This doesn't seem like a reasonable capability.
The same principle applies when we have two electrons in the room, or trillions, or any combination of electrons, protons, photons, etc. in any arrangement we like.
A similar thing happens when we reach really large numbers too, e.g. trying to put as many particles in the region as we can: at some point the relative difference between, say, a googolplex and five particles versus a googolplex and six particles becomes too small to distinguish. There will eventually be a limit. (This is like the resolution problem, but instead of adding decimal places to the right of each number, we're adding factors of 10 to their left)
One way to get around this problem of limited resolution is to avoid an explicit 'measurement', and instead have the region itself behave differently for different states. A good example is chaotic behaviour, where tiny changes in an initial state will grow exponentially until those differences eventually become distinguishable. However, the infinitesimally-similar states described above will remain infinitesimally close for all finite lengths of time; in order to distinguish between them, we would need to wait an infinite amount of time.
So putting aside Planck and all the experimental stuff, even on a philosophical level there are problems with thinking of space as continuous. For, that with which we measure space is necessarily discrete and the time-frames in which we measure are necessarily finite. So, even if space was continuous, two ‘states’ that are infinitesimally similar could for ALL practical purposes be considered identical.
As a far less formal analogy, it's similar to how people complain that digital audio in general has lower quality than analogue, just from a discrete vs continuous standpoint. When in fact there's no limit to the quality of a digital representation (if we liked, we could store a gigabyte per sample, or a petabyte, or whatever). Yet we can always draw the line somewhere, beyond which there are no meaningful distinctions.
I know I'm pretty inept when it comes to this subject, but can someone help me understand how the first three points of Turing's philosophical justification listed above can be true? For example, in computing the sequence of n values of an infinite series or the first n digits of pi, how can the claim be made that this can be done in a finite amount of space or with a finite number of states?
Turing's machine includes an "infinite" tape. The number of states, the state transitions, and the number of symbols are finite, but an unbounded number of symbols can be written, one to a cell, on the tape.
We're allowed an unlimited amount of paper, but each piece has a finite size, and hence each piece is in one of only finitely-many distinguishable states. Turing called these piece-of-paper-states "symbols", and arranged the pieces of paper in a "tape" of unlimited length (a stack of sheets, or a library of books would also work; but moving between the pieces would be more complicated).
In fact, we only ever need 2 symbols, e.g. 0 and 1. This is because any set of symbols can be replaced by a set of numbers, and those numbers can be written in binary (with a fixed width, to avoid needing 'spacers'). This can require a lot more machine states, since those states allow the machine to 'remember' the bits it's read so far (e.g. if we're using 8 bits at a time, then we need enough states to "remember" what the first 7 bits were as we're reading the 8th).
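As a quick illustration of that reduction (my own sketch, not anything from the paper):

    # Replace an arbitrary alphabet with fixed-width binary codes; the fixed
    # width is what lets us decode without needing 'spacer' symbols.
    from math import ceil, log2

    def encode(alphabet, tape):
        width = max(1, ceil(log2(len(alphabet))))
        codes = {s: format(i, '0%db' % width) for i, s in enumerate(alphabet)}
        return ''.join(codes[s] for s in tape)

    print(encode(['a', 'b', 'c', 'd'], 'abcd'))  # '00011011'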
For a calculation like pi, we can use the tape to (a) write down the digits (or bits) of pi, and (b) keep track of intermediate variables needed for the calculation. One simple approach to keeping track of intermediate variables is to store them in order on the tape, where each variable is a sequence of 1 symbols, separated by 0 symbols. For example a tape like: 111010010 represents four variables with values 3, 1, 0 and 1 (the number of 1s we see before we hit a 0). This actually corresponds to Peano arithmetic, if we use the symbols S and Z instead like SSSZSZZSZ.
    Start:
      (blank tape)
    Initialise variables and output/variables separator:
      |VARIABLES
    Move variables to make space for next digit:
      _|VARIABLES
    Calculate and write next digit:
      3|VARIABLES
    Move variables to make space for next digit:
      3_|VARIABLES
    Calculate and write next digit:
      31|VARIABLES
    ...and so on
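Decoding that 1s-and-0s variable encoding is easy to sketch in Python (assuming, as in my example, that each variable is terminated by a 0):

    def decode_variables(tape):
        runs = tape.split('0')
        return [len(r) for r in runs[:-1]]  # the final '0' ends the last variable

    print(decode_variables('111010010'))  # [3, 1, 0, 1]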
It might be because pi isn’t infinite in practice. E.g. we could use all the available energy in the universe and only calculate up to a finite digit (ignoring the information storage issue!).
Whether the abstract infinite pi ‘exists’ is up for metaphysical debate :)
    def pi_decimal_digits():
        # An unbounded spigot: yields the decimal digits of pi one at a
        # time, using only integer arithmetic.
        q, r, t, j = 1, 180, 60, 2
        while True:
            u, y = 3*(3*j+1)*(3*j+2), (q*(27*j-12)+5*r)//(5*t)
            yield y
            q, r, t, j = 10*q*j*(2*j-1), 10*u*(q*(5*j-2)+r-y*t), t*u, j+1
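For reference, pulling the first few digits out of that generator:

    from itertools import islice

    print(list(islice(pi_decimal_digits(), 10)))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]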
If you mean that pi can't be written as a number to infinite precision using place-value notation (as opposed to a more useful notation like Python) then I agree that only a finite prefix can ever be observed.
What you've written is not pi; it's a procedure for computing pi. Python expressions are not a notation: they fail basic requirements for a reasonable notation (for example, you cannot generally determine whether two Python expressions are equal or not).
You may find it interesting to learn that one cannot generally determine whether two real numbers are equal or not: equality on real numbers is undecidable.
I'm well aware. It's also the case that there is no good notation for the real numbers; in particular, since any given notation system can only represent countably many distinct entities, almost all real numbers are unrepresentable in that notation system. Therefore the notion that the full set of reals actually exists is extremely dubious.
Note that the Python version of pi requires an unbounded amount of memory and, more importantly, an infinite amount of time to produce pi to infinite precision.
Not at all. The reason you may think so is that you're used to representing numbers in a base-10 number system, in which it is inherently impossible to represent pi exactly. But if you change how you represent numbers, you can represent pi exactly. What OP did was represent pi using a syntax inspired by a programming language, which is just as valid a number system as any other formal representation as far as correctness goes; although, as a human, I likely would not wish to express most of my use cases involving numbers in a full-blown programming language.
That said, one can perform all of arithmetic using that system and represent any property of numbers that could be represented otherwise.
Ah, I see where you are going with this. However, I can offer a version which, without loss of generality, gives another 200-to-1 compression (not counting UTF-8 multibytes): π.
I’ve wondered recently whether the brain can be represented by a Turing machine if our computation mechanisms utilize something like back propagation through time via quantum effects.
Maybe we will make progress in our understanding this millennium :)
Quantum physics can be represented and calculated by Turing machines. There are many ways to do this, depending on preference: via symbolic algebra, via probabilistic inference, via random/pseudorandom numbers, via non-deterministic functions, etc.
Note that quantum computers are no more powerful than Turing machines; although it's thought they might be faster at certain tasks. For example, we know how to factorise integers quickly using Shor's algorithm on a quantum computer, but we don't yet know a quick algorithm for factorising integers on a classical computer. We can still factorise integers on a classical computer: we can do it with a non-quantum algorithm, or we can even do it by emulating a quantum computer and using that emulator to run Shor's algorithm (in this case the emulation is so slow that it exactly cancels out the speedup from Shor's algorithm).
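To make the 'slow but possible' point concrete, here's the most naive classical factoriser (nothing clever; just to show that factorising is computable without any quantum help, and that only the speed differs):

    def factorise(n):
        factors, d = [], 2
        while d * d <= n:                 # trial division: slow but correct
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(factorise(3 * 5 * 7 * 11))  # [3, 5, 7, 11]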
Backpropagation through time is also a pretty simple algorithm, which is already implemented on classical computers.
> Quantum physics can be represented and calculated by Turing machines.
1) Where does the randomness of QM come in though? You need a random number generator alongside your Turing machine.
We could debate that one can never prove true randomness as it may always be due to some deeper determinism. But if we take the wavefunction of QM as complete, which Many Worlds does - the most compsci-like interpretation - then randomness is fundamental according to theory.
I can have all the TMs I want, but I still need this extra component. It's just like the human Turing machine: even with all the pen and paper and maths in the entire universe and beyond, a human can't produce a truly random number (or can we? Then we get into a debate about true free will, and whether free will can produce a "random" number).
2) Let's say we use quantum computation instead of classical. There are still physical systems we can't "compute", like the state of the entire observable universe, the state of the entire universe, or anything requiring information that has forever left our light cone. I know TMs and quantum versions of TMs are infinite, but they can't bypass the speed of light. So we need some softer version of saying all of physical reality is computable, imo.
> But if we take the wavefunction of QM as complete, which Many Worlds does - the most compsci-like interpretation - then randomness is fundamental according to theory.
MWI is an entirely deterministic interpretation of quantum mechanics. Absolutely no randomness is involved which is one of its most appealing properties.
It's interesting that you mention a many-worlds approach, since that doesn't actually involve any randomness. Rather than choosing a single result probabilistically, many-worlds keeps them all. This can be represented quite nicely with what I referred to above as 'nondeterministic functions', i.e. functions which may return multiple values (in some sense, those values are literally the "worlds"). This would look like a logic language a la Prolog, but wouldn't be restricted to depth-first traversal: it could run breadth-first (very slow), concurrently (nice in principle, but would swamp any finite machine with overheads), or iterative deepening (probably the nicest implementation IMHO). Haskell programmers would call this a "breadth-first list monad" (AKA LogicT, AKA Omega). Instead of "returning multiple values", we could also do a continuation-passing transform and invoke the continuation multiple times (concurrently).
This would run exponentially slowly, but that's a whole separate issue ;)
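For a flavour of the breadth-first trick (a toy sketch of mine, not a quantum simulator): the key move is enumerating combinations of outcomes along diagonals, so every 'world' shows up after finitely many steps, unlike Prolog's depth-first order, which can get stuck down one infinite branch:

    from itertools import count, islice

    def diagonal_pairs(xs, ys):
        # Enumerate every pair from two (possibly infinite) streams by
        # walking diagonals: each pair appears after finitely many steps.
        xs_seen, ys_seen = [], []
        for x, y in zip(xs, ys):
            xs_seen.append(x)
            ys_seen.append(y)
            n = len(xs_seen)
            for i in range(n):
                yield (xs_seen[i], ys_seen[n - 1 - i])

    print(list(islice(diagonal_pairs(count(), count()), 6)))
    # [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]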
> We could debate that one can never prove true randomness as it may always be due to some deeper determinism.
We could get a Copenhagen-like approach by doing the same as above, but only returning one of the results, chosen "at random" according to the square of the wavefunction (plus some hand-wavey Copenhagen-craziness, like defining whether or not Wigner's Friend is an 'observer')
When it comes to "randomness" my own approach is to mentally substitute the word "unpredictable", since it tends to be more enlightening. Turing machines can certainly produce results which are "unpredictable", in the sense that the only way to know what will happen is to run the program; in which case that's more like a (repeatable) observation rather than a prediction. (BTW I also think that's a nice resolution of the "free will" paradox: my behaviour may be predicted by observing what an exact copy of me does; but it was still "decided by me", since that copy essentially is "me")
In any case, the outcome of a Turing machine using some pseudorandom algorithm is indistinguishable from "quantum randomness". There is certainly a limit to the unpredictability of any pseudorandom events (given by its Kolmogorov Complexity, which is upper-bounded by the Turing machine's program), but on the other hand I think it's quite presumptuous to think "quantum randomness" has infinite Kolmogorov Complexity.
> There are still physical systems we can't "compute", like the state of the entire observable universe, the state of the entire universe, or anything requiring information that has forever left our light cone.
That's a very different question, since it involves how we choose what to put on the tape, rather than what a Turing machine is capable of computing. It's like criticising a genie for being unable to grant a wish, when in fact it's we who can't think of what to ask for.
Besides this, there is a way for a Turing machine to compute the whole universe, including what's outside our visible lightcone: we simply run every program. That's actually a pretty straightforward task (and remarkably efficient), e.g. see the FAST algorithm in section 9 of https://people.idsia.ch/~juergen/fastestuniverse.pdf
Note that if we run such a program, there is no way to know which of the outputs corresponds to our universe. However, our ignorance of where to look does not imply that such a Turing machine has not computed it!
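The scheduling idea is simple enough to sketch (this is just the dovetailing skeleton, not Schmidhuber's actual step allocation, and the 'programs' here are stand-in Python generators rather than encoded Turing machines):

    from itertools import count, islice

    def program(i):
        # Stand-in for the i-th program; a real version would decode and
        # emulate the i-th Turing machine.
        return count(step=i + 1)

    def dovetail():
        running = []
        for n in count():
            running.append(program(n))   # start the next program...
            for p in running:            # ...and give every program another step
                yield next(p)

    print(list(islice(dovetail(), 10)))  # outputs from all programs, interleaved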
> Turing machines can certainly produce results which are "unpredictable", in the sense that the only way to know what will happen is to run the program.
This doesn’t seem true since all predictions are ‘running the program’
Not necessarily; lots of systems have 'shortcuts', e.g. I can predict the result of sin(1000000τ) will be 0, without anything physical/virtual having to wiggle up and down a million times. Lots of systems can be described by equations involving a time variable 't', where we can set t = whenever we like, then solve the equation to get the predicted value.
When a system is Turing-complete, equations like this will inevitably involve some expression with t-1; and solving that involves some expression with t-2; and so on back to the initial condition. Hence there's no 'shortcut' (this is a consequence of Rice's theorem BTW, which itself is a consequence of the halting problem).
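A small illustration of the difference (forgiving floating-point rounding):

    import math

    t = 1_000_000

    # Shortcut: plug t straight into the closed form; one evaluation.
    print(math.sin(t * math.tau))       # ~0, up to float rounding

    # No known shortcut: an iterated map must be stepped t times to find
    # its state at time t.
    x = 0.1
    for _ in range(t):
        x = 3.9 * x * (1 - x)           # logistic map; no known closed form here
    print(x)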
> e.g. I can predict the result of sin(1000000τ) will be 0, without anything physical/virtual having to wiggle up and down a million times.
This still seems to involve ‘a program’, in that there are foundational mathematical principles that build your confidence/logic that the periodicity of sin(x) is so well defined that sin at any even integer multiple of pi will be zero. E.g. you could write out the logical ‘programmatic’ steps that would teach a math newbie how to reach the conclusion you just made about the sin() function.
I agree, but such a "program" wouldn't necessarily look/act anything like the "program" 'sin(1000000τ)'.
What's interesting about systems like Turing machines is that the "program" is a specific, isolated thing like a tape of symbols. In contrast, those "foundational mathematical principles" you mention are more of a diffuse web, with lots of interchangeable pieces, all mixed up with unrelated facts. This gives lots of different ways to arrive at the answer. With Turing-complete systems, we always have to come back to the code.
> Note that quantum computers are no more powerful than Turing machines;
But computers with self-consistent time travel are more powerful. It's perhaps physically possible to construct one of those, depending on whether we ever end up with a more-correct timeless theory of physics that makes different predictions to existing physics.
> Backpropagation through time is also a pretty simple algorithm, which is already implemented on classical computers.
I read “backprop through time” as meaning that you have the result of the backprop at the beginning of the process – that's more powerful, if we're using continuous computation. (With discrete computation, you're right that it's no more powerful.)
That would be a major revolution to the foundations of computer science if it were true, disproving the Church-Turing thesis. It seems unlikely that such a fundamental, world-shattering result would be so obscure, so I am much more likely to believe it is false.
As I said, if someone had convincingly proved this, it would shake the field to its core. Since that didn't happen, they can't have convincingly proven this.
With all due respect, it would be very hard for me to believe so. For the simplest case, one can trivially create a Turing machine that simulates a CPU, so I’m not sure the digital-computation argument holds any water.
Well, not time-interrupted but there absolutely is a way to schedule n steps of each core, isn’t there? Similarly to how the universal Turing machine is constructed, or how the nondeterministic TM is made deterministic.
So unbounded nondeterminism is basically hypercomputation? Then I assume one can write a program for the theoretical Actor model that computes a function on every natural number, is that right? Or that it can solve the halting problem for Turing machines, since if it is stronger than them, the TM halting problem would no longer apply to it (being replaced by its own halting problem).
Does this difference still hold if time were discrete? I may be out of my depth, but it seems intuitive that if time were discrete, you could enumerate all possible interleavings of actors' actions, and reproduce their behaviors in a Turing machine.
A Turing machine can simulate multiple Turing machines, but cannot implement multiple Turing machines communicating with each other, because there is no means to communicate.
But since Turing machines can emulate multiple Turing machines, any problem that can be completed by multiple machines communicating can be solved by one Turing machine. As such, the computational power is exactly the same.