In the beginning, there is this statement (assumption):
> when processing natural language, we can bound the maximum number of things we’ll need to store on the stack
Why is that? Is it because there is a limit N and all natural sentences ever are less than N words, and thus you also have a bound on the stack depth? But that's not strictly true, just highly unlikely, right? In principle, for every sentence of length N, I can easily add a word, and have length N+1. In a similar way, I can also always make the stack deeper.
This discussion reminds me of the general question of whether RNNs are Turing complete or not. There is a famous paper stating that they are (On the computational power of neural nets, Siegelmann & Sontag, 1992). However, that result relies on infinite-precision rational numbers for weights and activations. Once you have finite precision (standard floats), the RNN has finite memory. Then stack depths are of course bounded, and it is not Turing complete anymore.
> Is it because there is a limit N and all natural sentences ever are less than N words, and thus you also have a bound on the stack depth?
It's true that you can't really find an N such that a natural language sentence with stack depth N is OK, but a sentence with stack depth N+1 is ungrammatical. But in practice, the stack depth of real sentences is small [1], and sentences that require deep stacks quickly become impossible for humans to understand.
For example,
Stack depth 1: The rat ate the cheese.
Stack depth 2: The rat [that the cat chased] ate the cheese.
Stack depth 3: The rat [that the cat [that the teacher liked] chased] ate the cheese.
Stack depth 4: The rat [that the cat [that the teacher [that the student hated] liked] chased] ate the cheese.
English speakers usually can't handle depths of 3 or higher.
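If it helps to make the bookkeeping concrete, here is a minimal sketch (my own illustration, using the bracket annotations above): each opening bracket is a clause that has to be kept open until its verb shows up, and the maximum number of simultaneously open clauses (plus the main clause) is the stack depth quoted for each sentence.

```python
def stack_depth(sentence: str) -> int:
    depth, max_depth = 1, 1          # the main clause itself occupies one slot
    for ch in sentence:
        if ch == "[":                # a relative clause opens: push
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "]":              # its verb has arrived: pop
            depth -= 1
    return max_depth

examples = [
    "The rat ate the cheese.",
    "The rat [that the cat chased] ate the cheese.",
    "The rat [that the cat [that the teacher liked] chased] ate the cheese.",
    "The rat [that the cat [that the teacher [that the student hated] liked] chased] ate the cheese.",
]
for s in examples:
    print(stack_depth(s), s)         # prints 1, 2, 3, 4
```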
This kind of phenomenon has caused a lot of controversy in linguistics. There does seem to be some sense in which the stack-depth-4 sentence is grammatical English, even though basically no one can understand it. But for the purposes of practical NLP systems, this phenomenon means that if you can model stack depth of, say, 10, then you'll be able to model almost all the naturally-occurring sentences.
But English speakers make games and songs out of this practice.
> There's a fleck on the speck on the tail
> on the frog on the bump on the branch
> on the log in the hole in the bottom of the sea.
That is a perfectly grammatical sentence. The key to the song is building it up: it starts with "There's a hole in the bottom of the sea" and builds the sentence up from there. Indeed, having a grossly complicated sentence is the fun of the song, but that doesn't change the fact that it's grammatically correct (and actually understood by the listeners / singers).
The examples in the GP deepen and pop the stack before the sentence is over. Your example only deepens the stack -- it's like tail recursion vs. non-tail recursion. I think that's what makes it a special case of deeply nested grammatical structure that's easy to understand.
There are no weird sequences of words like "hated liked chased" that require backtracking to parse.
Exactly, "a fleck on the speck on the tail on the frog on the bump on the branch on the log in the hole in the bottom of the sea" only has stack depth 1 as long as you can do a kind of tail call optimization (aka left-corner parsing with an arc-eager transition, see https://www.aclweb.org/anthology/C92-1032.pdf)
> Once you have finite precision (standard floats), the RNN has finite memory. Then stack depths are of course bounded, and it is not Turing complete anymore.
My understanding is that Turing machines assume infinite memory -- could you help me understand how this differs? I'm guessing that I'm not understanding your point, but if "finite memory" is a disqualifier then Turing machines aren't Turing-complete.
Yes, for a Turing machine (TM), or a machine that is as powerful as a Turing machine, you need infinite memory. A Turing machine always has infinite memory (the tape is infinite). This is an important property of a TM. Once you have only a finite tape, the machine is strictly less powerful. That means that any model / machine with finite memory is not as powerful as a Turing machine, i.e. it is not Turing complete.
So, (standard) RNNs with finite precision (standard floats) have finite memory, and thus are not Turing complete.
This paper (On the computational power of neural nets) shows that RNNs with infinite precision are Turing complete. But in practice, we never model RNNs that way. There are other ways to add external (conceptually infinite) memory to an RNN, e.g. Neural Turing Machines (NTM) and the Differentiable Neural Computer (DNC).
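To make the finite-precision point concrete, here is a small toy demo (my own sketch, not the construction details from the paper): the Siegelmann & Sontag trick stores an unbounded stack in the digits of a single rational-valued activation, and with ordinary floats the bottom of that stack silently falls off once the mantissa runs out of bits.

```python
from fractions import Fraction

def push(s, bit):
    # base-4 stack encoding: bit 0 -> digit 1, bit 1 -> digit 3
    return (s + 2 * bit + 1) / 4

def pop(s):
    bit = 0 if s < 0.5 else 1
    return bit, 4 * s - 2 * bit - 1

def roundtrip(bits, empty):
    """Push all bits onto the encoded stack, then pop them back off."""
    s = empty
    for b in bits:
        s = push(s, b)
    out = []
    for _ in bits:
        b, s = pop(s)
        out.append(b)
    return list(reversed(out))  # popped top-to-bottom, so reverse

bits = [1, 0] * 40  # a stack of depth 80

print(roundtrip(bits, Fraction(0)) == bits)  # True: exact rationals keep everything
print(roundtrip(bits, 0.0) == bits)          # False: a 64-bit float holds only ~26 entries
```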
I believe the insight is that for practical purposes, you can restrict yourself to a bounded stack, the assumption being that with a large enough constant you will never run into an input that overruns the limit.
Some interesting discussion about how useful Turing completeness is in practice can also be found here (my own question, which also has an answer of mine, among others): https://stackoverflow.com/questions/2990277/how-useful-is-tu...