It's a shame that I like Y Combinator the organization so much, because I find the Y combinator as a programming concept to be aesthetically displeasing.
It only makes sense in the untyped lambda calculus, where types are all conflated and errors are forbidden. It relies on the fact that you can take any x and "apply x to itself". These properties are essentially gimmicks of the untyped lambda calculus. It's like the saying "Everything should be made as simple as possible, but not simpler." With the untyped lambda calculus, they made it even simpler.
Yes, you can reduce everything to a very tiny number of combinators, but at the cost of making everything ugly. It is much cleaner to use Lisp and reduce everything to a slightly larger set of primitives. Ironically (?) one of the best explanations of this is PG's own The Roots Of Lisp - https://languagelog.ldc.upenn.edu/myl/llog/jmc.pdf
Between lambda calculus and LISP, I find LISP to be the ugly one, not only because of its far greater complexity, but also because of its lack of referential transparency. David Turner / Ben Lynn make some good observations about LISP in the section "History versus myth" of [1]. If a lack of types is what you dislike about the plain lambda calculus, then Haskell is a much better solution than LISP.
You don't have a Lisp until you add to lambda calculus a representation of its syntax using nested lists of symbols, APIs to manipulate those lists, a way to intercept and transform the lists being processed as code, and an eval function.
This is actually a kind of double negative, because referential transparency is rather the lack of something: the lack of the ability to express a program in which an identifier refers to anything other than an obvious definition in an enclosing scope.
In Common Lisp, or a similar dialect, we can pose this question:
(let ((x 42))
  (mystery-macro x))
can mystery-macro be written so that the enclosed x is other than 42? The answer is yes. To have referential transparency, we have to take away this power.
Referential transparency is not a "cool feature" we can add to a language, so then we can express new kinds of things. It just refers to something we are not allowed to do.
Referential transparency is a big feature in that it hugely aids in reasoning about your program. I agree that it takes away the power of making your code hard to reason about:-)
Ah right! The other aspect of referential transparency is the lack of side effects. If we can do x := x + 1, then the expression x isn't referentially transparent. If we relocate it from before the assignment to after, even though it still refers to the same binding, it doesn't refer to the same value; the relocation changes its meaning. No imperative language can be referentially transparent, as such. But those are all the useful languages in which shit gets done.
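A minimal Haskell sketch of that point, with an IORef standing in for an imperative variable (the names here are just for illustration):

import Data.IORef

main :: IO ()
main = do
  x <- newIORef (41 :: Int)
  before <- readIORef x   -- "the expression x" read before the assignment
  modifyIORef x (+ 1)     -- x := x + 1
  after <- readIORef x    -- the same expression read after it
  print (before, after)   -- prints (41,42): same binding, different values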
All you need is compiler diagnostics about variable shadowing to make this academic issue entirely go away:
1> (defmacro mystery-macro (form) ^(let ((x 73)) ,form))
mystery-macro
2> (with-compile-opts ()
     (compile-toplevel '(let ((x 42)) (mystery-macro x))))
** expr-2:2: warning: let: variable x unused
#<sys:vm-desc: a0db560>
3> (with-compile-opts (:warn shadow-var)
     (compile-toplevel '(let ((x 42)) (mystery-macro x))))
** expr-3:2: warning: let: variable x shadows local variable
** expr-3:2: warning: let: variable x unused
#<sys:vm-desc: a0fd0f0>
4> (with-compile-opts (:error shadow-var)
     (compile-toplevel '(let ((x 42)) (mystery-macro x))))
** expr-4:2: let: variable x shadows local variable
** during evaluation of form (compile-toplevel '(let ((x 42))
                                                  (mystery-macro
                                                    x)))
** ... an expansion of (progn (compile-toplevel '(let ((x 42))
                                                   (mystery-macro
                                                     x))))
** ... an expansion of (with-compile-opts
                         (compile-toplevel '(let ((x 42))
                                              (mystery-macro
                                                x))))
** which is located at expr-4:1
Moreover, this is a problem that mainly affects densely nested code. If you write small functions that are not nested beyond a couple of lexical levels or so, referentially opaque smells have nowhere to hide.
Example of a macro generating code that unhygienically relies on list, which is redefined around the macro call:
1> (defmacro mac (x) ^(list ,x))
mac
2> (with-compile-opts (:warn shadow-fun)
     (compile-toplevel '(labels ((list ())) (mac 1))))
** expr-2:2: warning: sys:lbind: redefining list, which is a built-in defun
** expr-2:2: warning: sys:lbind: function list shadows global function
** expr-2:2: warning: list: too many arguments: max 0, given 1
#<sys:vm-desc: 9d0e4a0>
3> (with-compile-opts (:error shadow-fun)
     (compile-toplevel '(labels ((list ())) (mac 1))))
** expr-3:2: warning: sys:lbind: redefining list, which is a built-in defun
** expr-3:2: sys:lbind: function list shadows global function
** during evaluation of form (compile-toplevel '(labels ((list ()))
                                                  (mac 1)))
** ... an expansion of (progn (compile-toplevel '(labels ((list ()))
                                                   (mac 1))))
** ... an expansion of (with-compile-opts
                         (compile-toplevel '(labels ((list ()))
                                              (mac 1))))
** which is located at expr-3:1
The concept of referential transparency is meaningless if there is no shadowing. If an expression which refers to a certain binding is moved around in the program, it either continues to refer to that same binding, or at worst becomes a free reference (easily diagnosable). If there is no shadowing, constructs cannot surreptitiously intercept bindings.
If you consider it lacking in I/O, then it takes very little to add conventions for that. E.g. the Binary Lambda Calculus is a programming language [1].
Next is a very simple Haskell program that just prints the first 8 non-negative integers:
main = putStrLn $ show $ take 8 $ iterate (+ 1) 0
The linked code is just an elaboration of that where instead of using the built-in "iterate" we define an equivalent "iterate2".
Usually to define iterate2 a Haskell programmer would use recursion (i.e., call iterate2 inside the definition of iterate2) but today we avoid the need for a recursive call by using the Y combinator.
This disproves the assertion that the Y combinator "only makes sense in the untyped lambda calculus" because the Haskell compiler gives a type to every entity in the linked code.
I would not call that the Y combinator. As traditionally defined the Y combinator is
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))
so it involves applying x to x. Clearly, as shown here, you can express it in Python; the language just has to not enforce sane types. But I would say it only makes sense in the untyped lambda calculus, because in any other language there would be a more reasonable way to achieve the same effect. Your example shows this: you achieve the same effect of finding a fixed point without using the Y combinator.
You can say that "fix" is not the Y combinator, but I've demonstrated in Haskell code that "fix" does the thing the Y combinator is famous for doing: namely, to eliminate the need for recursive calls. (In fact, I repeat the Haskell code in this comment.)
Just because your definition is syntactically different from mine does not mean the definitions are not equivalent.
My guess as to why Python programmers define the Y combinator the way you do is that in Python a function's arguments are evaluated eagerly, whereas in Haskell they are evaluated lazily, with the result that my definition would tend to enter an infinite loop in Python. (But even in a language with eager evaluation, I suspect a simpler definition than yours is possible, so I'm not sure what is going on with your definition, which is not surprising since I'm no Python expert.)
That is all I will say in reply to you, but I do want to take this opportunity to fix a problem with my previous comment, namely, the fact that although HN comments tend to persist forever, links to sites other than HN rot fairly quickly: I will now repeat here on HN the tiny Haskell program that in my previous comment I put behind a link. I'll also avoid the unnecessary use of "$", "putStrLn" and "show" to make the code easier to read for Haskell newbies.
Because the convention in Haskell is to reserve single-character variable names for local variables, we name the Y combinator "fix":
fix :: (a -> a) -> a
fix f = f (fix f)
Now to demonstrate the use of "fix". My go-to toy Haskell program is this next one, which prints the first 8 non-negative integers.
main = print (take 8 (iterate (+1) 0))
Unfortunately that is too much of a toy for our purpose here because it does not contain a recursive call, so we will pretend that the function "iterate" is not pre-defined in Haskell, with the result that we would need to define it ourselves (and call it iterate2 because of course the name "iterate" is already taken). All the code in this comment has been tested; all four versions (the one above and the three below) of our toy program produce the same output.
main = print (take 8 (iterate2 (+1) 0))
iterate2 :: (a -> a) -> a -> [a]
iterate2 f x = x : iterate2 f (f x)
The final mention of "iterate2" above is a recursive call. We replace it with a (non-recursive) call to "fix" as follows:
main = print (take 8 (iterate2 (+1) 0))
iterate2 :: (a -> a) -> a -> [a]
iterate2 f = fix (\self -> \x -> x : self (f x))
And here is an intermediate form of our toy program that might help the reader understand how the form that uses "fix" was derived from the original form:
main = print (take 8 (iterate2 (+1) 0))
iterate2 :: (a -> a) -> a -> [a]
iterate2 f = loop where
  loop x = x : loop (f x)
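Incidentally, for readers who object that "fix" dodges the self-application in the traditional Y: the self-applicative definition can be given a type in Haskell too, by wrapping the x(x) in a recursive newtype. A sketch (the names Mu, unMu and y are my own):

newtype Mu a = Mu { unMu :: Mu a -> a }

y :: (a -> a) -> a
y f = (\x -> f (unMu x x)) (Mu (\x -> f (unMu x x)))

With that, iterate2 f = y (\self x -> x : self (f x)) produces the same output as the versions above.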
I have very similar feelings but in the exact opposite direction :)
The Y combinator is a beautiful and satisfying thing that brings much joy when studied; the org is a bunch of cringe Silicon Valley VC types intent on owning the world…
> It only makes sense in the untyped lambda calculus,
You can use it in any language that has closures but doesn't have recursion to begin with. Granted, there's no reason to have a language like that, but fixing is a very common idiom in Haskell.
Note that with the standard representation of natural numbers in lambda calculus, the Church numerals [1], you don't even need the Y combinator to implement factorial:
fac = λn.λf. n (λf.λn. n (f (λf.λx. n f (f x)))) (λx.f) (λx.x)
For example, applied to Church numeral 3 this gives (with F = λf.λn. n (f (λf.λx. n f (f x)))): fac 3 = λf. 3 F (λx.f) (λx.x), which beta-reduces to λf. F (F (F (λx.f))) (λx.x) and ultimately to Church numeral 6.
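The trick underneath is that a Church numeral is an iterator, so factorial falls out of n-fold application of a step function over pairs, with no fixed-point combinator. A rough Haskell analogue of the same idea, with iterate/!! standing in for the numeral (the names are mine):

fac :: Int -> Integer
fac n = snd (iterate step (0, 1) !! n)
  where
    step :: (Int, Integer) -> (Int, Integer)
    step (i, acc) = (i + 1, fromIntegral (i + 1) * acc)  -- (i, i!) to (i+1, (i+1)!)

For example, fac 3 steps through (0,1), (1,1), (2,2), (3,6) and returns 6.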
Hey tromp, do you have any recommendations for grokking Lambda calculus properly?
I've read a few tutorials and introductions, and I understand the notions of alpha- and beta-reductions, abstractions vs application, but I still have issues with practical examples like the above.
Am I missing something here or should I just go back to the basics?
Have you tried actually implementing things with it? You're obviously not going to implement a web browser, but basic math, some sort of basic strings, lots of other things, with a focus on "basic" because even with functions you're working at an awfully low level here.
You can read definitions and commentary on those definitions until the cows come home but until you actually use it you're not going to get it. Like most math.
> I still have issues with practical examples like the above.
What issues would those be? Do you see that λf.λx. n f (f x) is the Church numeral n+1?
I think there are few better ways to grok lambda calculus than working through examples like these step-by-step.
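For example, here is the successor (λn.λf.λx. n f (f x)) applied to the Church numeral 2, reduced step by step:

(λn.λf.λx. n f (f x)) (λf.λx. f (f x))
→ λf.λx. (λf.λx. f (f x)) f (f x)
→ λf.λx. f (f (f x))

which is the Church numeral 3.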
> In "... a value that is mapped to itself by the function.", what do "mapped to itself", and, more specifically, "mapped" mean?
A mathematical function is something more general than a function in programming. In maths, a function consists of two sets known as the domain and codomain and a rule which "maps" things in the domain into the associated things in the codomain[1]. You can think about the domain as all the possible inputs to the function and the codomain being a set which at least contains all the possible outputs from the function.
So say the rule of my function is f(x) = sin(x) and for simplicity I say the domain is real numbers greater than or equal to zero but less than 2pi, then the function "maps" any value in that domain to a value in its codomain (which in this case will be real numbers from -1 to 1).
And what it's actually doing when it maps a number is it applies "sin" to that number to find the value in the codomain (which in computer science we would call the return value of the function).
So when the author talks about fixed points being values which are mapped onto themselves by the function they are values in the domain where you apply the rule of the function and you get back the same value. So for f(x) = sin(x), 0 would be such a value because sin(0) = 0. Say my function is f(x) = x^3, then 0, 1 and -1 are fixed points, because f(-1) = (-1)^3 = -1, and you can see the same applies to f(0) and f(1).
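If it helps to see that executably, here is a tiny Haskell check of those fixed points of f(x) = x^3 (just an illustration):

f :: Double -> Double
f x = x ^ (3 :: Int)

fixedPoints :: [Double]
fixedPoints = [x | x <- [-2, -1, 0, 1, 2], f x == x]
-- evaluates to [-1.0, 0.0, 1.0]; -2 and 2 are not fixed points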
As a bonus, if my rule is f(x) = sin(x), then "x" is a "parameter" or "argument" to the function, because it is the name I give to a value that is passed into the rule of my function. It is also a variable, because it's a name for a thing which varies, but not all variables get used as parameters to functions. For example, if I write y = f(x), then y is a variable which is not any kind of parameter.
Hope that helps.
[1] I suppose technically into the "image set" of the function, which is a subset of the codomain comprising the actual output values of the function only.
As a slight additional aside I realised it might be interesting for some to understand some parallels between functions in programming and functions in maths.
For programmers, you can think of the rule of the function as being like the function body when you write a function in a computer program. The codomain is like the return type of the function (except in maths you're really talking about a set, and a set and a type are somewhat different things).
The domain is like the type of the function inputs (although it's also a set not a type). So if you have a function where the rule is f(x) = x^2 you can hopefully see that if the domain is the real numbers that's a different function from if the domain is the complex numbers.
I mentioned the image set of the function in my previous post, so just to finish off the explanation, say we have f where the domain and codomain are the reals and the rule of f is f(x) = x^2. The image set of f is actually going to be all the real numbers greater than or equal to zero (even though the codomain is all real numbers). The reason for this is if you (hypothetically) went through every value in the domain and applied the rule you would always get a value from zero to positive infinity. None of the real numbers when mapped by the function (squared) will return a negative number.
The domain is the type of all the inputs, so if you have a function which takes multiple inputs, the domain will be a set of tuples (think of them like coordinates), and likewise for functions which have multiple outputs, the codomain and image set will contain tuples.
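To make the programming parallel concrete, a sketch in Haskell (the function names are mine): the type signature plays the role of domain and codomain, the body plays the role of the rule, so the same rule over two different domains gives two different functions.

import Data.Complex

square :: Double -> Double                    -- rule x^2 with the reals as domain
square x = x * x

squareC :: Complex Double -> Complex Double   -- same rule, different domain
squareC z = z * z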
This explains the basic structure and definition of terms pretty well, i.e. body, variable, expression, head etc. Has links to more comprehensive coverage:
A 'function' can't call itself (by name or by any other method), but since 'functions' can call each other, and can make each other call each other, recursion is mostly trivial. Try it, and you'll get a working recursion one way or another. (Lambda calculus doesn't operate on functions, but on expressions/lambda terms.)
let odd'(n, odd, even) = if n == 0 then False else even(n-1, odd, even)
let even'(n, odd, even) = if n == 0 then True else odd(n-1, odd, even)
let odd(n) = odd'(n, odd', even')
let even(n) = even'(n, odd', even')
Look ma, no hands! Well, actually, it is explicit closure-conversion done by hand so... anyhow, there are more straightforward and performant ways to get recursion in practice.
But you don't need to solve these, they're solved already: these four definitions are non-recursive. Yet when evaluated, they will exhibit properly recursive behaviour.
The only reason to use the Y combinator in practice is when you for some reason don't want to keep manually passing the function to itself, as in "func fact(self, n) { return (n < 1) ? 1 : n * self(self, n-1) }; print(fact(fact, 5))" (maybe because it's tedious and error-prone), and don't have a sufficiently ergonomic term-rewriting system at hand that would do this for you.
Yes, I realize that. However, the alternative using a variadic fixed point combinator looks slightly cleaner and would (optimally) reduce to the same term. For example, using a list-based vfix:
even' _ odd n = if n == 0 then True else odd (n - 1)
odd' even _ n = if n == 0 then False else even (n - 1)
even = head $ vfix [even', odd']
odd = head $ tail $ vfix [even', odd']
Here, the functions don't need to be passed explicitly to the "recursive" calls. I prefer this a lot, it makes my lambda functions much more readable.
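The parent comment doesn't define vfix; one possible list-based definition that fits the code above (handling exactly two mutually recursive functions of the same type; this definition is my own guess) is:

vfix :: [a -> a -> a] -> [a]
vfix [f, g] = let x = f x y
                  y = g x y
              in [x, y]
vfix _ = error "this sketch expects exactly two functions"

Laziness ties the knot: x and y are each defined in terms of both, yet neither even' nor odd' contains a recursive call.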
I had the chance to follow Goubault-Larrecq's very nice course on Logic and Computation, which went over this in a very similar way. Unfortunately, the whole material is in French, but it might still interest some: http://www.lsv.fr/~goubault/Lambda/loginfoindex.html
(The relevant part is the "Lambda calcul pur" one.)
I'd really like to understand why my brain screams NO when I try to grok lisp and especially the Y combinator. It's like it's anti-interesting to me or something.
I know it's cool, and useful to know, but I just can't get there. Maybe it's a stack overflow in my head?
Is it just me?
[Edit] Re:082349872349872 - I grew up in a world of assembler programming, BASIC and Pascal. It wasn't unusual to have all sorts of self-interacting code, and deliberate use of the same half of a routine in more than one function.
I get why lambda is useful: why bother naming a function you're only going to use once? But it seemed like infinite pedantry to go about using it to do recursion... thanks!
[Edit2] Re:082349872349872 - I once came upon old code that lived in a S-100 disk controller, it seemed really weird until I realized they couldn't call a subroutine because they couldn't assume there was a properly set stack. It used one of the 8080 registers as a return pointer.
If you grew up programming in a world where self-reference is (due to linking pass or interpretation) trivial, the Y combinator is nearly useless: why should a function not be able to call itself, like we do every day?
However, if you're a pedant, and only accept definitions that are in terms of things other than themselves, then you'll need something like the Y combinator to even define, let alone invoke, recursive functions.
(there was a period* of a decade or two where systems people wrote recursive functions and theory people said "yes, it works in practice, but does it work in theory?", but now even theorists have a pedantic way to both describe functions in terms of themselves [syntactically] and agree that such a description denotes a unique function [semantically], so now we have our cake and eat it: a toolchain will go right ahead and generate fixups for the object that point back into itself, while a thesis will drop names like Tarski or Brouwer and maybe typeset another greek character or two)
* which may have overlapped with the period where people preferred one-pass to two-pass tools because manually stuffing paper tape back into the reader between passes was a pain?
EDIT: extreme pedantry: it surprises me that there was also a decade or so between LABEL and LABELS; going from single recursion to mutual recursion is a very small change in the underlying interpreters, yet even lispers were too pragmatic to have bothered to do so.
EDIT2: upon reflection, this was also the period of time where having a stack was viewed as a complexity which was certainly convenient but not exactly necessary. With the benefit of hindsight, I'd argue that mutual tail recursion is useful even without a stack, but can see that at the time that might've been viewed as architecture astronaut talk.
To the other fine replies, I'd add what I posted elsewhere: Have you actually tried to implement things in lambda calculus? Until you do it won't ever really click.
That said, to be clear, I haven't and I don't plan to, so don't read this as a moralistic claim that you "should" do this. To the extent that I care about lambda calculus itself at all, it is that I accept it as yet another Turing-complete representation, and then move on, because it's only marginally more useful to write things in lambda calculus than in Turing machines directly. This is a true logical if-then statement: If you want to understand this, then you'll need to actually work with it for a bit. But I have no comment as to whether you should want that.
No, you're probably not alone. I myself enjoy functional programming quite a lot but Y combinator always stood out to me as a particularly uninteresting trick. Sure, it's nice to have it around when you try to prove some theoretical stuff but for practical work letrec is just better.
I mostly feel the same about the Church encoding of e.g. lists: yeah, it's a nice theoretical trick that you can equate a list xs with a function that gives you its fold, but... that's just so inconvenient to actually use! Why not equate it with either a nil or a tuple of head and tail instead?
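For reference, the fold encoding being described looks something like this in Haskell (needs RankNTypes; the names are mine):

{-# LANGUAGE RankNTypes #-}

-- a list equated with "the function that gives you its fold"
newtype ChurchList a = ChurchList (forall r. (a -> r -> r) -> r -> r)

nil :: ChurchList a
nil = ChurchList (\_ z -> z)

cons :: a -> ChurchList a -> ChurchList a
cons x (ChurchList xs) = ChurchList (\c z -> c x (xs c z))

toList :: ChurchList a -> [a]
toList (ChurchList xs) = xs (:) []  -- toList (cons 1 (cons 2 nil)) == [1,2]

Getting the head or the tail back out of that representation is exactly the inconvenience being complained about.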
But your profile says you have been programming since 1979, so I humbly defer to your experience. If you have made it this far without Lisp, I guess you don't need it, friend :-)
Take it from a younger programmer, you don't have to grok lambda calculus to understand Lisp. I like Lisp, but I only have a vague notion of lambda calculus and all that academic gobbledygook. I still do not understand the Y combinator, nor do I much care to. CS theory is good and all, but its importance is often overstated. Physicists are not engineers and engineers are not physicists; you can be a very effective programmer without knowing about Peano numbers and what monads are.
The Y combinator is not really a "lisp thing", it's more of a "functional programming" thing. In Lisp, symbols and binding values to a symbol are basic elements of the language, so it usually wouldn't be natural to use the Y combinator.
Well, I have to confess I was misled by the title! I was under the impression that the article was about YC, but it turned out to be about mathematics. Ironically, the quote below is still relevant to YC, I believe:
> 8. Applications of the Y combinator
> You might be wondering what makes the Y combinator so special.
The name Y Combinator is a metaphor: startup growth is "recursive" in a sense and Y Combinator builds such "recursive" companies out of a non-recursive input (the founders themselves)
Also Paul Graham is a Lisp nerd
(Note, I tried to find an authoritative source on how the name was chosen - the only thing I found was https://paulgraham.com/ycstart.html that lists when, but not how the name came to be; but I think this metaphor is somewhat obvious)
> Well, I have to confess I was misled by the title! I was under the impression that the article was about YC, but it turned out to be about mathematics.
What gave it away for me was the definite article "the". The "the" would have been absent had the article been about the company.
That was very difficult to write while avoiding puns!
(1) Black on white is much easier to read than white on black. That includes code snippets in blocks.
(2) If you insist on ignoring (1), at least make your white text bright enough that it can be read without serious eye strain. Particularly if the white on black is code snippets surrounded by black on white article text.
As the last sentence of my (2) shows, the article itself is black on white for me too, but the code snippets are white (very faint white that I can't read without serious eyestrain) on black.