λ - the everything¶
If Alonzo Church were asked “how far does the computational rabbit hole go?”, he might have answered “It’s λ all the way through.”
Alonzo Church, in the 1930s (yes, that’s getting to be a century ago!), and his students formulated what could be called the essence of anything computable. Their insight was that given functions and function applications, anything that can be computed is expressible. Note that we aren’t talking about functions of numbers or strings or booleans or anything else. The only entities in Church’s world are functions, and the only thing you can do with them is to apply them to other things (which are … functions) … and he claimed that all of what’s computable is expressible using these! The equivalence of Church’s “λ calculus” and Alan Turing’s “Turing machine” is a theorem, and what’s known as the Church-Turing thesis is the claim that these formalisms capture everything that is effectively computable.
Take a moment to digest that large claim, for the λ construct in Scheme is directly based on that foundational insight. In studying programming languages, we repeatedly encounter this idea of “we can express X in terms of Y”, where we build larger sets of these “X”s for our convenience even though a much smaller set of “Y”s would suffice for the job. This “extra” we call “syntactic sugar”, a term that draws from Alan Perlis’ joke – “Syntactic sugar causes cancer of the semicolon.” – made in reference to unessential syntactic cruft often added on to programming languages by their creators without a taste for minimalism.
The process of taking an expression that’s rich in “syntactic sugar” and producing an equivalent expression using fewer “primitives” is something we’ll refer to as “desugaring”. Such a reduction is useful because it is usually easier to reason about properties of a program if we have to deal with fewer basic entities in it, than if we complicate it with … well, “syntactic sugar”. Since desugaring is a kind of translation, it could also be taken as a kind of “compilation”, but in case that scares you, don’t worry and just stick to “desugaring”.
The point of λs is that λs are at the end of the entire stack of sugarings. The buck stops with them.
So what’s λ?¶
In Scheme, a “λ expression” is an expression of the general form shown below –
(λ (..<list-of-var-names>..) <expr-using-vars-in-the-list>)
It represents a packaged computation that can be performed when given concrete values for the variables listed after λ in the above expression.
For example, (λ (x) (* x x)) is the function that squares numbers. When given a number 2, by substituting x with 2, you can compute the result of what the λ expression denotes – which is (* 2 2) = 4.
There are two main rules you need to know when working with λ expressions (the abstract ones).
- α-renaming
If (λ (x) E[x]) is a λ expression in one argument where E[x] denotes some expression involving the variable x, then you can change the name of x to anything else and what the expression denotes is considered to be the same. i.e. You can rewrite it to (λ (y) E[y]) (where we use the same E) and it means the same thing … as long as E doesn’t already use y as a free variable.
- β-reduction
If you have an expression of the form ((λ (x) E1[x]) E2) where E1[x] is an expression that (optionally) uses the variable x and E2 is some other expression, then it is equivalent to E1[E2]. i.e. ((λ (x) E1[x]) E2) => E1[E2]
It is important to note that when we say “is equivalent to”, it means you can rewrite a sub-expression that looks like one side to the other form anywhere. We refer to the above left-to-right rewrite as “β-reduction” and the corresponding right-to-left rewrite as “β-abstraction”.
Just as λs offer a conceptual basis for all of computation, all abstraction in computing can be seen through β-abstraction.
Note
“β-reduction” can be considered a fancy term for “substitution”, for that’s what it is. We don’t have a corresponding simple term for the opposite transformation though. So we’ll continue to call it “β-abstraction”. We’ll refer to the transformation E1[E2] => ((λ (x) E1[x]) E2) as “β-abstracting over E2”. In most cases, when we’re performing such a transformation, we’re no longer really interested in the E2 and will usually focus on the preceding (λ (x) E1[x]) and loosely talk about that as the β-abstracted expression.
Warning
When performing a β-reduction step in Scheme, you need to be careful not to substitute symbols within a quote sub-expression. For example, ((λ (x) (quote (+ x x))) 3) reduces to the list '(+ x x) whereas ((λ (x) (+ x x)) 3) reduces to (+ 3 3) = 6.
We’ll use λ instead of lambda for brevity.
Take the expression ((λ (x) (* x ((λ (x) (- x 1)) x))) 10) and try to apply the reduction rules. If you took the “β-reduction” rule in the naive way, you might end up with (* 10 ((λ (10) (- 10 1)) 10)) and then scratch your head about what you have at hand and what to do with it next! To do this correctly, you must see that the original expression is the same as ((λ (x) (* x ((λ (y) (- y 1)) x))) 10) … where we’ve “α-renamed” the inner λ’s x variable to y, because, well, they’re supposed to be equivalent, right? If you now do β-reduction on this equivalent expression, you won’t be left with a confused expression.
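Written out step by step, the correct reduction goes –
((λ (x) (* x ((λ (y) (- y 1)) x))) 10)
; β-reduce the outer λ
=> (* 10 ((λ (y) (- y 1)) 10))
; β-reduce the inner λ
=> (* 10 (- 10 1))
=> (* 10 9)
=> 90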
Also, a β-abstraction step must be performed in a manner that is compatible with α-renaming. This means that abstracting over (+ x x) in ((λ (x) (> (+ x x) 25)) 42) does not get you ((λ (y) ((λ (x) (> y 25)) 42)) (+ x x)). This is because in the original λ expression, you can replace the x with any other symbol and the meaning of the expression remains unchanged. However, in the wrongly β-abstracted form, the expression itself doesn’t make sense, as the x in the outermost position is actually undefined. So the rule of thumb here is – “you can’t pull a bound variable outside of its introduction point”. The correct way to β-abstract over that expression is to first perform the abstraction within the λ to get ((λ (x) (> ((λ (y) (+ y y)) x) 25)) 42) and then β-abstract the inner λ to get ((λ (f) ((λ (x) (> (f x) 25)) 42)) (λ (y) (+ y y))). Now you can see that the expression (λ (f) ((λ (x) (> (f x) 25)) 42)) is independent of the addition calculation in (+ x x) and therefore we’ve generalized the expression over the (+ x x) calculation.
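To see that generalization at work, we can apply the abstracted expression to a different function in place of (λ (y) (+ y y)) – say squaring (our own example) –
((λ (f) ((λ (x) (> (f x) 25)) 42)) (λ (y) (* y y)))
; β-reduces to (> (* 42 42) 25) = (> 1764 25) = #t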
So the two rules are taken to be applicable anywhere within an expression being evaluated, and all correct applications of the rules must evaluate to the same result no matter the sequence in which they’re applied.
Tall claims need taller evidence¶
Back to Church, what he made was a tall claim – that all computable functions are expressible in terms of λs. When we make such a claim, we have to back it up though. To recap, he’s saying that you don’t need cons, car, cdr, if, let, cond, booleans or numbers or strings or whatever we’re used to in normal programming. He claimed that all of these are representable using λs alone … and showed how to do it.
We’ll now work through how to represent basic things in terms of which we can build a whole computational edifice.
Pairs¶
Pairs are the simplest of data structures. Once you can make a pair of two things like (pair a b) or the equivalent in a programming language, you can get lists using –
(pair a (pair b (pair c ... (pair x sentinel))))
where we use a sentinel to indicate the end of the list. You can also make trees using nested structures like –
(pair (pair a b) (pair (pair c d) (pair e f)))
Or tables as a list of lists. Or even graphs. So if we can show we can represent pairs using just λ, we’re good with the other structures.
(define pair (λ (x y) ...))
What should we put within the ...? In fact, what can we put in there when all we have are functions (i.e. λ expressions)? So we’re now looking at –
(define pair (λ (x y) (λ (p) ...)))
Again, what can we put in there? We have a p and some two arbitrary values x and y that we’re expected to “store” in the pair. About the only thing we can do (apart from nesting λ once more, which would seem pointless) is to apply p to the x and y.
(define pair (λ (x y) (λ (p) (p x y))))
We can now make “pairs” like below –
(define p1 (pair 12 100))
(taking the liberty to use numbers just to illustrate). Since p1 is a function that takes one argument, the only thing we can do with it is call it. Since its argument is also a function that’s applied to 2 arguments, let’s consider some simple 2-argument functions shown below –
(define .first (λ (x y) x))
and
(define .second (λ (x y) y))
The functions ignore one of their arguments and just evaluate to the other.
Now what happens when you apply p1 to these two functions?
(p1 .first)
=> ((pair 12 100) (λ (x y) x)) ; substitute their definitions
=> (((λ (x y) (λ (p) (p x y))) 12 100) (λ (x y) x))
; β-reduce the first term
=> ((λ (p) (p 12 100)) (λ (x y) x))
; β-reduce the expression again
=> ((λ (x y) x) 12 100)
; β-reduce the expression again
=> 12
Exercise
Work it out similarly and show that (p1 .second) results in 100.
So we have a function now named pair that can make so-called “pair objects”, and we can get the individual values out of the pair object using the “accessor” functions .first and .second.
Many of you are familiar with “object oriented languages” like Python and will see the reasoning behind naming the accessor functions that way … since the expression (p1 .first) looks very similar to the p1.first typical of such languages.
Note
The ones with a careful eye might’ve noticed that while we claimed to only use λs, we ended up using define in the above definitions. We use it here only as a substitute for writing the mathematical definitional equality \(pair = (λ\ (x\ y)\ (λ\ (p)\ (p\ x\ y)))\) and because it actually permits you to type it into Racket and check things out for yourself. We therefore lose no generality by using define in the above code. Also, λ calculus deals only with one-argument functions and we’ve used two here. However, \((λ\ (x\ y)\ E[x,y])\) can be mechanically rewritten to \((λ\ (x)\ (λ\ (y)\ E[x,y]))\) with corresponding changes to substitution steps without loss of logical correctness. So we’ll take that additional liberty here too.
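As a quick illustration of that mechanical rewrite, here is a curried form of pair (pair/curried is our own name for it) –
(define pair/curried (λ (x) (λ (y) (λ (p) (p x y)))))
; ((pair/curried 12) 100) behaves exactly like (pair 12 100)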
Booleans¶
From this section on, it will be valuable for us to use the #lang lazy language instead, since we’re going to be doing equational reasoning, which will work only in a lazy scheme and not when using eager evaluation. The syntax and meaning are generally the same, except that the values of expressions will be computed only when they are needed and not before.
Exercise
The only place we use boolean values is to do a branch within an if condition. So if we can implement if purely using λ, we’re good. For this exercise, you’ll need to consider “lazy evaluation” instead of “eager evaluation” to keep things simple. In fact, for the rest of this demonstration, we’ll use lazy evaluation with #lang lazy. The earlier ones will also work with #lang lazy. So complete the definition below –
(define IF (λ (bool then-expr else-expr) ....))
Remember the trick we used with pair. You have all you need in that code.
let¶
It is quite easy to see that we can rewrite let expressions using λ.
(let ([var1 expr1]
[var2 expr2]
...
[varn exprn])
<body-using-var1..n>)
Can be rewritten as –
((λ (var1 var2 ... varn)
<body-using-var1..n>)
expr1 expr2 ... exprn)
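For a concrete instance of the rewrite (our own example) –
(let ([x 3] [y 4])
  (+ (* x x) (* y y)))
; desugars to
((λ (x y) (+ (* x x) (* y y))) 3 4)
; => 25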
So let is just “syntactic sugar” on top of λ – i.e. it is there for our convenience without offering additional “expressive power”. These notions will become clearer (and more formal) as we go along. For now, if you have a sense of what they are, that’s sufficient.
Numbers¶
Numbers are a big one to claim to be representable using λ alone! Numbers (i.e. basic arithmetic with whole numbers) hold a “threshold” place in mathematical logic too – that every “formal system” [1] is representable using numbers.
In λ calculus, all we have are functions and function application. What can we apply functions to? The answer to that question is also “functions”! So how can we capture the idea of natural numbers using functions alone?
Given a function, what can we do with it? We can apply it to some value. What kind of a value can we apply it to (at least within λ calculus)? We can apply it to another function. So Alonzo Church came up with a representation for numbers as the idea of applying a function a certain number of times.
If we consider applying a function f to a value x a number of times, we could write that sequence as –
x ; 0
(f x) ; 1
(f (f x)) ; 2
(f (f (f x))) ; 3
;... and so on
But we don’t know what these f and x are. The only thing we know about them is that the function f must have the property that its domain and co-domain are the same. The nice thing here is that you can perform “β-abstraction” on the two in order to postpone the problem of what values we want them to take on. So instead of the above, we can consider the sequence below as a representation of numbers –
(λ (f) (λ (x) x)) ; 0
(λ (f) (λ (x) (f x))) ; 1
(λ (f) (λ (x) (f (f x)))) ; 2
(λ (f) (λ (x) (f (f (f x))))) ; 3
;... and so on
Observe by reading the λ expression for each “number” that a Church numeral n stands for the idea of “n applications of f on x” given some f and x.
We can’t exhaustively list all such numbers. Even if we could, that wouldn’t capture the structure inherent in the numbers that’s laid out in Peano’s axioms –
- “Zero” is a number.
- Every number has a “successor”.
Let’s now try to apply Peano’s axioms to capture the idea of successorship for Church numerals.
(define ch-zero (λ (f) (λ (x) x)))
(define ch-succ (λ (n) ...))
How should we now define ch-succ? Before we get there, let’s pull in some preparatory functions that we encountered before –
(define pair (λ (x y) (λ (p) (p x y))))
(define .first (λ (x y) x))
(define .second (λ (x y) y))
(define swap (λ (p) (pair (p .second) (p .first))))
; The function composition operation .. as a function
(define comp (λ (f g) (λ (x) (f (g x)))))
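A couple of quick sanity checks on these helpers (again taking the liberty of using numbers and the built-in add1 to illustrate) –
((swap (pair 1 2)) .first) ; => 2
((comp add1 add1) 5)       ; => 7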
Note
Try to define ch-succ yourself before reading on, for you have spoilers below.
Let’s write out in words what the expression (ch-succ n) for some specific Church numeral n is supposed to mean – “n+1 applications of some function f on an x”. In other words, if we have “n applications of some function f on an x”, we need to apply f once more on that to get “n+1 applications of some function f on an x”.
To make things concrete, let’s look at the definition for “3” and see if we can express it in terms of our definition for “2”.
(define ch-two (λ (f) (λ (x) (f (f x)))))
(define ch-three (λ (f) (λ (x) (f (f (f x))))))
; See that the expression (f (f x)) is ((ch-two f) x)
; Replacing the innermost (f (f x)) in ch-three with ((ch-two f) x)
(define ch-three (λ (f) (λ (x) (f ((ch-two f) x)))))
It’s not hard to see now that we could do that for any pair of \((n,n+1)\).
(define ch-nplus1 (λ (f) (λ (x) (f ((ch-n f) x)))))
What we want for our ch-succ function is for the relation “(ch-succ ch-n) = ch-nplus1” to hold. So if we β-abstract over ch-n, we get –
(define ch-nplus1 ((λ (n) (λ (f) (λ (x) (f ((n f) x))))) ch-n))
; Then due to the equality which we just stated above, we have
(define ch-succ (λ (n) (λ (f) (λ (x) (f ((n f) x))))))
; We can simplify it further though. Notice that
; (λ (x) (f ((n f) x)))
; is just the function composition of f and (n f).
; i.e. (λ (f) (comp f (n f))) = (λ (f) (λ (x) (f ((n f) x))))
; Therefore we can also write -
(define ch-succ (λ (n) (λ (f) (comp f (n f)))))
I hope it is much easier to read the last definition as “n applications of f followed by one more” (reading the function composition from right-to-left).
We’ll take a break here and define two utility functions outside of Church’s λ calculus that will help us make Church numerals and display them in notation we understand - i.e. as decimal numbers.
(define (i->ch i)
(if (equal? i 0)
ch-zero
(ch-succ (i->ch (sub1 i)))))
(define (ch->i n)
((n add1) 0))
We can now use i->ch to make Church numerals given Scheme numbers and ch->i to make Scheme numbers given Church numerals.
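A quick round trip to check that these behave as expected –
(ch->i (i->ch 5))           ; => 5
(ch->i (ch-succ (i->ch 4))) ; => 5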
Ok how about adding two Church numerals? Again, try to figure it out yourself before reading on.
(define ch-add (λ (m n) ...))
Given an n (a Church numeral), we can express the idea of “m+n” as “m applications of ch-succ on n”. This translates easily enough to a λ expression like below –
(define ch-add (λ (m n) ((m ch-succ) n)))
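For example –
(ch->i (ch-add (i->ch 2) (i->ch 3))) ; => 5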
Let’s up the game now. How do we implement multiplication of Church numerals? i.e. a two-argument function ch-mul used as (ch-mul m n).
If (n f) (for a given f) yields n applications of f, then we need to do this m times. That’s an easy enough expression too.
(define ch-mul (λ (m n) (λ (f) (m (n f)))))
However, the inner part of that, (λ (f) (m (n f))), looks very familiar, doesn’t it? It is simply (comp m n). So we have –
(define ch-mul (λ (m n) (comp m n)))
Or to put it even more simply, (define ch-mul comp)!! i.e. the multiplication operation for Church numerals is simply the function composition operation!
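Again, a quick check –
(ch->i (ch-mul (i->ch 3) (i->ch 4))) ; => 12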
I’ve been avoiding a problem so far though – how would we do subtraction? To do that, we’ll need to implement (ch-pred n), which behaves such that (ch-succ (ch-pred n)) = n. Since we don’t have the capability to check for equality yet, we cannot search the natural numbers starting from ch-zero and work our way upwards until we find a value k such that (ch-succ k) = n. We also don’t know how to compute the “inverse of a given function f” in the general case, so we can’t simply apply the inverse after n applications.
This problem apparently stumped Church too. However, his student Stephen Kleene came up with a solution to it. His solution was to use pairs of Church numerals in a particular sequence - the first number in the sequence is \((0,0)\) and if an entry is \((m,n)\), the next entry in the sequence is \((n,n+1)\). This gives us the following sequence –
(0,0) ; 0
(0,1) ; 1
(1,2) ; 2
(2,3) ; 3
(3,4) ; 4
...
In the above sequence, the first value of the pair gives the predecessor of the second value, which is the same as the row number. The only irksome bit we have to put up with is that we have to assume that “the predecessor of 0 is 0”.
So if we define k-zero as (define k-zero (pair ch-zero ch-zero)) and (define k-succ (λ (kp) (pair (kp .second) (ch-succ (kp .second))))), we can produce the sequence through repeated applications of k-succ on k-zero. That’s a concept we already understand. So to produce the row corresponding to number n, we need to do ((n k-succ) k-zero). Thereafter, all that remains is to pick the first value of the pair to get the predecessor of n. So …
(define k-zero (pair ch-zero ch-zero))
(define k-succ (λ (kp) (pair (kp .second) (ch-succ (kp .second)))))
(define ch-pred (λ (n) (((n k-succ) k-zero) .first)))
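Checking it (note our “predecessor of 0 is 0” convention) –
(ch->i (ch-pred (i->ch 5))) ; => 4
(ch->i (ch-pred ch-zero))   ; => 0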
Exercise:
Define (ch-sub m n) for \(m \geq n\) using ch-pred.
Exercise:
Can you come up with a representation for integers? – i.e. numbers that can be positive or negative or zero. You’ll also have to implement the corresponding addition, subtraction, multiplication and division operators. You can throw in a “negation” too.
Interlude on β-abstraction¶
You’ve seen above how useful β-abstraction turns out to be when exploring representations that we do not initially fully understand. We were able to postpone specific choices of functions until we understood things better, we could transform expressions to extract common patterns, etc. As mentioned earlier, all abstractions boil down to β-abstractions at the end. This means you can use β-abstraction to great effect when working with domains that you’re just beginning to understand. That’s useful even if you are not using a functional programming language, because once you construct those abstractions, it is usually a mechanical matter to translate them into other languages that may not be functional. How can we be sure of that? That’s what this whole section is about – that λ is enough to represent all of computation, so any general purpose language (i.e. “Turing complete language”) can be understood in terms of it.
The key to exploiting β-abstraction is practice.
Recursion¶
Recursion underlies all repetition in Scheme – in the sense that you can express any looping construct using recursion. However, we have a problem at this point. We typically define a recursive function such as sqrt/rec using define like this –
(define sqrt/rec
(λ (n xk eps)
(if (< (abs (- (* xk xk) n)) eps)
xk
(sqrt/rec n (* 0.5 (+ xk (/ n xk))) eps))))
Note
We’ll use the sqrt/rec function to illustrate, but whatever we’re doing with that we can also do to any other recursive function definition you may want to solve. I picked this ’cos I gave this function to you to practice recursion.
Scheme works with this definition just fine, but that’s because it already provides a mechanism for you to assume the existence of the inner sqrt/rec function when trying to evaluate a particular call. Somehow, the repeated unfolding of the code is avoided by using names to tie the function’s structure to itself. We don’t have that concept in λ calculus and so will need to show that we can do this without such a naming+delayed-binding trick.
So, for our purposes, we do not know what function to use to effect the recursive call within the body of the above sqrt/rec definition.
By now, you should’ve already guessed what we’re going to do when we’re faced with an unknown like this. Yup – we’ll β-abstract over sqrt/rec!
(define cheat
(λ (f)
(λ (n xk eps)
(if (< (abs (- (* xk xk) n)) eps)
xk
(f n (* 0.5 (+ xk (/ n xk))) eps)))))
Now, we can see how sqrt/rec = (cheat sqrt/rec), provided we know sqrt/rec already (hence the name “cheat”). To find out sqrt/rec given cheat, we need to “solve” the above equation. Because applying cheat to sqrt/rec produces the same function, sqrt/rec is called the “fixed point” of cheat. In mathematics, a fixed point of a function \(f(x)\) is a value \(x\) such that \(x = f(x)\).
However, our cheat function is not of much use, though it captures the essentials of the algorithm. We called it “cheat” because to get the sqrt/rec function from it, you have to pass it to it in the first place, which seems to defeat the point. What we really want is for the whole machinery of the (λ (n xk eps) ...) part of cheat to be available in place of f when we’re calling it. Since cheat is fully defined (we do not refer to it recursively), what if we could just pass it to itself as an argument (bound to f)?
Another way to ask that question is “what if we just had an extra argument to the sqrt function and we just passed sqrt itself in its place” – like this –
(define sqrt/norec
(λ (f n xk eps)
(if (< (abs (- (* xk xk) n)) eps)
xk
(f f n xk eps))))
So you can calculate square-roots using (sqrt/norec sqrt/norec 64 64 0.1). This actually lets us do recursive function calls without using a recursive definition! However, it is somewhat awkward to pass this additional argument all the time. Let’s see how we can improve it. First, we can lift that f argument out so we can “Curry” it like this –
(define good
(λ (f)
(λ (n xk eps)
(if (< (abs (- (* xk xk) n)) eps)
xk
((f f) n xk eps)))))
… and we can now do our square-root algorithm using good like this –
((good good) 64 64 0.1)
; Prints out 8.005147977880979
; which is an approximate square root indeed.
Now, you see that sqrt/rec = (good good) … which is … good, as we have an explicit function that behaves exactly as our original recursive definition … without any extra arguments.
To summarize, we’ve now figured out a trick by which we can turn a recursively defined function into one that isn’t recursive but can effectively accomplish the same result.
(define some-function/rec (λ (a) ... (some-function/rec next-arg) ...))
; can be transformed into
(define some-function/norec (λ (f) (λ (a) ... ((f f) next-arg) ...)))
; .. so that some-function/rec can now be defined in terms of
; some-function/norec as --
(define some-function/rec (some-function/norec some-function/norec))
; Note that the number of sites at which the recursive call happens
; does not matter. We replace all of them with (f f).
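The recipe works for any recursive definition, not just sqrt/rec. As a quick illustration (our own example), here is factorial put through the same transformation –
(define fact/norec
  (λ (f)
    (λ (n)
      (if (= n 0)
          1
          (* n ((f f) (- n 1)))))))
((fact/norec fact/norec) 5) ; => 120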
The journey isn’t finished yet¶
We’ve now shown that you can express recursive calls using λ alone. Mission accomplished! However, don’t forget our larger claim that anything computable can be expressed using λ. In this case, what we just saw is how we can start with a recursively defined function (given as a spec similar to cheat) and mechanically transform it into the true recursive function. If we’ve truly “mechanized” it, then we should be able to express that transformation as a function, right?
Though we called our original function “cheat”, we’re being a bit unfair to it, because it serves as a specification for how the recursion is to proceed. It captures all the details of the algorithm we intended to write down, except for exactly which function to use to recurse. Furthermore, our desired sqrt/rec is a fixed point of this function, which is simple enough to write. So we can now ask – “If I give you such a cheat function, can you calculate sqrt/rec mechanically?”
We can also see that (good f) = (cheat (f f)) through simple β-reduction. In fact, we got (good good) = (cheat (good good)) from that in the first place.
We can now rewrite that way of stating good as –
(define good (λ (f) (cheat (f f))))
And we can now express our desired sqrt/rec function as just –
(define sqrt/rec (good good))
If we then β-abstract on good, we get –
(define sqrt/rec ((λ (f) (f f)) good))
; =>
(define sqrt/rec ((λ (f) (f f)) (λ (g) (cheat (g g)))))
; => β-abstract on "cheat" =>
(define sqrt/rec ((λ (s) ((λ (f) (f f)) (λ (g) (s (g g))))) cheat))
So, we actually now have a function that we can apply to our easy-to-define “spec” function in order to get our recursive result! This function that we’ve figured out above is called the “Y combinator”.
(define Y (λ (s) ((λ (f) (f f)) (λ (g) (s (g g))))))
(We’re using s as the variable name to suggest “spec function” for cheat.)
A way the Y combinator is usually presented is with one β-reduction applied, which gives us a nice symmetric form –
(define Y (λ (s) ((λ (g) (s (g g))) (λ (g) (s (g g))))))
And we have (Y cheat) = (cheat (Y cheat)). This is why the Y combinator is said to be a “fixed point combinator” – because it calculates the fixed point of the given function. So all you have to do now is to express your recursive function using an “unknown f” and then have the Y-combinator figure out what f to pass to it.
Can we just solve for the combinator?¶
While we originally tried to solve for sqrt/rec given the equation sqrt/rec = (cheat sqrt/rec), we turned the problem into finding a function of cheat that can produce sqrt/rec. i.e. we were actually looking for a function F such that –
(F cheat) = (cheat (F cheat))
; i.e.
(define F (λ (cheat) (cheat (F cheat))))
Now we have a recursive “solution” for F. If we now apply the same technique/trick that we used to turn cheat into a non-recursive function good, we have –
(define G (λ (f) (λ (cheat) (cheat ((f f) cheat)))))
; and
(define F (G G))
Now, let’s look at the full expression for (G G) with less judgemental variable names –
(define F ((λ (f) (λ (s) (s ((f f) s)))) (λ (f) (λ (s) (s ((f f) s))))))
(We’re again using the variable named s to denote the “spec function” cheat.)
This looks like a different function compared to the Y we figured out earlier, yet it also has the property (F s) = (s (F s)), just like (Y s) = (s (Y s)). The difference between the two is this – since we only used β-abstraction to come up with F, we can see how evaluating (F s) simply β-reduces to (s (F s)), whereas with the Y combinator, (Y s) and (s (Y s)) give us the same expression. F is therefore a valid combinator in its own right and is called the “Turing combinator”, usually denoted by \(\Theta\).
Can we not be lazy please?¶
We’ve so far been using #lang lazy for all the above work on recursion. If you want to, you can try to see if the Y combinator as defined above will work with eager evaluation by switching the language to #lang racket. You’ll find that you’ll get a stack overflow as Y tries to repeatedly expand itself without stopping. The benefit of laziness for the definition of Y is that the expansion only happens when it is needed, i.e. in the part of the spec function that actually makes a recursive call. When the termination condition is hit, no further expansion of Y is needed and the recursion stops.
We can achieve the same effect in the eager evaluation mode by wrapping the expansion in another λ. To do this, we need to see that (λ (x) (f x)) = f for a function f whose expression does not make use of the outer variable x – i.e. it does not contain x as a “free variable”, with “free” meaning “unbound”.
Note
The transformation (λ (x) (f x)) => f when f does not contain x as a free variable is called η-reduction (“eta-reduction”). This transformation can be done both ways. I haven’t traced the history of λ-calculus to figure out why Church chose to call it η-reduction and not γ-reduction as one might expect to follow β-reduction. I’d like to think he tried many intermediate rules to complete the λ-calculus until he finally settled on the one he named η-reduction. At least, that fictitious explanation would capture the labour necessary for mathematical insight.
We apply this transformation to the inner (g g) call, turning it into (λ (v) ((g g) v)). We can now rewrite the Y combinator as –
(define Y (λ (s) ((λ (f) (f f)) (λ (g) (s (λ (v) ((g g) v)))))))
Exercise:
Check out whether this way of specifying the Y combinator works in eager mode. Do you understand why?
Exercise:
Perform a similar η-substitution in the expression for the Turing combinator and see if you can get a version of it that works in eager Racket. Do you encounter any problems? Think through which expression you’ll need to substitute.