λ - the everything

If Alonzo Church were asked “how far does the computational rabbit hole go?”, he might have answered “It’s λ all the way through.”.

Alonzo Church, in the 1930s (yes, that’s getting to be a century ago!), and his students formulated what could be called the essence of anything computable. The insight they had was that given functions and function applications, anything that can be computed is expressible. Note that we aren’t talking about functions of numbers or strings or booleans or anything else. The only entities in Church’s world are functions, and the only thing you can do with them is to apply them to other things (which are … functions) … and he claimed that all of what’s computable is expressible using these! What’s known as the Church-Turing thesis is the proof that Church’s “λ calculus” is equivalent in its universality to Alan Turing’s “Turing machine”.

Take a moment to digest that large claim, for the λ construct in Scheme is directly based on that foundational insight. In studying programming languages, we repeatedly encounter this idea of “we can express X in terms of Y” and where we make larger sets of these “X”s for our convenience where a much smaller set of “Y”s will suffice for the job. This “extra”, we call “syntactic sugar”, which draws from Alan Perlis’ joke - “Syntactic sugar causes cancer of the semicolon.” – made in reference to unessential syntactic cruft often added on to programming languages by their creators without a taste for minimalism.

The process of taking an expression that’s rich in “syntactic sugar” and producing an equivalent expression using fewer “primitives” is something we’ll refer to as “desugaring”. Such a reduction is useful because it is usually easier to reason about properties of a program if we have to deal with fewer basic entities in it, than if we complicate it with … well, “syntactic sugar”. Since desugaring is a kind of translation, it could also be taken as a kind of “compilation”, but in case that scares you, don’t worry and just stick to “desugaring”.

The point of λs is that λs are at the end of the entire stack of sugarings. The buck stops with them.

So what’s λ?

In Scheme, a “λ expression” is an expression of the general form shown below –

(λ (..<list-of-var-names>..) <expr-using-vars-in-the-list>)

It represents a packaged computation that can be performed when given concrete values for the variables listed after λ in the above expression.

For example, (λ (x) (* x x)) is the function that squares numbers. When given a number 2, by substituting x with 2, you can compute the result of what the λ expression denotes – which is (* 2 2) = 4.

We’ll now look at the “stuff”, “structure” and “properties” that make up λ-calculus.

The stuff of λ calculus

We’ll use λ instead of lambda for brevity even when writing expressions in Racket.

The “stuff” of Church’s λ-calculus are called “λ-terms” - a.k.a. “λ functions” or simply “functions” if the context is not ambiguous.

<λ-term> => <identifier>
<λ-term> => (λ (<identifier>) <λ-term>)
<λ-term> => (<λ-term> <λ-term>)

We can use Racket’s symbols to represent “identifiers” and its lambda construct to represent the second form. The third is the usual Racket “applicative” form. We can therefore write a predicate that tests whether a particular form is a valid λ-term.

(define (λ-term? expr)
    (or (symbol? expr)
        (and (list? expr)
             (= (length expr) 2)
             (λ-term? (first expr))
             (λ-term? (second expr)))
        (and (list? expr)
             (= (length expr) 3)
             (list? (second expr))
             (= (length (second expr)) 1)
             (symbol? (first (second expr)))
             (λ-term? (third expr)))))

Let’s construct some example λ-terms to get a feel for this -

x
(x y)
(λ (a) a)
(λ (a) b)
(λ (a) (a b))
((λ (x) (x (λ (a) (a b)))) y)

Some observations -

  1. Notice that that’s all we have in this “mini language”. No numbers, booleans, strings, lists, symbols, vectors, structs, etc. Church’s claim is that all of those can be expressed as λ-functions.

  2. Our :rkt`λ-term?` predicate’s recursive nature fully captures the recursive term construction of lambda calculus.

  3. So far, they are only terms. We’ve called them “functions”, but there is nothing so far to warrant that.

We’ll permit ourselves a little syntactic extension for convenience - i.e. some syntactic sugar.

<λ-term> => (λ (<id1> <id2> ... <idN>) <λ-term>)
<λ-term> => (<λ-term> <λ-term> <λ-term> ... <λ-term>)

where we take those forms to mean the following –

(λ (<id1> <id2> ... <idN>) <λ-term>)
= (λ (<id1>) (λ (<id2>) (... (λ (<idN>) <λ-term>))...))

(<λ-term1> <λ-term2> <λ-term3> ... <λ-termN>)
= (...((<λ-term1> <λ-term2>) <λ-term3>) ... <λ-termN>)

Given this equivalence, we don’t have a conceptual problem when we encounter such extended terms.

Terminology: free variables

An identifier in a λ-term that is not mentioned as an argument in an enclosing (λ (<id>) ...) form is called a “free variable” in the context of that expression. We denote an expression which contains a free variable x as E[x] in the text below. Notice that we’re calling them “identifiers” but this term calls them “variables”. This difference is due to the mathematical origin of λ-calculus. In our context, we’ll continue to call them “identifiers” but understand that we mean the same thing by both words here. For example, in (λ (x) (x y)), y is a free variable while x is not.

Homework

Extend the λ-term? predicate to support these extended forms. Write a expand-λ procedure that will take a λ form containing such extended forms and translate them all into their equivalent core forms.

The structure and properties of λ-calculus

We’ve just defined some forms in the previous section without saying anything about what they might mean. There is only one structure to know about in λ-calculus - which is “application”, known more formally as “β reduction” - which lets us “calculate” with an expression through a simplification rule.

((λ (x) E[x]) <y>) =β=> E[<y>]

where E[x] refers to any λ-term that may refer to the identifier x in any number of places as a “free variable”. If we think of E[.] as having holes, then E[<y>] refers to pasting <y> into all those holes. Some simple examples -

((λ (x) x) y) =β=> y
((λ (x) (λ (y) y)) z) =β=> (λ (y) y)
((λ (x) (λ (y) x)) z) =β=> (λ (y) z)

This structure defines the notion of “calculating” with lambda terms.

  1. α-renaming: For any given expression with a hole E[.], all λ terms of the form (λ (<id>) E[<id>]) for any identifier <id> are considered to be equal to each other – meaning they can be substituted for one another. So (λ (x) x), (λ (y) y) etc all refer to “the same thing”.

  2. η-reduction: (λ (x) (F x)) = F where F does not contains x as a free variable. Since the only thing we can do with a λ term is to apply it to another one, it is easy to see that by β-reduction, both the LHS and RHS will result in the same expression with one β-reduction step if applied to any term. Therefore this equivalence abstracts over all such possible λ-terms by saying (forall x: (F x) = (G x)) => (F = G). It is difficult to prove the antecedent for arbitrary F and G, but in this special case it is easy and it is an equivalence worth establishing.

When we say “equal”, we mean that the RHS can substitute for the LHS and vice versa … anywhere. Note that it means that any interpretation of an expression must be consistent with all possible ways to rewrite the expression based on these equivalences. For example, ((λ (x) (λ (x) x)) y) β-reduces to (λ (x) x) and not to (λ (x) y), because that is the only interpretation that is consistent with the inner λ being rewritten using α-renaming to (λ (z) z) before performing the β-reduction. The “free variable” conditions for these equivalences help ensure that.

Similarly, ((λ (x) (λ (y) (x y))) y) β-reduces to (λ (z) (y z)) (or any of its α-renamed equals) and not to (λ (y) (y y)), as a naive substitution might suggest.

A first useful step to working with a λ term therefore is to perform α-renaming so that all identifiers “introduced” within (λ (<id>) ...) forms are α-renamed using unique identifier names so as to avoid any confusion during reducing substitutions.

β-abstraction

We defined the step of β-reduction above as an LHS to RHS rewrite. If we rewrite the other way (since they are equal), we call it “β-abstraction”.

This is a central topic in this course as much of what we learn about useful concepts developed for real world programming languages can be understood using β-abstraction steps. To put it another way, just as Church claims that all computations can be expressed as manipulations of λ-terms using these rules, all abstractions in programming are in essence β-abstractions.

Recap

α-renaming

If (λ (x) E[x]) is a λ expression in one argument where E[x] denotes some expression involving the variable x, then you can change the name of x to anything else and what the expression denotes is considered to be the same. i.e. You can rewrite it to (λ (y) E[y]) (where we use the same E) and it means the same thing … as long as E doesn’t already use y as a free variable.

β-reduction

If you have an expression of the form ((λ (x) E1[x]) E2) where E1[x] is an expression that (optionally) uses the variable x and E2 is some other expression, then it is equivalent to E1[E2]. i.e.

((λ (x) E1[x]) E2) => E1[E2]

It is important to note that when we say “is equivalent to”, it means you can rewrite a sub-expression that looks like one side to the other form anywhere. We refer to the above left-to-right rewrite as “β-reduction” and the corresponding right-to-left rewrite as “β-abstraction”.

Just as λs offer a conceptual basis of all of computation, all abstraction in computing can be seen through β-abstraction.

Note

β-reduction” can be considered a fancy term for “substitution”, for that’s what it is. We don’t have a corresponding simple term for the opposite transformation though. So we’ll continue to call it “β-abstraction”. We’ll refer to the transformation E1[E2] => ((λ (x) E1[x]) E2) as “β-abstracting over E2”. In most cases, when we’re performing such a transformation, we’re no longer really interested in the E2 and will usually focus on the preceding (λ (x) E1[x]) and loosely talk about that as the “β-abstracted form”.

Warning

When performing a β-reduction step in Scheme, you need to be careful not to substitute symbols within a quote sub-expression. For example, ((λ (x) (quote (+ x x))) 3) reduces to the list '(+ x x) whereas ((λ (x) (+ x x)) 3) reduces to (+ 3 3) = 6.

Take the expression ((λ (x) (* x ((λ (x) (- x 1)) x))) 10) and try to apply the reduction rules. If you took the “β-reduction” rule in the naive way, you might end up with (* 10 ((λ (10) (- 10 1)) 10)) and then scratch your head about what you have at hand and what to do with it next! To do this correctly, you must see that the original expression is the same as ((λ (x) (* x ((λ (y) (- y 1)) x))) 10) .. where we’ve “α-renamed” the inner λ’s x variable to y, because, well they’re supposed to be equivalent right? If you now do β-reduction on this equivalent expression, you won’t be left with the confused expression.

Also, a β-abstraction step must be performed in a manner that is compatible with α-renaming. This means that abstracting over (+ x x) in ((λ (x) (> (+ x x) 25)) 42) does not get you ((λ (y) ((λ (x) (> y 25)) 42)) (+ x x)). This is because in the original λ expression, you can replace the x with any other symbol and the meaning of the expression remains unchanged. However, in the wrongly β-abstracted form, the expression itself doesn’t make sense as x in the outermost position is actually undefined. So the thumbrule here is – “you can’t pull a bound variable outside of its introduction point”. The correct way to β-abstract over that expression is to first perform the abstraction within the λ to get ((λ (x) (> ((λ (y) (+ y y)) x) 25)) 42) and then β-abstract the inner λ to get ((λ (f) ((λ (x) (> (f x) 25)) 42)) (λ (y) (+ y y))). Now you can see that the expression (λ (f) ((λ (x) (> (f x) 25)) 42)) is independent of the addition calculation in (+ x x) and therefore we’ve generalized the expression over the (+ x x) calculation.

So the two rules are taken to be always applicable in evaluating an expression and all correct applications of the rules must evaluate to the same result no matter the sequence in which they’re applied.

Tall claims need taller evidence

Back to Church, what he made was a tall claim – that all computable functions are expressible in terms of λs. When we make such a claim, we have to back it up though. To recap, he’s saying that you don’t need cons, car, cdr, if, let, cond, booleans or numbers or strings or whatever we’re used to in normal programming. He claimed that all of these are representable using λs alone .. and showed how to do it.

We’ll now work through how to represent basic things in terms of which we can build a whole computational edifice.

Let’s look first at how to represent boolean values #t and #f.

Booleans

To tackle that, we need to be clear about what exactly is a boolean. What do we use booleans for and where do we use them? Yes we may combine booleans using logical operators to compute a boolean result, but what would we then use that boolean result for? We might print it out to the console. But then we’re using that printed out boolean in our minds for something.

That something is a branch. The only place a boolean is useful to make a choice about which of two expressions to use, by using it in the conditional slot of an (if <condition> <then-expr> <else-expr>) expression.

For the moment, we’re considering a language where there is no notion of printing or state or mutation. We’ll also constrain if to accept only boolean values in the <condition> position instead of Racket’s generalized booleans. So the if expression obeys the following equivalences –

(if #t <expr1> <expr2>) = <expr1>
(if #f <expr1> <expr2>) = <expr2>

Using our multiple argument equivalences -

(if #t <expr1> <expr2>) = ((if #t) <expr1> <expr2>) = <expr1>
(if #f <expr1> <expr2>) = ((if #f) <expr1> <expr2>) = <expr2>

It looks like #t is functionally then equivalent to (if #t) for all practical purposes and likewise #f is equivalent to (if #f). We can therefore define our booleans like so -

(define true (λ (a b) a))
(define false (λ (a b) b))
(define if (λ (c) c))
; or equivalently (define (if c a b) (c a b))

Take moment to think about this

Our “booleans” are being represented as functions that perform the branching for which we need them. The if construct in our language then is simply the identity function!

Usually, if the condition happens to be true or false, we only want one of the two branches to be evaluated, whereas in this case, it looks like both branches will be evaluated and one of the values will be discarded.

While the selective evaluation is how Racket works due to it being a “larger” programming language in the Guy Steele sense, it makes no difference in our minimal λ language so far which way we choose, because we’re so far only dealing with mathematical equivalents that we calculate using the substitution rule (i.e. β-reduction).

So booleans are down for now.

Pairs

Pairs are the simplest of data structures. Once you can make a pair of two things like (pair a b) or the equivalent in a programming language, you can get lists using –

(pair a (pair b (pair c ... (pair x sentinel))) ...)

where we use a sentinel to indicate end of the list. You can also make trees using nested structures like –

(pair (pair a b) (pair (pair c d) (pair e f)))

Or tables as a list of lists. Or even graphs. So if we can show we can represent pairs using just λ, we’re good with the other structures.

(define pair (λ (x y) ...))

What should we put within the ...? In fact, what can we put in there when all we have are functions (i.e. λ expressions)? So we’re now looking at –

(define pair (λ (x y) (λ (p) ...)))

Again, what can we put in there? We have a p and some two arbitrary values x and y that we’re expected to “store” in the pair. About the only thing we can do (apart from nesting λ once more, which would seem pointless) is to apply p to the x and y.

(define pair (λ (x y) (λ (p) (p x y))))

We can now make “pairs” like below –

(define p1 (pair 12 100))

(taking the liberty to use numbers just to illustrate). Since p1 is a function that takes one argument, the only thing we can do with it is call it. Since its argument is also a function that’s applied to 2 arguments, let’s consider some simple 2-argument functions shown below –

(define .first (λ (x y) x))

and

(define .second (λ (x y) y))

The functions ignore one of their arguments and just evaluate to the other. Now what happens when you apply p1 to these two functions.

(p1 .first)
=> ((pair 12 100) (λ (x y) x)) ; substitute their definitions
=> (((λ (x y) (λ (p) (p x y))) 12 100) (λ (x y) x))
; β-reduce the first term
=> ((λ (p) (p 12 100)) (λ (x y) x))
; β-reduce the expression again
=> ((λ (x y) x) 12 100)
; β-reduce the expression again
=> 12

Exercise

Work it out similarly and show that (p1 .second) results in 100.

So we have a function now named pair that can make so-called “pair objects” and we can get the individual values out of the pair object using the “accessor” functions .first and .second.

Many of you are familiar with “object oriented languages” like Python and will see the reasoning behind naming the accessor functions that way .. since the expression (p1 .first) looks very similar to p1.first typical of such languages.

Note

The ones with a careful eye might’ve noticed that while we claimed to only use λs, we ended up using define in the above definitions. We use it here only as a substitute for writing the mathematical definitional equality \(pair = (λ\ (x\ y)\ (λ\ (p)\ (p\ x\ y)))\) and because it actually permits you to type it into Racket and check things out for yourself. We therefore lose no generality by using define in the above code. Also, λ calculus deals only with one-argument functions and we’ve used two here. However, \((λ\ (x\ y)\ E[x,y])\) can be mechanically rewritten to \((λ\ (x)\ (λ\ (y)\ E[x,y]))\) with corresponding changes to substitution steps without loss of logical correctness. So we’ll take that additional liberty here too.

let

It is quite easy to see that we can rewrite let expressions using λ.

(let ([var1 expr1]
      [var2 expr2]
      ...
      [varn exprn])
  <body-using-var1..n>)

Can be rewritten as –

((λ (var1 var2 ... varn)
    <body-using-var1..n>)
 expr1 expr2 ... exprn)

So let is just “syntactic sugar” on top of λ - i.e. is for our convenience without offering additional “expressive power”. These notions will become clearer (and more formal) as we go along. For now, if you have a sense of what they are, that’s sufficient.

Numbers

Numbers are a big one to claim to be representable using λ alone! Numbers (i.e. basic arithmetic with whole numbers) hold a “threshold” place in mathematical logic too – that every “formal system” [1] is representable using numbers.

In λ calculus, all we have are functions and function application. What can we apply functions to? The answer to that question is also “functions”! So how can we capture the idea of natural numbers using functions alone?

Given a function, what can we do with it? We can apply it to some value. What kind of a value can we apply it to (at least within λ calculus)? We can apply it to another function. So Alonzo Church came up with a representation for numbers as the idea of applying a function a certain number of times.

If we consider applying a function f to a value x a number of times, we could write that sequence as –

x                     ; 0
(f x)                 ; 1
(f (f x))             ; 2
(f (f (f x)))         ; 3
;... and so on

But we don’t know what these f and x are. The only thing we know about them is that the function f must have the property that its domain and co-domain are the same. The nice thing here is that you can “β-abstraction” on the two in order to postpone the problem of what values we want them to take on. So instead of the above, we can consider the sequence below as a representation of numbers –

(λ (f) (λ (x) x))               ; 0
(λ (f) (λ (x) (f x)))           ; 1
(λ (f) (λ (x) (f (f x))))       ; 2
(λ (f) (λ (x) (f (f (f x)))))   ; 3
;... and so on

Observe by reading the λ expression for each “number” that a Church numeral n stands for the idea of “n applications of f on x” given some f and x.

We can’t exhaustively list all such numbers. Even if we could, that wouldn’t capture the structure inherent in the numbers that’s laid out in Peano’s axioms -

  1. “Zero” is a number

  2. Every number has a “successor”.

Let’s now try to apply Peano’s axioms to capture the idea of successorship for Church numerals.

(define ch-zero (λ (f) (λ (x) x))

(define ch-succ (λ (n) ...))

How should we now define ch-succ? Before we get there, let’s pull in some preparatory functions that we encountered before –

(define pair (λ (x y) (λ (p) (p x y))))
(define .first (λ (x y) x))
(define .second (λ (x y) y))
(define swap (λ (p) (pair (p .second) (p .first))))

; The function composition operation .. as a function
(define comp (λ (f g) (λ (x) (f (g x)))))

Note

Try to define ch-succ yourself before reading on, for you have spoilers below.

Let’s write out in words what the expression (ch-succ n) for some specific Church numeral n is supposed to mean – “n+1 applications of some function f on an x”. In other words, if we have “n applications of some function f on an x”, we need to apply f once more on that to get “n+1 applications of some function f on an x”.

To make things concrete, let’s look at the definition for “3” and see if we can express it in terms of our definition for “2”.

(define ch-two (λ (f) (λ (x) (f (f x)))))
(define ch-three (λ (f) (λ (x) (f (f (f x))))))

; See that the expression (f (f x)) is ((ch-two f) x)
; Replacing the inner most (f (f x)) in ch-three with ((ch-two f) x)
(define ch-three (λ (f) (λ (x) (f ((ch-two f) x)))))

It’s not hard to see now that we could do that for any pair of \((n,n+1)\).

(define ch-nplus1 (λ (f) (λ (x) (f ((ch-n f) x)))))

What we want for our ch-succ function is for the relation “(ch-succ ch-n) = ch-nplus1 to hold. So if we β-abstract over ch-n, we get –

(define ch-nplus1 ((λ (n) (λ (f) (λ (x) (f ((n f) x))))) ch-n))

; Then due to the equality which we just stated above, we have
(define ch-succ (λ (n) (λ (f) (λ (x) (f ((n f) x))))))

; We can simplify it further though. Notice that
; (λ (x) (f ((n f) x)))
; is just the function composition of f and (n f).
; i.e. (λ (f) (comp f (n f))) = (λ (f) (λ (x) (f ((n f) x))))
; Therefore we can also write -
(define ch-succ (λ (n) (λ (f) (comp f (n f)))))

I hope it is much easier to read the last definition as “n applications of f followed by one more” (reading the function composition from right-to-left).

We’ll take a break here and define two utility functions outside of Church’s λ calculus that will help us make Church numerals and display them in notation we understand - i.e. as decimal numbers.

(define (i->ch i)
   (if (equal? i 0)
       ch-zero
       (ch-succ (i->ch (sub1 i)))))

(define (ch->i n)
    ((n add1) 0))

We can now use i->ch to make Church numerals given Scheme numbers and ch->i to make Scheme numbers given Church numerals.

Ok how about adding two Church numerals? Again, try to figure it out yourself before reading on.

(define ch-add (λ (m n) ...))

Given an n (a Church numeral), we can express the idea of “m+n” as “m applications of ch-succ on n”. This translates easily enough to a λ expression like below –

(define ch-add (λ (m n) ((m ch-succ) n)))

Let’s up the game now. How do we implement multiplication of Church numerals? i.e. a two argument function ch-mul used as (ch-mul m n).

If (n f) (for a given f) yields n applications of f, then we need to do this m times. That’s an easy enough expression too.

(define ch-mul (λ (m n) (λ (f) (m (n f)))))

However, the inner part of that (λ (f) (m (n f))) looks very familiar doesn’t it? It is simple (comp m n). So we have.

(define ch-mul (λ (m n) (comp m n)))

Or to put it even more simply, (define ch-mul comp)!! i.e. the multiplication operation for Church numerals is simply the function composition operation!

I’ve been avoiding a problem so far though – how would we do subtraction? To do that, we’ll need to implement (ch-pred n) which behaves such that (ch-succ (ch-pred n)) = n. Since we don’t have the capability to check for equality yet, we cannot search the natural numbers starting from ch-zero and work our way upwards until we find a value k such that (ch-succ k) = n. We also don’t know how to compute the “inverse of a given function f” in the general case, so we can apply the inverse after n applications.

This problem apparently stumped Church too. However, his student Stephen Kleene came up with a solution to it. His solution was to use pairs of Church numerals in a particular sequence - the first number in the sequence is \((0,0)\) and if an entry is \((m,n)\), the next entry in the sequence is \((n,n+1)\). This gives us the following sequence –

(0,0)   ; 0
(0,1)   ; 1
(1,2)   ; 2
(2,3)   ; 3
(3,4)   ; 4
...

In the above sequence, the first value of the pair gives the predecessor of the second value which is the same as the row number. The only irksome bit in this that we have to put up with is that we have to assume that “the predecessor of 0 is 0”.

So if we define k-zero as (define k-zero (pair ch-zero ch-zero)) and (define k-succ (λ (kp) (pair (kp .second) (ch-succ (kp .second))))), we can produce the sequence through repeated applications of k-succ on k-zero. That’s a concept we already understand. So to produce the row corresponding to number n, we need to do ((n k-succ) k-zero). Thereafter, all that remains is to pick the first value of the pair to get the predecessor of n. So …

(define k-zero (pair ch-zero ch-zero))
(define k-succ (λ (kp) (pair (kp .second) (ch-succ (kp .second)))))
(define ch-pred (λ (n) (((n k-succ) k-zero) .first)))

Exercise:

Define (ch-sub m n) for \(m >= n\) using ch-pred.

Exercise:

Can you come up with a representation for integers? – i.e. numbers that can be positive or negative or zero. You’ll also have to implement the corresponding addition, subtraction, multiplication and division operators. You can throw in a “negation” too.

Interlude on β-abstraction

You’ve seen above how useful β-abstraction turns out to be when exploring representations that we do not initially fully understand. We were able to postpone specific choices of functions until we understood things better, we could transform expressions to extract common patterns, etc. As mentioned earlier, all abstractions boil down to β-abstractions at the end. This means you can use β-abstraction to great effect when when working with domains that you’re just about beginning to understand. That’s useful even if you are not using a functional programming language, because once you construct those abstractions, it is usually a mechanical matter to translate them into other languages that may not be functional. How can we be sure of that? That’s what this whole section is about – that λ is enough to represent all of computation, so any general purpose language (i.e. “Turing complete language”) can be understood in terms of it.

The key to exploiting β-abstraction is practice.

Recursion

Recursion underlies all repetition in Scheme – in the sense that you can express any looping construct using recursion. However, we have a problem at this point. We typically define a recursive function such as sqrt/rec using define like this –

(define sqrt/rec
   (λ (n xk eps)
      (if (< (abs (- (* xk xk) n)) eps)
          xk
          (sqrt/rec n (* 0.5 (+ xk (/ n xk))) eps))))

Note

We’ll use the sqrt/rec function to illustrate, but whatever we’re doing with that we can also do to any other recursive function definition you may want to solve. I picked this ‘cos I gave this function to you to practice recursion.

Scheme works with this definition just fine, but that’s because it already provides a mechanism for you to assume the existence of the inner sqrt/rec function when typing to evaluate a particular call. Somehow, the repeated unfolding of the code is avoided by using names to tie the function’s structure to itself. We don’t have that concept in λ calculus and so will need to show that we can do this without such a naming+delayed-binding trick.

So, for our purposes, we do not know what function to use to effect the recursive call within the body of the above sqrt/rec definition.

By now, you should’ve already guessed what we’re going to do when we’re faced with an unknown like this. Yup - we’ll β-abstract over sqrt/rec!

(define cheat
   (λ (f)
      (λ (n xk eps)
         (if (< (abs (- (* xk xk) n)) eps)
             xk
             (f n (* 0.5 (+ xk (/ n xk))) eps)))))

Now, we can see how sqrt/rec = (cheat sqrt/rec), provided we know sqrt/rec already (hence the name “cheat”). To find out sqrt/rec given cheat, we need to “solve” the above equation. Because applying cheat to sqrt/rec produces the same function, sqrt/rec is called the “fixed point” of cheat. In mathematics, a fixed point of a function \(f(x)\) is a value \(x\) such that \(x = f(x)\).

However, our cheat function is not of much use though it captures the essentials of the algorithm. We called it “cheat” because to get the sqrt/rec function from it, you have to pass it to it in the first place, which seems to defeat the point. What we really want is for the whole machinery of the (λ (n xk eps) ...) part of cheat to be available in place of f when we’re calling it. Since cheat is fully defined (we do not refer to it recursively), what if we could just pass it to itself as an argument (bound to f)?

Another way to ask that question is “what if we just had an extra argument to sqrt function and we just passed sqrt itself in its place – like this –

(define sqrt/norec
    (λ (f n xk eps)
       (if (< (abs (- (* xk xk) n)) eps)
           xk
           (f f n xk eps))))

So you can calculate square-roots using (sqrt/norec sqrt/norec 64 64 0.1). This actually lets us do recursive function calls without using a recursive definition! However, it is somewhat awkward to pass this additional argument all the time. Let’s see how we can improve it. First, we can lift that f argument out so we can “Curry” it like this –

(define good
    (λ (f)
       (λ (n xk eps)
          (if (< (abs (- (* xk xk) n)) eps)
              xk
              ((f f) n xk eps)))))

… and we can now do our square-root algorithm using good like this –

((good good) 64 64 0.1)
; Prints out 8.005147977880979
; which is an approximate square root indeed.

Now, you see that sqrt/rec = (good good) .. which is … good as we have an explicit function that behaves exactly as our original recursive definition … without any extra arguments.

To summarize, we’ve now figured out a trick by which we can turn a recursively defined function into one that isn’t recursive but can effectively accomplish the same result.

(define some-function/rec (λ (a) ... (some-function/rec next-arg) ...))
; can be transformed into
(define some-function/norec (λ (f) (λ (a) ... ((f f) next-arg) ...)))
; .. so that some-function/rec can now be defined in terms of
; some-function/norec as --
(define some-function/rec (some-function/norec some-function/norec))

; Note that the number of sites at which the recursive call happens
; does not matter. We replace all of them with (f f).

The journey isn’t finished yet

We’ve now shown that you can express recursive calls using λ alone. Mission accomplished! However, don’t forget our larger claim that anything computable can be expressed using λ. In this case, what we just saw is how we can start with a recursively defined function (given as a spec similar to cheat) and mechanically transform it into the true recursive function. If we’ve truly “mechanized” it, then we should be able to express that transformation as a function, right?

Though we called our original funciton “cheat”, we’re being a bit unfair to it, because it serves as a specification for how the recursion is to proceed. It captures all the details of the algorithm we intended to write down, except for exactly which function to use to recurse. Furthermore, our desired sqrt/rec is a fixed point of this function, which is simple enough to write. So we can now ask – “If I give you such a cheat function, can you calculate sqrt/rec mechanically?”

We can also see that (good f) = (cheat (f f)) through simple β-reduction. In fact, we got (good good) = (cheat (good good)) from that in the first place.

We can now rewrite that way of stating good as –

(define good (λ (f) (cheat (f f))))

And we can now express our desired sqrt/rec function as just –

(define sqrt/rec (good good))

If we then β-abstract on good, we get –

(define sqrt/rec ((λ (f) (f f)) good))
; =>
(define sqrt/rec ((λ (f) (f f)) (λ (g) (cheat (g g)))))
; => β-abstract on "cheat" =>
(define sqrt/rec ((λ (s) ((λ (f) (f f)) (λ (g) (s (g g))))) cheat))

So, we actually now have a function that we can apply to our easy-to-define “spec” function in order to get our recursive result! This function that we’ve figured out above is called the “Y combinator”.

(define Y (λ (s) ((λ (f) (f f)) (λ (g) (s (g g))))))

(We’re using s as the variable name to suggest “spec function” for cheat.)

A way the Y combinator is usually presented is with one β-reduction applied which gives us a nice symmetric form –

(define Y (λ (s) ((λ (g) (s (g g))) (λ (g) (s (g g))))))

And we have (Y cheat) = (cheat (Y cheat)). This is why the Y combinator is said to be a “fixed point combinator” because it calculates the fixed point of the given function. So all you have to do now is to express your recursive function using an “unknown f” and then have the Y-combinator figure out what f to pass to it.

Can we just solve for the combinator?

While we originally tried to solve for sqrt/rec given the equation sqrt/rec = (cheat sqrt/rec), we turned the problem into finding a function of cheat that can produce sqrt/rec. i.e. we were actually looking for a function F such that -

(F cheat) = (cheat (F cheat))
; i.e.
(define F (λ (cheat) (cheat (F cheat))))

Now we have a recursive “solution” for F. If we now apply the same technique/trick that we used to turn cheat into a non-recursive function good, we have –

(define G (λ (f) (λ (cheat) (cheat ((f f) cheat)))))
; and
(define F (G G))

Now, let’s look at the full expression for (G G) with less judgemental variable names –

(define F ((λ (f) (λ (s) (s ((f f) s)))) (λ (f) (λ (s) (s ((f f) s))))))

(We’re again using the variable named s to denote the “spec function” cheat.)

This looks like a different function compared to Y we figured out earlier that also has the property (F s) = (s (F s)) just like (Y s) = s (Y s). The difference between the two is this – since we only used β-abstraction to come up with F, we can see how evaluating (F s) simply β-reduces to (s (F s)), whereas with the Y combinator, (Y s) and (s (Y s)) give us the same expression. F is therefore a valid combinator in its own right and is called the “Turing combinator”, usually denoted by \(\Theta\).

Can we not be lazy please?

We’ve so far been using the #lang lazy for all the above work on recursion. If you want to, you can try to see if the Y combinator as defined above will work with eager evaluation by switching the language to #lang racket. You’ll find that you’ll get a stack overflow as Y tries to repeatedly expand itself without stopping. The benefit of laziness for the definition of Y is that the expansion only happens when it is needed, i.e. in the part of the spec function that actually makes a recursive call. When the termination condition is hit, no further expansion of Y is needed and the recursion stops.

We can achieve the same effect in the eager evaluation mode by wrapping the expansion in another λ. To do this, we need to see that (λ (x) (f x)) = f for a function f whose expression does not make use of the outer variable x – i.e. it does not contain x as a “free variable”, with “free” meaning “unbound”.

Note

The transformation (λ (x) (f x)) => f when f does not contain x as a free variable is called η-reduction (“eta-reduction”). This transformation can be done both ways. I haven’t traced the history of λ-calculus to figure out why Church chose to call it η-reduction and not γ-reduction as one might expect to follow β-reduction. I’d like to think he tried many intermediate rules to complete the λ-calculus until he finally settled on the one he named η-reduction. At least, that fictitious explanation would capture the labour necessary for mathematical insight.

We apply this transformation to the inner (g g) call, turning it into (λ (v) ((g g) v)). We can now rewrite the Y combinator as –

(define Y (λ (s) ((λ (f) (f f)) (λ (g) (s (λ (v) ((g g) v)))))))

Exercise:

Check out whether this way of specifying the Y combinator works in eager mode. Do you understand why?

Exercise:

Perform a similar η-substitution in the expression for the Turing combinator and see if you can get a version of it that works in eager Racket. Do you encounter any problems? Think through which expression you’ll need to substitute.