Functions in PicLang

Having seen how to implement proper lexical scoping in our stack-machine in the section Stacks and scope, we’re now well placed to do that within our expression interpreter for PicLang.

We’re going to need corresponding notions of “blocks” in this language and the notion of “do” as well. We’ll also have to have our interpreter keep track of bindings as we go along, so we can eliminate the whole substitution process and replace it with a single pass interpreter that does effectively the same (and correctly).

We’ll call the equivalent of a “block” a FunC in PicLang, defined as below –

(struct FunC (argname expr))

We’ll call the equivalent of the do instruction ApplyC here.

(struct ApplyC (funexpr valexpr))

An important thing to note is that we’re now considering FunC and ApplyC terms to be valid PicExprC terms. While we can expect ApplyC to produce pictures, evaluating FunC clearly does not produce a picture. Just as the (block (def x) ...) instruction produces a Block object on the stack in our stack-machine, we’ll take a FunC expression to produce a … “function value”.

This means we need to expand the notion of what our PicLang interp function can return to include such function values. Incidentally, we've also expanded the scope of what ApplyC can produce to include function values. In other words, in two strokes we've admitted first-class functions into our language!
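For instance, both of the expressions below are now valid PicExprC terms. Evaluating the first yields a function value, while the second applies that function to whatever 'pic happens to be bound to (a binding for 'pic is assumed here purely for illustration) –

(FunC 'x (IdC 'x))                        ; evaluates to a function value
(ApplyC (FunC 'x (IdC 'x)) (IdC 'pic))    ; applies it; evaluates to a picture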

We’ll now formalize the return type of interp using a type consisting of a set of “value terms” - denoted by the V suffix.

(struct PicV (pic))
(struct FunV (argname bindings expr))

No more subst

Next comes a very important change to the interpreter. The substitution process walks the expression tree once, and then our interpreter walks it again. The two are also coupled by the fact that they both operate on the same expression tree structure (i.e. the "abstract syntax tree").

The only job of the subst function was to replace the IdC terms with their corresponding values. So if we keep track of the "current bindings" as we're evaluating a (sub)expression, we no longer need a separate subst function. Our interpreter can do that job on its own, since it can look up the value that is expected to go in the (IdC sym) term's place and treat that as the result of "interpreting" that term. This effectively performs the "substitution".

Interpreter with bindings

What we've been loosely calling a "set of bindings" – where a "binding" is a name-value association – has a standard name: the "environment". We need some practice to translate "environment" in our minds into a data structure that associates some meaning (i.e. a value) with a set of symbols, so we'll continue to use both terms until that settles.
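For instance, in the same list-of-two-element-lists structure we used in the stack-machine, an environment in which 'pic names a picture and 'id names a function might look like the following, where some-picture stands in for whatever value your leaf picture expressions produce –

(list (list 'pic (PicV some-picture))
      (list 'id  (FunV 'x (list) (IdC 'x))))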

Note

Personally, I find “environment” a little too broad though it is a standard term used in this context within interpreters.

Here is the modified interpreter …

; bindings takes on the same list of two-element lists
; structure we used in stack-machine.
(define (interp picexprC bindings)
    (match picexprC
        ; ...
        [(FunC argname expr)
         ; Store away the definition time bindings
         ; along with the FunV value structure.
         ;             v------ One more field compared to FunC
         (FunV argname bindings expr)]
        [(ApplyC funexpr valexpr)
         (let ([fun (interp funexpr bindings)]
               [val (interp valexpr bindings)])
            (match fun
                [(FunV argname definition-time-bindings expr)
                 ; Now to apply the function, we add a new
                 ; binding for its argument name and evaluate
                 ; the body.
                 (interp expr (extend-bindings argname val definition-time-bindings))]
                [_ (raise-argument-error 'interp
                                         "Function value for application"
                                         funexpr)]))]
        [(IdC id)
         (lookup-binding id bindings)]
        ; ...
        ))

We've chosen to keep the "bindings" structure abstract, by keeping the two things we need to do with it – "name lookup" and "extension" – as separate functions, so we can defer the actual choice of data structure.
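Concretely, the interface we'll program against is just this (any representation that satisfies it will do) –

; (make-empty-bindings)                 -> bindings with nothing bound
; (extend-bindings name value bindings) -> bindings with one more binding
; (lookup-binding name bindings)        -> the value bound to name, or an error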

Exercise

Write appropriate test expressions for this revised interpreter to check whether the scoping behaviour is indeed lexical. Use any simple representation you want for the "bindings" parameter, by implementing the extend-bindings, lookup-binding and make-empty-bindings functions.
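If you want something to check your version against, here is a minimal sketch – an association-list representation of bindings plus one test expression whose result differs between lexical and dynamic scoping. The values pic-a and pic-b are placeholders for whatever pictures your leaf expressions evaluate to.

; One simple "bindings" representation - a list of (name value) lists.
(define (make-empty-bindings) (list))

(define (extend-bindings name value bindings)
    (cons (list name value) bindings))

(define (lookup-binding name bindings)
    (cond [(empty? bindings)
           (raise-argument-error 'lookup-binding "Bound name" name)]
          [(equal? (first (first bindings)) name)
           (second (first bindings))]
          [else (lookup-binding name (rest bindings))]))

; A scoping test: bind x to a, define f = (λ (ignored) x), rebind x to b,
; and then call f. Lexical scoping answers with a's picture; dynamic
; scoping would answer with b's.
(define scoping-test
    (ApplyC
        (FunC 'x
            (ApplyC
                (FunC 'f
                    (ApplyC (FunC 'x (ApplyC (IdC 'f) (IdC 'x)))
                            (IdC 'b)))
                (FunC 'ignored (IdC 'x))))
        (IdC 'a)))

(define env0
    (extend-bindings 'a (PicV pic-a)
        (extend-bindings 'b (PicV pic-b)
            (make-empty-bindings))))

(eq? (PicV-pic (interp scoping-test env0)) pic-a)   ; expect #t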

Question

Consider the way we're evaluating FunC terms in the interpreter. We store the current state of the bindings argument in the resulting FunV structure, which is one of the possible result values of our interpreter. Do we need to store all the bindings in the FunV structure? What are the consequences of storing all the bindings? Can we trim it down? If we can, how should we determine the set of bindings to trim down to?

A standard library

Notice that we've gotten rid of the fundefs argument, which was a list of FunDefC terms. Technically, we do not need it any more, since we can pass any function as the argument of an ApplyC term and refer to it within the applied function's body expression via the argument name. However, doing this for even 20 functions is cumbersome.

Note

What exactly is cumbersome about it? Is it simply that we have to nest many ApplyC terms?

From a user interface perspective, all we need to do is provide a starter set of bindings to the interpreter that expressions can use. These bindings must be mappings from symbols to FunV terms. However, note that our FunV terms also have their own list of bindings. The question then is "what should these bindings lists be?".

We're calling this initial set of bindings a "standard library" of functions that may be useful when writing PicLang expressions. We also want this standard library to be available to the standard library functions themselves. So one way to resolve this conundrum is to give these FunV values the standard library itself as their bindings! The snake must eat its own tail!

; definitions is a list of bindings where a "binding" is given
; as a list of two values - the first being the symbol to be bound
; and the second being the FunC expression to bind to it.
(define (make-standard-library definitions)
    (if (empty? definitions)
        (make-empty-bindings)
        (let ([def (first definitions)])
            (extend-bindings
                (first def)
                (interp (second def) (make-standard-library definitions))
                (make-standard-library (rest definitions))))))

There, in all its naïveté, is a fully self-referential expression of what we actually want to produce. But we kind of know how to deal with such self-reference. Let's rewrite the above self-referential function.

(define (make-standard-library definitions stdlib)
    (if (empty? definitions)
        (make-empty-bindings)
        (let ([def (first definitions)])
            (extend-bindings
                (first def)
                (interp (second def) stdlib)
                (make-standard-library (rest definitions) stdlib)))))

Here we're assuming that our function will be passed the very stdlib result it is constructing, and that it makes use of it to construct its result! That's something we did earlier as well, in Lambda - the everything. We want to solve for the fixed point of the following "equation", given a list of definitions –

stdlib = (make-standard-library definitions stdlib)

If we now rewrite the RHS –

spec = (λ (stdlib) (make-standard-library definitions stdlib))
stdlib = (spec stdlib)

; Using the Turing combinator approach ...
(define (Θ spec) (spec (Θ spec)))
stdlib = (Θ spec)

We can now apply our “function is called with itself to enable the function to call itself” trick to get -

(define G (λ (f) (λ (spec) (spec (λ (g) (((f f) spec) g))))))
(define Θ (G G))
(define stdlib (Θ spec))

We've used the Turing combinator here to capture the process of computing such a "fixed point". Note that we've modified the trick so it works with an eager evaluation strategy instead of the original #lang lazy choice. The expression (λ (g) (((f f) spec) g)) is logically equivalent to ((f f) spec) by "η-reduction", but it delays the evaluation of the recursive part just enough that we don't get stuck in an infinite loop.
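As a quick sanity check that this eager-friendly Θ really does compute fixed points, here it is applied to a spec that has nothing to do with PicLang – one whose fixed point is the factorial function –

(define fact
    (Θ (λ (self)
           (λ (n) (if (= n 0) 1 (* n (self (- n 1))))))))

(fact 5)   ; => 120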

But this is still pending a specific representation for "bindings". Let's do a simple one, where the idea is that we capture the meaning of "an environment can be used to look up a value given a name" as a lambda function –

(define (make-empty-bindings)
    (λ (name)
       (raise-argument-error 'lookup-binding
                             "Valid name"
                             name)))

(define (lookup-binding name bindings)
    (bindings name))

(define (extend-bindings name value bindings)
    (λ (n)
        (if (equal? n name)
            value
            (lookup-binding n bindings))))
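A quick check of how this representation behaves (the values bound here are arbitrary symbols, just to watch lookups work) –

(define env
    (extend-bindings 'x 'red
        (extend-bindings 'y 'blue
            (make-empty-bindings))))

(lookup-binding 'x env)   ; => 'red
(lookup-binding 'y env)   ; => 'blue
(lookup-binding 'z env)   ; raises an error, since 'z is unbound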

Exercise

Check that the above way of defining the standard library works. Add functions to print out the various intermediate structures to see how this actually works.
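Here is a minimal sketch of such a check, using the lambda-based bindings above. The definitions use only FunC, ApplyC and IdC so we don't have to commit to any particular picture constructors, and spec and stdlib mirror the construction from earlier.

(define identityC (FunC 'p (IdC 'p)))

; 'twice uses 'identity from the library itself, exercising the
; self-reference we're after.
(define twiceC
    (FunC 'p (ApplyC (IdC 'identity) (ApplyC (IdC 'identity) (IdC 'p)))))

(define definitions
    (list (list 'identity identityC)
          (list 'twice twiceC)))

(define spec (λ (stdlib) (make-standard-library definitions stdlib)))
(define stdlib (Θ spec))

; Looking up 'identity should give us a FunV - a closure.
(lookup-binding 'identity stdlib)

; Applying it to a function value should hand that function value back.
(interp (ApplyC (IdC 'identity) (FunC 'x (IdC 'x))) stdlib)
; => a FunV wrapping (FunC 'x (IdC 'x))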

Exercise

Reflect on the efficiency of this approach to making the “standard library” of definitions. Can we do something simpler if we allow ourselves some new feature of the language (Racket, not PicLang) that we haven’t used so far?

Question

An extension to the question in Stacks and scope – with the approach to FunC, ApplyC and IdC above, we got an additional superpower apart from ordinary functions. Can you recognize it? You're so familiar with it by now that it probably slipped past you unnoticed.

A superpower

When we interpret a (FunC argname fexpr) term within the interpreter and produce a (FunV argname bindings fexpr) value, we’ve also captured the current set of bindings that can give meaning to IdC references within the fexpr part of the FunC term and stored it away for future reference – for use when we’re applying the function to a value in an ApplyC term.

This FunV structure is called a "closure". In the Scheme/LiSP family, we usually don't make it a point to distinguish between "functions" and "closures", and freely interchange the two terms. There is no meaningful distinction in programs, since wherever functions are acceptable, closures can be passed and vice versa. While that holds in the LiSP family, many languages such as Objective-C and C++ do differentiate between the two, the idea being that the programmer can choose whichever suits the performance criteria they need to meet.

For example, you already understand what the following Racket code is expected to do ... and what we've done above is model how it works, so our understanding of the scoping implications of making closures is now complete.

(define adder
    (lambda (y)
        (lambda (x) (+ x y))))
(define three-more-than (adder 3))
(define ten-more-than (adder 10))
(display (three-more-than 8))      ; Prints 11
(display (ten-more-than 8))        ; Prints 18

Closures are a powerful feature in any language that provides them. We've already seen this in the work we did with Church's lambda calculus – that anything computable can be expressed using λ. Looking back, it is both a surprise and not a surprise that "mainstream" languages put off adopting closures for a long time under the perception that they're inefficient to implement, when the LiSP family has had them from (pretty much) day one. There was some truth to the "inefficiency" concern in the early days, but on today's machines, and given the heavy use of interpreted languages like Python and PHP, the inefficiency argument no longer holds. In particular, compilers for Common LiSP and Scheme have long implemented proper lexical closures, resulting in programs more efficient than interpreted ones.

Where does this myth of inefficiency come from?

You can see a part of it in the way we made the FunV object. We just stored away the entire definition environment in the object, without any consideration for whether the function expression actually uses any values from it. One "optimization" here is to scan the function expression, collect the set of references that are bound in the definition environment, and make a new bindings field that keeps only the ones that are used (sketched below). Here, we're trading the computational cost of constructing a closure against the memory cost of storing unnecessary bindings. Such optimizations are not essential for us to consider in this course, as they don't (and indeed must not) change the meanings of our programs.
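Here is a sketch of that trimming idea, restricted to the FunC/ApplyC/IdC core of the language; a full version would need a clause for every PicExprC constructor that contains sub-expressions –

; Collect the identifiers that expr references but does not itself bind.
(define (free-ids expr bound)
    (match expr
        [(IdC id) (if (member id bound) (list) (list id))]
        [(FunC argname body) (free-ids body (cons argname bound))]
        [(ApplyC funexpr valexpr)
         (append (free-ids funexpr bound) (free-ids valexpr bound))]
        ; Picture-constructing terms would recurse into their sub-expressions.
        [_ (list)]))

; When making the closure, keep only the bindings the body can actually use.
(define (trimmed-bindings argname expr bindings)
    (foldl (λ (id trimmed)
               (extend-bindings id (lookup-binding id bindings) trimmed))
           (make-empty-bindings)
           (free-ids expr (list argname))))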