Functions in PicLang ==================== Having seen how to implement proper lexical scoping in our :rkt:`stack-machine` in the section :doc:`stacks-and-scope`, we're now well placed to do that within our expression interpreter for PicLang. We're going to need corresponding notions of "blocks" in this language and the notion of "do" as well. We'll also have to have our interpreter keep track of bindings as we go along, so we can eliminate the whole substitution process and replace it with a single pass interpreter that does effectively the same (and correctly). We'll call the equivalent of a "block" a :rkt:`FunC` in PicLang, defined as below -- .. index:: FunC, ApplyC .. code-block:: racket (struct FunC (argname expr)) We'll call the equivalent of the :rkt:`do` instruction :rkt:`ApplyC` here. .. code-block:: racket (struct (ApplyC (funexpr valexpr))) An important thing to note is that we're now considering :rkt:`FunC` and :rkt:`ApplyC` terms to be valid :rkt:`PicExprC` terms. While we can expect :rkt:`ApplyC` to produce pictures, evaluating :rkt:`FunC` clearly does not produce a picture. Just as the :rkt:`(block (def x) ...)` instruction produces a :rkt:`Block` object on the stack in our :rkt:`stack-machine`, we'll take a :rkt:`FunC` expression to produce a ... "function value". This means we need to expand the notion of what our PicLang :rkt:`interpret` function can return to include such function values. Incidentally, we've now expanded the scope of what :rkt:`ApplyC` can produce to include function values too. In other words, we've in two strokes admitted first class functions into our language! We'll now formalize the return type of :rkt:`interp` using a type consisting of a set of "value terms" - denoted by the :rkt:`V` suffix. .. code-block:: racket (struct PicV (pic)) (struct FunV (argname bindings expr)) No more subst ------------- .. index:: Abstract Syntax Tree, AST Next comes a very important change to the interpreter. The substitution process walks the expression tree once and then our interpreter does it again. These two are also tied by the fact that they both use the same expression tree structure (i.e. the "abstract syntax tree"). The only job of the :rkt:`subst` function was to replace the :rkt:`IdC` terms with their corresponding values. So if we're keeping track of "current bindings" as we're evaluating a (sub) expression, then we no longer need a separate :rkt:`subst` function. Our interpreter can do that job on its own, since it can lookup the value that is expected to go in the :rkt:`(IdC sym)` term's place and treat that as the result of "interpreting" that term. This would effectively perform a "substitution". Interpreter with bindings ------------------------- What we've been loosely calling a "set of bindings" where "binding" refers to a name-value pair association has a name -- the "environment". We need some practice to translate "environment" in our minds to a data structure that associates some meaning (i.e. value) with a set of symbols. So we'll continue to use both until that settles. .. note:: Personally, I find "environment" a little too broad though it is a standard term used in this context within interpreters. Here is the modified interpreter ... .. code-block:: racket ; bindings takes on the same list of two-element lists ; structure we used in stack-machine. (define (interp picexprC bindings) (match picexprC ; ... [(FunC argname expr) ; Store away the definition time bindings ; along with the FunV value structure. ; v------ One more field compared to FunC (FunV argname bindings expr)] [(ApplyC funexpr valexpr) (let ([fun (interp funexpr bindings)] [val (interp valexpr bindings)]) (match fun [(FunV argname definition-time-bindings expr) ; Now to apply the function, we add a new ; binding for its argument name and evaluate ; the body. (interp expr (extend-bindings argname val definition-time-bindings))] [_ (raise-argument-error 'interp "Function value for application" funexpr)]))] [(IdC id) (lookup-binding bindings id)] ; ... )) We've chosen to keep the "bindings" structure abstract by referring to the two things we need to do with it - "name lookup" and "extension" separate, so we can defer the actual choice of structure. .. admonition:: **Exercise** Write appropriate test expressions for this revised interpreter to check whether the scoping behaviour is indeed lexical. Use any simple representation you want to for the "bindings" parameter by implementing the :rkt:`extend-bindings`, :rkt:`lookup-binding` and :rkt:`make-empty-bindings` functions. .. admonition:: **Question** Consider the way we're evaluating :rkt:`FunC` terms in the interpreter. We store the current state of the :rkt:`bindings` argument in the resulting :rkt:`FunV` structure .. which is one of the possible result values of our interpreter. Do we need to store *all* the bindings in the :rkt:`FunV` structure? What are the consequences of storing all the bindings? Can we trim it down? If we can, how should we determine the set of bindings to trim down to? A standard library ------------------ Notice that we've gotten rid of the :rkt:`fundefs` argument which was a list of :rkt:`FunDefC` terms. Technically, we do not need it any more, as we can pass any functions as arguments to an :rkt:`ApplyC` term and refer to it within the value expression of the function term via the argument name. However, doing this for even 20 functions is cumbersome. .. note:: What exactly is cumbersome about it? Is it simply that we have to nest many :rkt:`ApplyC` terms? From a user interface perspective, all we need to do is to provide a starter set of bindings to the interpreter that expressions can use. These bindings must be mappings from symbols to :rkt:`FunV` terms. However, note that our :rkt:`FunV` terms also have their own list of bindings. The question then is "what should these bindings lists be?". We're calling this initial set of bindings a "standard library" of functions that may be useful when writing PicLang expressions. This also means that this standard library is also available to the standard library functions themselves. So one way to resolve this conundrum is to make these :rkt:`FunV` values given the standard library! The snake must eat its own tail! .. code-block:: racket ; definitions is a list of bindings where a "binding" is given ; as a list of two values - the first being the symbol to be bound ; and the second being the :rkt:`FunC` expression to bind to it. (define (make-standard-library definitions) (if (empty? definitions) (make-empty-bindings) (let ([def (first definitions)]) (extend-bindings (first def) (interp (second def) (make-standard-library definitions)) (make-standard-library (rest definitions)))))) There, in all its naïvette, is a fully self referential expression of what we actually want to produce. But we :doc:`kind of know how` to deal with such self reference. Let's rewrite the above self referential function. .. code-block:: racket (define (make-standard-library definitions stdlib) (if (empty? definitions) (make-empty-bindings) (let ([def (first definitions)]) (extend-bindings (first def) (interp (second def) stdlib) (make-standard-library (rest definitions) stdlib))))) Here we're assuming that our function will be passed the result :rkt:`stdlib` we're constructing and it makes use of this to construct its result!! That's something we did earlier as well in :doc:`Lambda - the everything`. We want to solve for the fixed point in the following "equation", given a list of :rkt:`definitions` - .. index:: Fixed point .. code-block:: racket stdlib = (make-standard-library definitions stdlib) If we now rewrite the RHS -- .. code-block:: racket spec = (λ (stdlib) (make-standard-library definitions stdlib)) stdlib = (spec stdlib) ; Using the Turing combinator approach ... (define (Θ spec) (spec (Θ spec))) stdlib = (Θ spec) We can now apply our "function is called with itself to enable the function to call itself" trick to get - .. index:: Turing combinator .. code-block:: racket (define G (λ (f) (λ (spec) (spec (λ (g) (((f f) spec) g)))))) (define Θ (G G)) (define stdlib (Θ spec)) We've used the :index:`Turing combinator` here to capture the process of computing such a "fixed point". Note that we've modified the trick above so it would work with eager evaluation strategy instead of the original :rkt:`#lang lazy` choice. The expression :rkt:`(λ (g) (((f f) spec) g))` is logically equivalent to :rkt:`((f f) spec)` by ":index:`η-reduction`", but helps delay the evaluation of the recursive parts by enough so we don't get stuck in an infinite loop. But this is still pending a specific representation for "bindings". Lets do a simple one -- where the idea is that we capture the meaning of "an environment can be used to lookup a value given a name" as a lambda function -- .. code-block:: racket (define (make-empty-bindings) (λ (name) (raise-argument-error 'lookup-binding "Valid name" name))) (define (lookup-binding name bindings) (bindings name)) (define (extend-bindings name value bindings) (λ (n) (if (equal? n name) value (lookup-binding n bindings)))) .. admonition:: **Exercise** Check that the above way of defining the standard library works. Add functions to print out the various intermediate structures to see how this actually works. .. admonition:: **Exercise** Reflect on the efficiency of this approach to making the "standard library" of definitions. Can we do something simpler if we allow ourselves some new feature of the language (Racket, not PicLang) that we haven't used so far? .. admonition:: **Question** An extension to the question in :doc:`stacks-and-scope` -- we got an additional super power apart from ordinary functions with the approach to :rkt:`FunC` and :rkt:`ApplyC` and :rkt:`IdC` above. Can you recognize it? You're so familiar with it by now it probably slipped past you without your notice. A superpower ------------ .. index:: Closure When we interpret a :rkt:`(FunC argname fexpr)` term within the interpreter and produce a :rkt:`(FunV argname bindings fexpr)` value, we've also captured the current set of bindings that can give meaning to :rkt:`IdC` references within the :rkt:`fexpr` part of the :rkt:`FunC` term and stored it away for future reference -- for use when we're applying the function to a value in an :rkt:`ApplyC` term. This :rkt:`FunV` structure is called a "closure". In the Scheme/LiSP family, we usually don't make it a point to distinguish between "functions" and "closures" and freely interchange the two terms. There is no meaningful distinction in programs since where functions are acceptable, closures can be passed and vice versa. However, while this is true in the LiSP family languages, many languages like Objective-C and C++ do differentiate between the two, with the idea that the programmer can choose what they want according to the performance criteria they need to meet. For example, you already understand what the following Racket code is expected to do .. and what we've done is to model how we do it so our understanding of the scope implications of making closures is now complete. .. code-block:: racket (define adder (lambda (y) (lambda (x) (+ x y)))) (define three-more-than (adder 3)) (define ten-more-than (adder 10)) (display (three-more-than 8)) ; Prints 11 (display (ten-more-than 8)) ; Prints 18 Closures are a powerful feature in any language that provides them. We've already seen this in the work we did with Church's lambda calculus -- that anything computable can be expressed using λ. Looking back, it is both a surprise and not a surprise that "mainstream" languages put off adopting closures for a long time under the perception that they're inefficient to implement, when the lisp family has had them from (pretty much) day one. There is some truth to the "inefficiency" myth in the early days, but on today's machines and with the heavy use of interpreted languages like Python and PHP, the inefficiency argument no longer holds ground. In particular, compilers for Common LiSP and Scheme have long implemented proper lexical closures resulting in programs more efficient than the interpreted ones. Where does this myth of inefficiency come from? A part of it you can see in the way we made the :rkt:`FunV` object. We just stored away the **entire** definition environment in the object, without any consideration for whether the function expression actually uses any values from it. One "optimization" here is to scan the function expression, collect the set of references that are bound in the definition environment, and make a new :rkt:`bindings` field that only keeps the used ones. Here, we're trading off the computational cost of constructing a closure against the memory cost of storing unnecessary bindings. Such optimizations are not essential for us to consider in this course as they don't (and indeed must not) change the meanings of our programs.