Reed CSCI 384 Fall 2021

Homework 05: a MiniML interpreter

The deadlines are::
• Code for *, -, div, mod, orelse, <=, = due Friday, 10/15, 7pm.
• Written semantics of pairs, output/sequencing due Friday, 10/15, 7pm.
• Pairs, print, and ; code due Wednesday, 10/27, 3pm.
• Function application, fn, let fun code due Wednesday, 10/27, 3pm.


Download the interpreter code


In an earlier homework you worked out a parser for a (very partial) grammar for the MiniML language. In class the last weeks we worked out the semantics for that language. We chose to develop a “big step” semantics, defining the meaning of a three-place relation:

E |- e V v

(here, we are using capital V in place of a “double” downward arrow) which asserts that the abstract syntax term e has the value v when evaluated in the variable-value binding context E.

We added functions to the language, along with function application, with the grammar productions

<exp> ::= fn <name> => <expn> 
        | <expn> <expn>

These correspond to a lambda expression with a formal parameter on the left side of the arrow, and its rule for evaluation on the right. For example

fn x => x + 1

would be the integer successor function. The second is the “silent juxtaposition” binary operation for function application. For example

(fn x => fn y => x + y) 3 4

is a fancy way of writing 7.

We then added semantic rules for evaluating these two constructs, having added their AST constructors ‘fn’ and ‘ap’. To do so we needed to invent new kind of value, a “closure”. Closures are essentially the same as their lambda expression, namely, they consist of their function’s formal parameter, along with the AST of their rule for evaluation, but also the variable binding environment that was the context within which that rule was described. For example, the expression

let val y = 6 in (fn x => y*10 + x) end

evaluates to a closure triple

("x", r, C) 

where r is the AST

plus(times(var("y"),num(10)),var("x"))

and the C is the environment

[("y",6)]

That way, should that function get applied to some value, we can evaluate the rule knowing the actual parameter value to be substituted for the formal parameter “x”, along with the bound value 6 to be substituted for “y”.

ASSIGNMENT

The code in the file miniml.py contains a parser for MiniML, including parsing of the function syntax, producing ASTs that contain the constructors fn/app with the grammar rules:

<exp> ::= fn <name> => <expn> 
        | <expn> . <expn>

NOTE THE SLIGHT VARIATION: The parser expects an explicit . when code performs a function application. To the left of the period is a function expression. To the right of the period is the argument being fed. I may have time to fix the parser so that the . isn’t needed— it’s not hard to— but that is its current state. (And I didn’t want to introduce any last-minute bugs.)

The code, in addition, is built as a simple interpreter for the MiniML language, rather than just a parser. Most of that code sits in the definition of the function def eval(env,ast) described more below, but there are also a few supporting functions I’ve written.

The code is incomplete. It handles let, var, if, num, plus, and, less, true, and false. Your job is to finish the code. In the end you’ll hand in a working MiniML interpreter, along with a significant series of test programs that you used to convince yourself that the code works.

This means that you’ll need to write the eval cases for times, div, mod, minus, or, fn, app, leq, equals, and not. The exercises also ask you to handle fst, snd, pair, seq and print, too. You’ll want to consider adding a unit literal, as well, as this is the return value of print.

Below I make a few notes about the implementation, offer some bonus exercises, and also request that you complete some written exercises.

Note 1: AST nodes with code locations

My parser now returns richer AST nodes. Like before, each node is a Python list ast where ast[0] is a string that indicates the constructor for that node, and ast[1], ast[2], etc. are the needed information components attached to that node, including the ASTs of subexpressions. But I’ve also added a last item, ast[-1] which is a string indicating the line # and column # where that node’s expression was parsed within the source.

Throughout the code, you’ll see me report an error by grabbing that location information, usually as a variable named where. For example, evaluation of a plus makes sure that its two sub-expressions each evaluate to an integer value. It raises an error indicating where the + occurred in the source code, should either of those sub-expressions result in a non-integer.

Your eval code should also raise “run-time” errors when they should arise (e.g. type errors, divisions by 0, failure to define a name in the MiniML program, failure to apply a function in ap, etc.)

Note 2: tagged values

I made a design decision within my interpreter to represent values returned by eval as more than just a Python value. Instead, I return a list that describes the type of the value, along with the value itself. For example, the MiniML expression 35 gets returned as the Python list

["Int",35]

and the MiniML expression true gets returned as the Python list

["Bool",True]

I use these “tagged values” for type checking, as mentioned in Note 1, and also for reporting the result of an evaluated expression to the MiniML user/programmer.

The starter code expects function values to have the tag "Clos". You’ll want to return a list with that tag when you evaluate an fn. It should result in a closure value, and in this form:

["Clos", ...the information needed for a closure... ]

Note 3: provided code

Here is a summary of the functions I’ve added for the interpreter:

  • def interpreter(tokens): takes a token stream, parses it, evaluates the AST from the parse, then reports the resulting (tagged) value.

  • def eval(env,ast): incomplete code that evaluates a MiniML term ast within a variable-value binding environment env.

  • def lookUpVar(x,env,err): my implementation of variable look-up (our notation E(x)) that seeks a binding of x within the environment env. The extra parameter err is expected to be a string, an error message raised when the variable isn’t found in the given environment.

  • getIntValue(tval,err): this is a helper function that’s used to pick apart the tagged value tval, expecting it to be an integer, and returning that integer value. It raises an error with message err if tval is something else.

  • getBoolValue(tval,err): similar, for boolean values.



Exercise: shortcircuited orelse

Write the case for evaluating the orelse logical operator. The AST will have the tag "Or". Like the way we evaluate "And" for andalso, this logical construct should be “short-circuited” by the interpreter. This means that there should be cases where only the left expression gets evaluated by the interpreter. For orelse, if the left expression evaluates to true, then the right expression should not be evaluated because, at that point in the evaluation, the full expression will be already known to be true.



Exercise: add pairs to MiniML

Consider adding the additional grammar:

<expn> ::= fst <atom> | snd <atom> | <atom>
<atom> ::= ( <expn> , <expn> )

The bottom of the two is the syntax for constructing a pair of values in MiniML. The top two are used to extract either the first component (with fst) or the second component (with snd) of a pair value.

Add these to the parser, producing their AST nodes. Add cases to eval that handle programs that use pairs. You’ll need to add a new tagged value type, one that represents pairs, to make these constructs work.



Written exercise: semantics of pairs

Give the written rules for evaluating these three pair constructs.



Exercise: adding printing; sequencing

Add a sequencing operation to the MiniML language. This has the syntax

<atom> ::= ( <expn> ; <expn> ; ... ;  <expn> )

Also add a construct

<expn> ::= print <atom>

The second, when evaluated, should evaluate the expression parsed as , and then print that value as a single line on the console. The first when evaluated, should evaluate each of the expressions in the sequence, from left to right, and then ultimately return the value of the last expression.

Thus, for example,

(print 5; print 7; print 2; 8)

would print three lines with 5, 7, and 2, in that order, and then should evaluate to 8.

The value returned by print should be a new value (), one of type unit. (You might consider adding a parse of the expression “()” to the grammar/parser, just to be complete.)

Parse the semicolon for sequencing as a left-heavy parse tree of seq AST nodes. So the expression above parses as

seq(seq(seq(print(num(5)),print(num(7))),print(num(2))),num(8))

Add print and seq execution to the interpreter.



Written exercise: semantics of output

With print and seq, the semantics of MiniML changes fundamentally. To model output, we instead want a 4-place relation like

E |- e V v => <v1,v2,...,vk>

This says that “evaluating e within the context E yields the value v along with the output value sequence <v1,v2,…,vk>.” If the expression has no output, then k=0, and we’d say that => <> instead.

So our new evaluation relation is

E |- e V v => s

where E is a sequence of (Id x Val), e is a Trm, v is a Val, and s is a sequence of Val. An example s could be <5,7,2> or it could be <>.

Consider a subset of the MiniML language that includes only num, var, let, if, less, plus, seq, and print. Give the full semantics for this language by defining the inference rules for the “eval with output” relation.

For example if we evaluate seq(e1,e2) and evaluating e1 outputs s1 = <v1,v2,…,vk> and then evaluating e2 outputs s2 = <u1,u2,…,un> then our rules should say that seq(e1,e2) outputs their concatenation s1 • s2, that is, the sequence <v1,v2,…,vk,u1,u2,…,un>.



Exercise: adding functions and function application

Okay you are now ready to handle closures. You need to extend eval to handle the fn construct, which gets parsed as an "Fn"-tagged AST. Recall the rule for evaluation

----------------------
E |- fn(x,r) V (x,r,E)

This is just an axiom in our rules, and the value listed as a triple is a closure. Your eval code, when it sees an fn, should give back a "Clos"-tagged value that holds the formal parameter name of the function, the AST rule of the function, and the environment env.

The key test of this work is with the "App"-tagged AST node, which corresponds to function application. You’ll need to handle a case for eval that mimics this rule from class:

E |- e1 V (x,r,C)   E |- e2 V a   [x -> a]C |- r V v
----------------------------------------------------
               E |- app(e1,e2) V v

We evaluate the function, and make sure it is a closure value. We then evaluate the argument. And then we make a recursive call to eval to evaluate the function body (r in the above) with that argument value.

Suddenly, you have a functional interpreter for a functional language!



Exercise: adding let fun

In Standard ML, there is a construct let fun that allows its programmers define functions without using the fn notation. For example, here is the code for describing and using the integer squaring function

let fun sqr x = x * x
in
  sqr 9
end

This expression evaluates to 81. It binds the name sqr to the function value equivalent to fn x => x*x.

Modify the parsing code and interpreter code to handle function definitions. Have it work for multi-argument functions, for example

let fun twoDigit tens ones = 10*tens + ones
in
  twoDigit 3 7
end

Note that Standard ML allows complicated patterns in place of the formal parameter variables. It also allows several cases to be defined using the | syntax. Your code definitely doesn’t need to deal with that. Your code should just handle a single case where the patterns are only variable names.

The fun construct is used by Standard ML programmers to define recursive functions, so it is not mere “syntactic sugar” for let val f = fn .... Here, for example is code for computing the 10th Fibonacci number:

let fun fib n = 
    if n = 0 then 0
    else if n = 1 then 1
    else fib (n-1) + fib (n-2)
in
    fib 10
end

For this to work, the name fib needs to be bound to a function value, but one that somehow knows the definition of itself when its body gets evaluated. So somehow its value should be a closure, say, with a binding for the name fib included within the context carried around by that closure.

Parse this kind of let fun construct and then figure out a way to get recursion to work. You don’t have to handle mutually recursive functions defined with a series of the and connector. It is not too hard to get mutual recursive functions to work—to have a collection of functions each know about all the other companion functions in their mutual definition— but it is a BONUS if you get these kinds of definitions working, too.