CSCI 384 FALL 2021


HOMEWORK 3:

Towards a Parser for MiniML

Exercises 1 and 2 Due: Monday, 9/20 at 3pm.
Exercises 3 and 4 Due: Wednesday, 9/22 at 3pm.


In class I showed you a recursive descent parser for a calculator language that had the following grammar:

<expn> ::= <expn> + <expn> | <expn> * <expn> | x | 5 | ( <expn> )   

To build a deterministic parser, we developed the following unambiguous grammar, making multiplication have higher precedence than addition, and making each left associative:

<expn> ::= <addn>
<addn> ::= <addn> + <mult> | <addn> - <mult> | <mult>
<mult> ::= <mult> * <expt> | <expt>
<expt> ::= <atom> ** <expt> | <atom>
<atom> ::= x | 5 | ( <expn> )

I’d like you to make a few additions to the language, and then modify the parser accordingly.

Each of the language extensions appear in the language Standard ML. In fact, all the language constructs we are defining, excepting the ** operator we added in class, are in Standard ML. This means that you can test for things like operator precedence and associativity (i.e. “grouping”) using the SML interpreter. There is one on patty. On patty, you can type sml, like so:

jimfix@patty:~$ sml
Standard ML of New Jersey v110.79 [built: Sat Oct 26 12:27:04 2019]
- 3 + 4;
val it = 7 : int
-

There you can type expressions like 3 + 4, ending with a semi-colon, and have them checked and evaluated in the SML interpreter.

You can obtain the parser by the link below:

    • parser code

You will extend the code according to the exercises below. Note that there are two deadlines and there will be two submission sites. When you’ve completed the first two exercises, you should upload your modified parser.py file at the first Gradescope site for this homework. And then when you finish all the exercises you should submit parser.py to the second Gradescope site.

Please work to meet the first checkpoint. Do not wait until Monday to start writing this code!


Exercise 1: integer division

Let’s extend the syntax to flesh out the “muliplicative” parts of the grammar. We’ll use syntax that we borrow from Standard ML (or just SML), one of the dialects of Robin Milner’s ML functional language. In SML there are two integer division operations div and mod. These each sit at the same “precedence level” as *, so they have higher precedence than addition and subtraction, and they have lower precedence than exponentiation. Also, they are left associative. This means that an expression like:

x mod 10 div 3 * x - 25 * x + x mod 6

should be read as

(((((x mod 10) div 3) * x) - (25 * x)) + (x mod 6))

We can just enhance our grammar above like so:

<expn> ::= <addn>
<addn> ::= <addn> + <mult> | <addn> - <mult> | <mult>
<mult> ::= <mult> * <expt> | <mult> div <expt> | <mult> mod <expt> | <expt>
<expt> ::= <atom> ** <expt> | <atom>
<atom> ::= x | 5 | ( <expn> )

Modify the parser so that it parses instead according to this grammar. You’ll need to generate abstract syntax tree nodes (AST nodes) labeled with Div, and Mod, the division analogues for a muliplication node labelled with Times.

Alternatively, if you like, you could instead invent a new AST label "BinOp" for binary operations, and have ASTs like, say,

["Binop", '+', ..., ...]
["Binop", 'div', ..., ...]

instead of

["Plus", ..., ...]
["Div", ..., ...]

for these and the other binary operations.


Exercise 2: logic and comparison

Now we’ll imagine that our calculator language also has a boolean value type. This will be in preparation for adding an if expression in a later exercise. It will also allow us to have variables bound to boolean values so that we can reason about program conditions.

We add two constants true and false. These are “atomic” expressions just like integers and variable names. We also add comparison operators = and <. To keep things simple, let’s not allow them to be “strung out.” So we won’t allow

3 = 4 = 5

or

3 < 4 = 5

etc. And they sit at the same precedence level, lower than plus and minus, so that

3 + 4 < 5 * 6        3 + 4 = 5 * 6

are interpreted as

(3 + 4) < (5 * 6)    (3 + 4) = (5 * 6)

Finally we add logical AND, OR, and NOT to the syntax. Following SML, we name these binary connectives andalso, orelse, and not.

This leads to the following changes to the grammar. We (1) replace

<expn> ::= <addn>

with

<expn> ::= <disj>

and then (2) we include the new productions

<disj> ::= <disj> orelse <conj> | <conj>
<conj> ::= <conj> andalso <cmpn> | <cmpn>
<cmpn> ::= <addn> = <addn> | <addn> < <addn> | <addn>

Lastly, we (3) add the productions

<nega> ::= not <atom> | <atom>
<atom> ::= true | false

and have the <mult> productions go to <nega> instead of <atom>.

Modify the parser so that it parses this updated grammar accordingly. Make sure you also tell me in your Python header comments what you chose to use as the AST nodes for these additions.

You can either represent true and false with their own leaf nodes

["True"]
["False"]

or with something similar to what I did with integers

["Boolean", true]  ["Boolean", false]

Just let me know in the header comment.

Note: This is going to introduce some nonsensical expressions that we will consider syntactically correct. For example not 3 and 3 + true and 3 + 5 andalso x are all legal syntax and parsable in this grammar. We’ll fix this weirdness later when we do semantic analysis (with typechecking) rather than syntactic analysis (what we are doing here with parsing). Though it seems wrong, this division of labor is typical in the staging of SML-like languages’ program analysis. The typically staging is:
   • tokenization, typically relying on regular expressions
   • syntactic analysis, i.e. parsing according to a grammar
   • semantic analysis, e.g. type-checking
So let’s just parse according to this grammar that allows some untype-able expressions.

Here’s the grammar now, in full glory.

<expn> ::= <disj>
<disj> ::= <disj> orelse <conj> | <conj>
<conj> ::= <conj> andalso <cmpn> | <cmpn>
<cmpn> ::= <addn> = <addn> | <addn> < <addn> | <addn>    
<addn> ::= <addn> + <mult> | <addn> - <mult> | <mult>
<mult> ::= <mult> * <nega> | <mult> div <nega> | <mult> mod <nega> | <nega>
<nega> ::= not <atom> | <atom>
<atom> ::= x | 5 | true | false | ( <expn> )

Exercise 3: pretty printing

In class I’ll show you a version of my parser that “pretty prints” (this is a standard phrase!) the AST constructed by the parser as a result of a successful parse. Mine was fairly elaborate but, roughly, indented the contents of a subtree’s node labels so that they were further right of their parent. It also put “sibling” nodes at the same indentation level.

Write similar code for reporting the AST of a successful parse. It need not have node links, just the indenting. And it’s also fine to just have one each node level sit on its own line. Things to the left in the expression should be on earlier lines than things to their right.

For example, a parse of 3 + 4 * 5 could be depicted as

Plus
    Num
       3
    Times
         Num
            4
         Num
            5

In my code, I distinguished “branching” nodes from “leaf” nodes by whether they were a list of length one that contained only a label (for example, some of you might have chosen to represent the boolean literals as [“True”] and [“False”].

The code should be recursive, and is essentially an “in order” traversal of the AST. It will have two base cases: (1) labelled subtrees that have no subtrees, that is, are lists of length one and (2) data values in the trees (i.e. non-lists). So a Python condition like

 isinstance(ast,list) and len(ast) >= 2

would indicate a recursive case,

 isinstance(ast,list) and len(ast) == 1

would be the one base case, and

  not isinstance(ast,list)

would be the other.

Hint: your pretty-printer function will need to be recursive. In addition to having a parameter for the subtree to be output, it should have a second parameter descrbing, saym the length of the indentation for that subtree’s output.

For a BONUS version of this exercise, you could instead try to mimic my fancier output. Mine produced these lines of text

Plus-+-Num-+-3
     |
     +-Times-+-Num-+-4
             |
             +-Num-+-5

I’m willing to give lots of hints on how to do this.


Exercise 4: if and let

Let’s now add two “top-level” syntactic constucts, also found in Standard ML, one for a “conditional” if-then-else expression and one for let-binding of variable names. These have the expression syntax:

<expn> ::= let val <name> = <expn> in <expn> end

and

<expn> ::= if <expn> then <expn> else <expn>

Modify the parser to also parse these two kinds of expressions.


BONUS Exercise 5: evaluation

Now that we have booleans with if expressions, and let for introducing definitions of variables, the language’s “programs” can be “executed”. Given the simplicity of this language, a program is just an expression and “running it” involves evaluating it to determine its value.

Write a function def eval(ast) that takes a parsed string’s abstract syntax tree, and computes its value. To make it work, you’ll probably want to write it recursively. This means that you’ll want a two-argument helper function

def evalInCtxt(ast,ctxt):

that is given a context ctxt of variable definitions. A context is a collection of variable-value pairs. For example, in the expression

let val x = 5 in x + 10 end

we’ll need to know that x is bound to 5 when we are evaluating x + 10.