CSCI 384 FALL 2021

HOMEWORK 2: Parsing Trees

The three exercises below (plus a bonus one) are all themed around a grammar for a language describing unlabelled binary trees. Each asks you to write some code that either wrestles with its strings, or else with their internal representation within a program.

Exercise 1 asks you to generate tree strings, Exercise 2 asks you to parse them, and Exercise 3 (in essence) helps you debug your parser. The natural thing to do is to write these all as one program, or as two programs where maybe the output of one is fed to the other.

Regardless, you can write the code in Python or in C++, your choice.


[HW2 EX1] describe a tree

Consider the following grammar, somewhat like balanced brackets:

     tree ::= ( tree @ tree ) | .

It produces strings that correspond to binary trees. The string . corresponds to a tree consisting only of a leaf. The string ( . @ . ) corresponds to a tree with a branched, internal node whose two children are leafs. The string ( ( . @ . ) @ ( . @ . ) ) corresponds to a balanced binary tree with two levels of internal, branching nodes (a total of three such nodes) and with four leaves. These trees are depicted below as ASCII art:

.

  @
 / \
.   .

      @
     / \
    /   \
   @     @
  / \   / \
 .   . .   .

As a final example, the binary tree

     @
    / \
   @   .
  / \  
 .   . 

corresponds to the string ( ( . @ . ) @ . ).

• Write (recursive) code that generates a random string that is generated by this grammar. Each string generated by this grammar should have a non-zero probability of being produced. It should not output a string that is not in this grammar.

Note that my strings have spaces in them, just to make things readable here. Yours can just contain the characters @, ., (, and ) if you like.


[HW2 EX2] parse a tree

There are several ways to represent binary trees, as depicted in the first exercise, within computer programs. In Python, for example, we could invent a class hierarchy whose abstract superclass is Node, and with subclasses Branch and Leaf. The constructor for Branch would take two subtrees as parameters, and the constructor for Leaf would take none. And so the trees for ., ( . @ . ), and ( ( . @ . ) @ . ) would be represented by these three constructions of Node instances:

Leaf()
Branch(Leaf(),Leaf())
Branch(Branch(Leaf(),Leaf()),Leaf())

Alternatively, and also in Python, we could use lists to represent these tree objects. A tree with two subtrees l and r could be represented by the Python expression

 ["Branch", l, r]

and, mimicking the above, a leaf could be the expression

 ["Leaf"]

A list of length three and a list of length one. And so ( ( . @ . ) @ . ) would be the expression

 ["Branch", ["Branch", ["Leaf"], ["Leaf"]], ["Leaf"]]

This may seem like a gratuitous use of lists, but it hints at a more general structure of “labelled trees” that we will use for representing parsed terms in programming languages.

• Write a program that is fed as input a string generated by the grammar from the first exercise. It should build an internal representation of the tree structure that it parses. It should also report an error if the string it parses is not generated by the grammar.

If the code is Python, use either of the two representations I describe above. If in, say, C++, use a representation that mimics either of those two representations. (E.g. the latter could be a new class Term that consists of a string along with a vector of references to Term objects.


[HW2 EX3] print a tree

• Okay, now write code that, when given a representation of a binary tree like you used in Exercise 2 just above, outputs a string from the grammar that represents it.

It should traverse the tree data structure (recursively) and build a string from it.


[HW2 EX4 BONUS] pretty printing trees

• As an alternative to Exercise 3 just above, have your code instead produce ASCII art that depicts the binary tree using several lines of console text. The trees can grow upward, downward, sideways, however you choose.