CSCI 384 Fall 2021

SML algebraic data types

Below I attempt to give a grammar for introducing new data types using SML’s datatype declaration. I have several examples of these in the sample source code from lecture

A datatype declaration invents a type by listing out the variants of its type. Here is an example describing a type named arith for representing ASTs of arithmetic expressions:

datatype arith =
    Plus of arith * arith
  | Times of arith * arith
  | Negate of arith
  | Var of string
  | Num of int

A datatype definition devines several constructors, one for each of the variants. In the above, for example, Plus, Times, Negate, etc. are the constructors of terms of type arith. Each constructor can each be thought of as a function whose parameter(s) have the type indicated by the singleton type, or tuple type, given after the of. For example, we can think of Var and Times being functions of type

Var : string -> arith 
Times : arith * arith -> arith 

These constructors allow us to build terms like

Negate (Plus (Times (Num 3, Var "x"), Num 4))

This could express the result of a parse of the string "- ( 3 * x + 4 )".

Here also is a definition of integer-keyed binary search trees:

datatype bstree =
    Lf
  | Br of int * bstree * bstree

Note that the constructor Lf takes no arguments, and instead is a constant of type bstree. In summary

Lf : bstree
Br : int * bstree * bstree -> bstree

This would allow us to express a tree using the SML term

Br (5, Br (1, Lf, Br (4, Lf, Lf)), Lf)

This is a search tree with 3 nodes in it that has the shape

        5
       / .
      1
     . \
        4
       . .

To work with these data types, SML provides a case expression that allows you to reason about a value’s possible variants. For example, the code below would find all the operations used in an arithmetic expression (with repeats):

fun opsOf ae = let fun opsLR le re = (opsOf le) @ (opsOf re)
               in
                 case ae of
                  | (Num _) => nil
                  | (Var _) => nil
                  | (Plus (l,r)) => "+"::(opsLR l r)
                  | (Times (l,r)) => "*"::(opsLR l r)
                  | (Negate e) => "*"::(opsOf e)
               end

The case syntax relies on a pattern syntax that allows you to match a term of the data type, determining which variant it was constructed as, and picking out and naming the components of its subterm(s) with variables. The code can then do something with those subterms’ variables in the code after the =>.

This kind of pattern-matching can also be used in function definitions. For example the code below is a rewrite of the function above:

fun opsOf (Num _) = nil
  | opsOf (Var _) = nil
  | opsOf (Plus (l,r)) = "+"::(opsLR l r)
  | opsOf (Times (l,r)) = "*"::(opsLR l r)
  | opsOf (Negate e) = "*"::(opsOf e)
and opsLR l r = (opsOf l) @ (opsOf r)

These programmer-defined datatypes hint at the way that SML thinks about some of its built-in datatypes. For example, you could see the below as the definitions of bool and the polymorphic list type:

datatype bool = true | false
datatype 'a list = nil | op:: of 'a * ('a list)

Below we give the grammar for the syntax of datatype, of case, and of the patterns we can match against. In the grammar rules, any | character that does not serve as the start of a line is actually a literal token. It is not being used as a grammar meta-symbol in those cases.

—– datatype definition —–

<defn> ::= datatype <name> = <vrnt> | ... | <vrnt>
<defn> ::= datatype <tyvr> <name> = <vrnt> | ... | <vrnt>
<defn> ::= datatype (<tyvr>,...) <name> = <vrnt> | ... | <vrnt>

<tyvr> ::= 'a
         | 'b
         | 'c
         | ...
         
<vrnt> ::= <name>
         | <name> of <type>

—– patterns and case —–

<patt> ::= <name>
         | _
         | <name> as <patt>
         | <literal>
         | <cnst>
         | ( <cnst> <patt> )
         | ( <patt> , ... , <patt> )
         | ( <patt> :: <patt> )

<cnst> ::= <name>

<expn> ::= case <expn> of <patt> => <expn> | ... | <patt> => <expn>
<defn> ::=
  fun <name> <patt> ... = <expn> | ... | <name> <patt> ... = <expn>
<defn> ::= val <patt> = <expn>