CSCI 384 Fall 2021
SML algebraic data types
Below I attempt to give a grammar for introducing new data types using SML’s datatype
declaration. I have several examples of these in the sample source code from lecture
A datatype
declaration invents a type by listing out the variants of its type. Here is an example describing a type named arith
for representing ASTs of arithmetic expressions:
datatype arith =
Plus of arith * arith
| Times of arith * arith
| Negate of arith
| Var of string
| Num of int
A datatype
definition devines several constructors, one for each of the variants. In the above, for example, Plus
, Times
, Negate
, etc. are the constructors of terms of type arith
. Each constructor can each be thought of as a function whose parameter(s) have the type indicated by the singleton type, or tuple type, given after the of
. For example, we can think of Var
and Times
being functions of type
Var : string -> arith
Times : arith * arith -> arith
These constructors allow us to build terms like
Negate (Plus (Times (Num 3, Var "x"), Num 4))
This could express the result of a parse of the string "- ( 3 * x + 4 )"
.
Here also is a definition of integer-keyed binary search trees:
datatype bstree =
Lf
| Br of int * bstree * bstree
Note that the constructor Lf
takes no arguments, and instead is a constant of type bstree
. In summary
Lf : bstree
Br : int * bstree * bstree -> bstree
This would allow us to express a tree using the SML term
Br (5, Br (1, Lf, Br (4, Lf, Lf)), Lf)
This is a search tree with 3 nodes in it that has the shape
5
/ .
1
. \
4
. .
To work with these data types, SML provides a case
expression that allows you to reason about a value’s possible variants. For example, the code below would find all the operations used in an arithmetic expression (with repeats):
fun opsOf ae = let fun opsLR le re = (opsOf le) @ (opsOf re)
in
case ae of
| (Num _) => nil
| (Var _) => nil
| (Plus (l,r)) => "+"::(opsLR l r)
| (Times (l,r)) => "*"::(opsLR l r)
| (Negate e) => "*"::(opsOf e)
end
The case
syntax relies on a pattern syntax that allows you to match a term of the data type, determining which variant it was constructed as, and picking out and naming the components of its subterm(s) with variables. The code can then do something with those subterms’ variables in the code after the =>
.
This kind of pattern-matching can also be used in function definitions. For example the code below is a rewrite of the function above:
fun opsOf (Num _) = nil
| opsOf (Var _) = nil
| opsOf (Plus (l,r)) = "+"::(opsLR l r)
| opsOf (Times (l,r)) = "*"::(opsLR l r)
| opsOf (Negate e) = "*"::(opsOf e)
and opsLR l r = (opsOf l) @ (opsOf r)
These programmer-defined datatypes hint at the way that SML thinks about some of its built-in datatypes. For example, you could see the below as the definitions of bool
and the polymorphic list
type:
datatype bool = true | false
datatype 'a list = nil | op:: of 'a * ('a list)
Below we give the grammar for the syntax of datatype
, of case
, and of the patterns we can match against. In the grammar rules, any |
character that does not serve as the start of a line is actually a literal token. It is not being used as a grammar meta-symbol in those cases.
—– datatype definition —–
<defn> ::= datatype <name> = <vrnt> | ... | <vrnt>
<defn> ::= datatype <tyvr> <name> = <vrnt> | ... | <vrnt>
<defn> ::= datatype (<tyvr>,...) <name> = <vrnt> | ... | <vrnt>
<tyvr> ::= 'a
| 'b
| 'c
| ...
<vrnt> ::= <name>
| <name> of <type>
—– patterns and case —–
<patt> ::= <name>
| _
| <name> as <patt>
| <literal>
| <cnst>
| ( <cnst> <patt> )
| ( <patt> , ... , <patt> )
| ( <patt> :: <patt> )
<cnst> ::= <name>
<expn> ::= case <expn> of <patt> => <expn> | ... | <patt> => <expn>
<defn> ::=
fun <name> <patt> ... = <expn> | ... | <name> <patt> ... = <expn>
<defn> ::= val <patt> = <expn>