Week 01

A C++ grammar

In the lines below I give the syntax for writing C++ programs. I describe structure of C++ code using only what we’ve covered in lecture. C++ is a vast programming system, and it has evolved in a way that there now are several ways of expressing nearly the same thing. Throughout the semester I will work to keep things simple, showing you the key features (from my perspective) so that you can focus on mastering a certain core of the language.

For now, we focus on programs that work with integer, boolean, and (“double-precision”) floating point values. Soon we’ll look at representing character strings and characters.

I give the C syntax as general rules, rather than give specific examples, using a context-free grammmar notation that is fairly standard for programming languages. It is called BNF (for Backus-Naur form). Each line describes the form of a sequence of terms in the C language. For example, I have the rule

    cndl ::= if ( expn ) { blck } else { blck }

which tells you about the expected structure of “if-then-else” conditional statements in C. This rule tells you the ways you can “fill-in” an if-then-else to make a valid statement. So, for example, all these have that form:

if (false) { w = x * y - z; } else { w = 8; }
if (f && chk) { c = (x >=1) } else { c = !c; i++; }
if (x < 1) { cout << "Y"; } else { no = true; cout << "N"; }

This is because, if you squint at them, they all look like

 if (??) { ???? } else { ??? }
 if (??????) { ??? } else { ???? }
 if (????) { ????? } else { ??????????????????? }

and, furthermore, the ? portions are filled in with code that follow the rules for expn and blck.

Grammar rules like this are sometimes called productions because they can be seen as a kind of code-generating game: we start maybe with stmt, choose one of the rules for expanding that symbol, replacing stmt with the right hand side of the rule. This yields a sequence of terms where further replacement can be done, and so we use the rules to fill in those term “holes”, for example, consider these successive steps in that game:

  1. stmt
  2. if ( expn ) { stmt } else { stmt }
  3. if ( lit-val ) { stmt } else { stmt }
  4. if (false) { stmt } else { stmt }
  5. if (false) { stmt } else { name = expn ; }
  6. if (false) { stmt } else { w = expn ; }
  7. if (false) { stmt } else { w = lit-val ; }
  8. if (false) { stmt } else { w = 8; }
  9. if (false) { name = expn ; } else { w = 8; }
  10. if (false) { w = expn ; } else { w = 8; }
  11. if (false) { w = expn op-2 expn ; } else { w = 8; }
  12. if (false) { w = expn - expn ; } else { w = 8; }
  13. if (false) { w = expn - name ; } else { w = 8; }
  14. if (false) { w = expn - z; } else { w = 8; }
  15. if (false) { w = expn op-2 expn - z; } else { w = 8; }
  16. if (false) { w = expn * expn - z; } else { w = 8; }
  17. if (false) { w = name * expn - z; } else { w = 8; }
  18. if (false) { w = name * name - z; } else { w = 8; }
  19. if (false) { w = x * name - z; } else { w = 8; }
  20. if (false) { w = x * y - z; } else { w = 8; }

With each line we replace a grammar symbol—these are the angle-bracketed terms that appear on the left-hand side of some production in our grammar—with a right-hand side chosen from among the rules. Repeating this kind of substitution can eventually lead to a term sequence where no more replacements can be performed. At this point, we’ll have C code that is allowed by our rules, one that consists only of terms that come from the C++ language, terms like else, 8, and *, giving a particular if-then-else statement that can appear in a C++ program.

Notice that, in the above, we don’t explicitly specify formatting with tabs, spaces, and line breaks. That is because the C++ language doesn’t require much specific “whitespace” formatting to read and understand your code. In Python, as you know, tabs and line breaks are very important, and you sometimes have to be careful of spaces too. In C++, you lay out code with tabs, line breaks, and spacing simply to make code readable by other people. You should definitely work to do that, but C++ doesn’t actually require it. (See the Obfuscated C Code Contest to see how creatively code styling can be abused!)

Below here, then, is a portion of the grammar, one that I’ll flesh out throughout the semester. It defines the syntatic rules for writing a program, starting at the top-level symbol prgm. It describes the include syntax, the syntax for main, for blocks of statements, for the different kinds of statements including loops, conditionals, and variable declarations, for arithmetic expressions, logical expressions, comparison expressions, and literal values. Each gives a syntactic rule of the form

    construct ::= option1 | option2 || optionx

Here those rules are:

    prgm ::= incls defns main
    incl ::= #include <iostream> | #include <string> |
    defn ::= func | proc
    func ::= type name(var-dec,) {blck}
    proc ::= void name(var-dec,) {blck}
    main ::= int main(void) {blck}

    stmt ::=
           name = expn;
         | var-dec; | var-dec = expn; | var-dec = { expn,};
         | cndl | loop | updt
        | std::cout << expn << << expn;
        | std::cin >> name;
        | return expn; | return;

    var-dec ::= type name
    type ::= int | bool | double | char | std::string
    name ::= x | i | y0 | sum | doThis |

    updt ::=
           name op-2= expn;         | name++; | name--;
        | ++;name | --;name

    cndl ::=
           if (expn) {blck}
        | if (expn) {blck} else {blck}
        | if (expn) {blck}
        | if (expn) {blck} else if (expn) {blck}
        | if (expn) {blck} else if (expn) {blck} else {blck}
        |  

    loop ::=
           while (expn) {blck}
        | do {blck} while (expn);
        | for (stmt;expn;stmt) {blck}

    expn ::=
            expn op-2 expn
         | op-1 expn
         | lit-val
         | (expn)
    op-2 ::= arit | logc | cmpn | shft
    arit ::= + | - | * | / | %
    logc ::= && | ||
    shft ::= << | >>
    cmpn ::= = | != | > | >= | < | <=
    op-1 ::= - | !
    lit-val ::= int-val | bool-val | str-val | chr-val | dbl-val
    int-val ::= | -2 | -1 | 0 | 1 | 2 |
    bool-val ::= true | false
    str-val ::= "hello" |
    char-val ::= 'h' |
    dbl-val ::= 42.0 | 3.14159 |

I also have a version of this as a text file.