This section presents the parser for the functional language Lambda for which a scanner was specified in the last chapter.
Program 3.1 shows the parser specification that will serve as an example to demonstrate the basic concepts of the Gump Parser Generator. This example is examined in detail in the following.
Class Descriptors
Again, a Gump specification resembles a class definition, introduced by a special keyword, parser, and augmented by additional declarations. The usual class descriptors from and meth are also used in this specification, in lines 2 to 8. The switches +gumpparseroutputsimplified and +gumpparserverbose simply cause additional information to be output at parser generation time; we will see this in the next section.
The error method will be called upon detection of parse errors. Its parameter is a virtual string describing the error. We redefine this method (which has a default implementation in the superclass GumpParser.'class') since we want to provide the user with the line number information we maintain in the scanner.
Token Declarations
The token declarations begin in line 10. All token classes (which must be atoms) that the scanner can produce are listed after the token keyword. Additionally, some tokens are assigned an associativity (here: leftAssoc) and a precedence value (a positive integer) after a colon. These are used to resolve ambiguities in the syntax rules. The reasons for the assignments in our example are explained below. (You may notice that one of the listed tokens, 'APPLY', cannot be produced by the scanner. This is called a pseudo-token; it is defined solely for its associativity and precedence information.)
Syntax Rules
Line 19 marks the start of the syntax rules themselves. For each nonterminal, a syntax rule (introduced by the keyword syn) must be given. Nonterminals may be named by atoms or variables.
Start Symbols
A nonterminal named by an atom is a start symbol. Several start symbols may be defined; the one to reduce to is selected when a concrete parse is initiated.
Formal Parameter Lists
Following the nonterminal is its parameter list, consisting of zero or more variables in parentheses. The start symbol program has two parameters: a list of definitions and a list of terms. Both are output parameters, as is indicated by the ? annotation.
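As an illustration of both points, a second start symbol for parsing a single term could be added next to program. This is a hypothetical sketch, not part of Program 3.1; the rule name singleTerm and the way its output parameter is bound are assumptions modeled on the program rule (Term is the nonterminal defined further below):

% Hypothetical extra start symbol: its name is an atom, so a parse can be
% started from it, for example with {MyParser parse(singleTerm(?Tree) ?Status)}.
% The escaped !Tree binds the output parameter to the value of one Term.
syn singleTerm(?Tree)
   !Tree=Term($)
end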
EBNF Phrases
The body of each syntax rule is an EBNF phrase (EBNF stands for Extended Backus-Naur Form). As in Oz, we distinguish between statements and expressions: some EBNF phrases carry values and may thus only stand at expression position, while others do not and must be used at statement position.
The basic building blocks of EBNF expressions are grammar symbol applications, denoted by the name of a terminal or nonterminal followed by the actual parameter list in parentheses. An example is the Definition($) in line 20, an application of the nonterminal Definition with a single actual parameter. Since this parameter is the nesting marker $, the application is an expression (as opposed to a statement), and its value is the value of the corresponding actual parameter. This application is written inside the repetition symbols { ... }*, which means that the application may be repeated zero or more times. Since it is used in expression position, the repetition construct collects its argument's value from each iteration into a list. This list is assigned to the formal parameter Definitions.
The next line, line 21, is similar: here a nonempty list (note the +) of Terms is expected, separated by semicolons. The values computed by each Term are collected in a list, which is assigned to the formal parameter Terms.
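To make the list building concrete: for an input whose term part is just x; y, the phrase { Term($) // ';' }+ yields a two-element list. Assuming the scanner delivers identifiers as atoms and both occur on source line 1 (the id tuples are built by the Term rule discussed below), Terms would be bound roughly as in this sketch:

% Sketch of the value Terms receives for the term part "x; y".
declare
Terms = [id(x 1) id(y 1)]  % one id(Name Line) tuple per term
{Browse Terms}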
Local Variables
The next syntax rule introduces a new feature: local variables. All variables in pattern position in syntax rules are implicitly declared local. EBNF pattern positions are the left-hand side of an assignment (such as in line 20) and the actual parameters of grammar symbol applications. If in any of these places a single non-escaped variable (i.e., one written without !) is used, it is implicitly declared local to the EBNF construct it occurs in. Such is the case for the variables I and T in line 24. The formal parameter variables assigned to in lines 20 and 21 had to be escaped to avoid their implicit (re-)declaration.
The syntax rule for Definition in line 23 has a single parameter. Since this parameter is the nesting marker $, an EBNF expression is expected as the body of this rule. The value of a sequence of EBNF expressions is the value of the last expression (as in Oz, where the value of a sequential composition is the value of the composition's second argument).
Semantic Actions
The last EBNF expression in line 24 is the semantic action, introduced by the arrow =>. This action constructs an abstract syntax tree node (represented as a tuple).
Alternatives
Lines 26 to 32 show the rule for Term. This rule has several alternatives, separated by the choice operator []. These alternatives are also the reason why the token precedences and associativities mentioned above are needed: not all inputs have a unique parse tree. If, for example, we wrote
lambda x.y z
this could be parsed as either
(lambda x.y) z
or
lambda x.(y z)
We want to enforce the second meaning (that is, the application has a higher precedence than the abstraction); furthermore, the application should be left-associative (i.e., x y z means (x y) z).
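Expressed with the tuples built by the semantic actions of Program 3.1, the two readings correspond to the following trees (a sketch; the atom representation of identifiers and the line number 1 carried by each id node are assumptions):

% The two candidate abstract syntax trees for "lambda x.y z".
declare
FirstReading  = apply(lambda(x id(y 1)) id(z 1))   % (lambda x.y) z
SecondReading = lambda(x apply(id(y 1) id(z 1)))   % lambda x.(y z), the one we want
{Browse FirstReading}
{Browse SecondReading}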
Resolving Conflicts
This is why the pseudo-token 'APPLY' was introduced. Each alternative may, like the tokens, have a precedence and an associativity. If the alternative contains a terminal, then the values of its last terminal are used; alternatively, a special precedence token may be specified via prec(terminal), in which case that token's values are used instead. Thus, the application Term Term is left-associative. Higher precedence values mean tighter binding of operators, so the application (token 'APPLY' with precedence 2) takes precedence over the abstraction (token '.' with precedence 1).
However, one anomaly remains because the application has no (visible) operator: to resolve conflicts, the precedence and associativity values of the lookahead token are compared to those of the (potentially) applicable rules. So if the lookahead is one of the tokens with which a Term can begin, it is in fact an application we have to parse; this is why all these tokens are assigned the same precedence as the application. (For a more detailed description of how operator precedence information is used to resolve conflicts, consult the bison manual [DS95].)
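For contrast, here is a hypothetical fragment (not part of Program 3.1) sketching the more common case in which the operators are visible tokens. Since '+' and '*' occur as the last terminal of their alternatives, their precedences are picked up directly and no pseudo-token is needed: 1 + 2 * 3 parses as 1 + (2 * 3), and 1 + 2 + 3 as (1 + 2) + 3.

token
   'int'
   '+': leftAssoc(1)   % binds less tightly
   '*': leftAssoc(2)   % binds more tightly

syn Expr($)
   Expr(E1) '+' Expr(E2) => plus(E1 E2)
[] Expr(E1) '*' Expr(E2) => times(E1 E2)
[] 'int'(I) => int(I)
end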
Epsilon Productions
The last nonterminal, Line in line 33, is actually only introduced for the semantic value it computes. The empty sequence of grammar symbols is denoted by skip.
Program 3.1: The parser specification for Lambda.

\switch +gumpparseroutputsimplified +gumpparserverbose
declare
parser LambdaParser from GumpParser.'class'
meth error(VS) Scanner in
GumpParser.'class', getScanner(?Scanner)
{System.showInfo 'line '#{Scanner getLineNumber($)}#': '#VS}
end
token
'define' ';' '=' ')'
'.': leftAssoc(1)
'APPLY': leftAssoc(2)
'lambda': leftAssoc(2)
'(': leftAssoc(2)
'id': leftAssoc(2)
'int': leftAssoc(2)
syn program(?Definitions ?Terms)
!Definitions={ Definition($) }*
!Terms={ Term($) // ';' }+
end
syn Definition($)
'define' 'id'(I) '=' Term(T) ';' => definition(I T)
end
syn Term($)
'lambda' 'id'(I) '.' Term(T) => lambda(I T)
[] Term(T1) Term(T2) prec('APPLY') => apply(T1 T2)
[] '(' Term(T) ')' => T
[] 'id'(I) Line(L) => id(I L)
[] 'int'(I) => int(I)
end
syn Line($)
skip => {GumpParser.'class', getScanner($) getLineNumber($)}
end
end
Parser specifications are processed in the same way scanner specifications are. First we prepare the Gump Parser Generator by feeding:
\switch +gump
Then the file to translate is simply fed into the compiler. Suppose you saved the example specification in the file LambdaParser.ozg; then feed:
\insert LambdaParser.ozg
The extension .ozg indicates, as before, an Oz file with embedded Gump specifications.
Output Files
Two files are generated from the parser definition. LambdaParser.simplified contains a simplified version of the syntax rules in which the EBNF constructs have been expanded to equivalent BNF forms (because the +gumpparseroutputsimplified switch was set), whereas LambdaParser.output contains the output of the bison parse table generator (because the +gumpparserverbose switch was set). Both names are derived from the parser specification's name.
Program 3.2 shows an example Oz program that uses both the generated scanner from the last chapter and the generated parser from above.
Initialization
First, the scanner and parser classes are loaded. After instantiating and initializing the scanner, a parser object is created. Its initializer takes a single parameter, a scanner; technically speaking, this is a unary procedure that understands the messages putToken and getToken described in Section 3.2.3.
Initiating a Parse
The most interesting message sent to the parser is the parse message. Its first argument has to be a tuple: the label specifies the start symbol to use, and the features correspond to the actual parameters of the start symbol. In this case, the actual parameter variables Definitions and Terms are bound to lists of definitions and terms, respectively. The second argument of the parse message is the result status; it is unified with true if parsing was successful and with false otherwise.
Program 3.2: Using the generated scanner and parser.

\switch +gump
\insert gump/examples/LambdaScanner.ozg
\insert gump/examples/LambdaParser.ozg
local
MyScanner = {New LambdaScanner init()}
MyParser = {New LambdaParser init(MyScanner)}
Definitions Terms Status
in
{MyScanner scanFile('Lambda.in')}
{MyParser parse(program(?Definitions ?Terms) ?Status)}
{MyScanner close()}
if Status then
{Browse Definitions}
{Browse Terms}
{System.showInfo 'accepted'}
else
{System.showInfo 'rejected'}
end
end
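For concreteness, a hypothetical Lambda.in such as the following would be accepted, assuming the scanner from the last chapter recognizes the keywords define and lambda and plain identifiers; Status then becomes true, Definitions is bound to a one-element list, and Terms to a two-element list.

define id = lambda x.x;
id id;
lambda y.y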