This section presents the parser for the functional language Lambda for which a scanner was specified in the last chapter.
Program 3.1 shows the parser specification that will serve as an example to demonstrate the basic concepts of the Gump Parser Generator. This example is examined in detail in the following.
Class Descriptors
Again, a Gump specification resembles a class definition, introduced by a special keyword, parser, and augmented by additional declarations. The usual class descriptors from and meth are also used in this specification, in lines 2 to 8. The switches +gumpparseroutputsimplified and +gumpparserverbose simply cause additional information to be output at parser generation time; we will see this in the next section.
The error method will be called upon detection of parse errors. Its parameter is a virtual string describing the error. We redefine this method (which has a default implementation in the superclass GumpParser.'class') since we want to provide the user with the line number information we maintain in the scanner.
Token Declarations
The token declarations begin in line 10. All token classes (which must be atoms) that the scanner can produce are listed after the token keyword. Additionally, some tokens are assigned an associativity (here: leftAssoc) and a precedence value (a positive integer) after a colon. These are used to resolve ambiguities in the syntax rules. The reasons for the assignments in our example are explained below. (You may notice that one of the listed tokens, 'APPLY', cannot be produced by the scanner. This is called a pseudo-token; it is defined solely for its associativity and precedence information.)
Syntax Rules
Line 19 marks the start of the syntax rules themselves. For each nonterminal, a syntax rule (introduced by the keyword syn) must be given. Nonterminals may be named by atoms or variables.
Start Symbols
A nonterminal named by an atom is a start symbol. Several start symbols may be defined; the one to reduce to is selected when a concrete parse is initiated.
Formal Parameter Lists
Following the nonterminal is its parameter list, consisting of zero or more variables in parentheses. The start symbol program has two parameters: a list of definitions and a list of terms. Both are output parameters, as is indicated by the ? annotation.
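As an illustration of both points, a second start symbol for parsing a single term could be added next to program. This is a hypothetical sketch, not part of Program 3.1; the rule name singleTerm and the way its output parameter is bound are assumptions modeled on the program rule (Term is the nonterminal defined further below):

% Hypothetical extra start symbol: its name is an atom, so a parse can be
% started from it, for example with {MyParser parse(singleTerm(?Tree) ?Status)}.
% The escaped !Tree binds the output parameter to the value of one Term.
syn singleTerm(?Tree)
   !Tree=Term($)
end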
EBNF Phrases
The body of each syntax rule is an EBNF phrase (EBNF stands for Extended Backus-Naur Form). As in Oz, we distinguish between statements and expressions: some EBNF phrases carry values and may thus only stand at expression position, while others do not and must be used at statement position.
The basic building blocks of EBNF expressions are grammar symbol applications, denoted by the name of a terminal or nonterminal followed by the actual parameter list in parentheses. An example is the Definition($) in line 20, an application of the nonterminal Definition with a single actual parameter. Since this parameter is the nesting marker $, the application is an expression (as opposed to a statement), and its value is the value of the corresponding actual parameter. This application is written inside the repetition symbols { ... }*, which means that the application may be repeated zero or more times. Since it is used in expression position, the repetition construct collects its argument's value from each iteration into a list. This list is assigned to the formal parameter Definitions.
The next line, line 21, is similar: here a nonempty list (note the +) of Terms is expected, separated by semicolons. The values computed by each Term are collected in a list, which is assigned to the formal parameter Terms.
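To make the list building concrete: for an input whose term part is just x; y, the phrase { Term($) // ';' }+ yields a two-element list. Assuming the scanner delivers identifiers as atoms and both occur on source line 1 (the id tuples are built by the Term rule discussed below), Terms would be bound roughly as in this sketch:

% Sketch of the value Terms receives for the term part "x; y".
declare
Terms = [id(x 1) id(y 1)]  % one id(Name Line) tuple per term
{Browse Terms}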
Local Variables
The next syntax rule introduces a new feature: local variables. All variables in pattern position in syntax rules are implicitly declared local. EBNF pattern positions are the left-hand side of an assignment (such as in line 20) and the actual parameters of grammar symbol applications. If in any of these places a single non-escaped variable (i.e., one written without !) is used, it is implicitly declared local to the EBNF construct it occurs in. Such is the case for the variables I and T in line 24. The formal parameter variables assigned to in lines 20 and 21 had to be escaped to avoid their implicit (re-)declaration.
The syntax rule for Definition in line 23 has a single parameter. Since this parameter is the nesting marker $, an EBNF expression is expected as the body of this rule. The value of a sequence of EBNF expressions is the value of the last expression (as in Oz, where the value of a sequential composition is the value of the composition's second argument).
Semantic Actions
The last EBNF expression in line 24 is the semantic action, introduced by the arrow =>. This action constructs an abstract syntax tree node (represented as a tuple).
Alternatives
Lines 26 to 32 show the rule for Term. This rule has several alternatives, separated by the choice operator []. These alternatives are also the reason why the token precedences and associativities mentioned above are needed: not all inputs have a unique parse tree. If, for example, we wrote
lambda x.y z
this could be parsed as either
(lambda x.y) z
or
lambda x.(y z)
We want to enforce the second meaning (that is, the application has a higher precedence than the abstraction); furthermore, the application should be left-associative (i.e., x y z means (x y) z).
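Expressed with the tuples built by the semantic actions of Program 3.1, the two readings correspond to the following trees (a sketch; the atom representation of identifiers and the line number 1 carried by each id node are assumptions):

% The two candidate abstract syntax trees for "lambda x.y z".
declare
FirstReading  = apply(lambda(x id(y 1)) id(z 1))   % (lambda x.y) z
SecondReading = lambda(x apply(id(y 1) id(z 1)))   % lambda x.(y z), the one we want
{Browse FirstReading}
{Browse SecondReading}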
Resolving Conflicts
This is why the pseudo-token 'APPLY' was introduced. Each alternative may, like the tokens, have a precedence and an associativity. If the alternative contains a terminal, then the values of its last terminal are used; alternatively, a special precedence token may be specified via prec(terminal), in which case that token's values are used instead. Thus, the application Term Term is left-associative. Higher precedence values mean tighter binding of operators, so the application (token 'APPLY' with precedence 2) takes precedence over the abstraction (token '.' with precedence 1).
However, one anomaly remains because the application has no (visible) operator: to resolve conflicts, the precedence and associativity values of the lookahead token are compared to those of the (potentially) applicable rules. So if the lookahead is one of the tokens with which a Term can begin, it is in fact an application we have to parse; this is why all these tokens are assigned the same precedence as the application. (For a more detailed description of how operator precedence information is used to resolve conflicts, consult the bison manual [DS95].)
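For contrast, here is a hypothetical fragment (not part of Program 3.1) sketching the more common case in which the operators are visible tokens. Since '+' and '*' occur as the last terminal of their alternatives, their precedences are picked up directly and no pseudo-token is needed: 1 + 2 * 3 parses as 1 + (2 * 3), and 1 + 2 + 3 as (1 + 2) + 3.

token
   'int'
   '+': leftAssoc(1)   % binds less tightly
   '*': leftAssoc(2)   % binds more tightly

syn Expr($)
   Expr(E1) '+' Expr(E2) => plus(E1 E2)
[] Expr(E1) '*' Expr(E2) => times(E1 E2)
[] 'int'(I) => int(I)
end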
Epsilon Productions
The last nonterminal, Line in line 33, is actually only introduced for the semantic value it computes. The empty sequence of grammar symbols is denoted by skip.
Program 3.1: The parser specification for Lambda.

\switch +gumpparseroutputsimplified +gumpparserverbose
declare
parser LambdaParser from GumpParser.'class'
meth error(VS) Scanner in
GumpParser.'class', getScanner(?Scanner)
{System.showInfo 'line '#{Scanner getLineNumber($)}#': '#VS}
end
token
'define' ';' '=' ')'
'.': leftAssoc(1)
'APPLY': leftAssoc(2)
'lambda': leftAssoc(2)
'(': leftAssoc(2)
'id': leftAssoc(2)
'int': leftAssoc(2)
syn program(?Definitions ?Terms)
!Definitions={ Definition($) }*
!Terms={ Term($) // ';' }+
end
syn Definition($)
'define' 'id'(I) '=' Term(T) ';' => definition(I T)
end
syn Term($)
'lambda' 'id'(I) '.' Term(T) => lambda(I T)
[] Term(T1) Term(T2) prec('APPLY') => apply(T1 T2)
[] '(' Term(T) ')' => T
[] 'id'(I) Line(L) => id(I L)
[] 'int'(I) => int(I)
end
syn Line($)
skip => {GumpParser.'class', getScanner($) getLineNumber($)}
end
end
Parser specifications are processed in the same way scanner specifications are. First we prepare the Gump Parser Generator by feeding:
\switch +gump
Then the file to translate is simply fed into the compiler. Suppose you saved the example specification in the file LambdaParser.ozg; then feed:
\insert LambdaParser.ozg
The extension .ozg indicates, as before, an Oz file with embedded Gump specifications.
Output Files
Two files are generated from the parser definition. LambdaParser.simplified contains a simplified version of the syntax rules in which the EBNF constructs have been expanded to equivalent BNF forms (because the +gumpparseroutputsimplified switch was set), whereas LambdaParser.output contains the output of the bison parse table generator (because the +gumpparserverbose switch was set). Both names are derived from the parser specification's name.
Program 3.2 shows an example Oz program that uses both the generated scanner from the last chapter and the generated parser from above.
Initialization
First, the scanner and parser classes are loaded. After instantiating and initializing the scanner, a parser object is created. Its initializer takes a single parameter, a scanner; technically speaking, this is a unary procedure that understands the messages putToken and getToken described in Section 3.2.3.
Initiating a Parse
The most interesting message sent to the parser is the parse message. Its first argument has to be a tuple: the label specifies the start symbol to use, and the features correspond to the actual parameters of the start symbol. In this case, the actual parameter variables Definitions and Terms are bound to lists of definitions and terms, respectively. The second argument of the parse message is the result status; it is unified with true if parsing was successful and with false otherwise.
Program 3.2: Using the generated scanner and parser.

\switch +gump
\insert gump/examples/LambdaScanner.ozg
\insert gump/examples/LambdaParser.ozg
local
MyScanner = {New LambdaScanner init()}
MyParser = {New LambdaParser init(MyScanner)}
Definitions Terms Status
in
{MyScanner scanFile('Lambda.in')}
{MyParser parse(program(?Definitions ?Terms) ?Status)}
{MyScanner close()}
if Status then
{Browse Definitions}
{Browse Terms}
{System.showInfo 'accepted'}
else
{System.showInfo 'rejected'}
end
end
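For concreteness, a hypothetical Lambda.in such as the following would be accepted, assuming the scanner from the last chapter recognizes the keywords define and lambda and plain identifiers; Status then becomes true, Definitions is bound to a one-element list, and Terms to a two-element list.

define id = lambda x.x;
id id;
lambda y.y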