<< Prev | - Up - |
This section is the reference manual for the Gump Parser Generator. It is divided into three parts: First, the syntax of the Gump parser specification language is given in Section 3.2.1. Then, the options to parser generation supported by the Gump Parser Generator are detailed in Section 3.2.2. Finally, the runtime support for generated parsers, the mixin class GumpParser.'class'
, is presented in Section 3.2.3.
The meta-notation used for describing the syntax of the specification language is explained in Appendix A. (Note: This is not the language used in Gump to specify parsers. This is intentional.)
Gump specifications are allowed anywhere as a statement.
<statement> += <parser specification>
A parser specification is introduced by the keyword parser
, followed by the usual components of an Oz class. After these come additional parser-specific descriptors. Parser specifications must be named by variables, since the names of these variables will be used to generate auxiliary file names during parser generation.
<parser specification> ::= parser
<variable>{ <class descriptor> } { <method> } [ <token clause> ] { <parser descriptor> }+ end
The first extra parser descriptor is the token
clause. This defines the names of the terminals used in the specification as well as (optionally) their associativity and precedence. Several tokens are predefined: Atoms of length 1 are always considered to be tokens. Furthermore, token 'error'
stands for an erroneous token (sequence) and is used for error recovery, and token 'EOF'
signalizes the end of input and is always expected before reduction to the start symbol can take place.
<token clause> ::= token
{ <token declaration> }+
<token declaration> ::= <atom> [ " :
" <expression> ]
The optional expression following the colon in a token declaration must be a tuple with arity 1 and one of the labels leftAssoc
, rightAssoc
or nonAssoc
, depending on the desired associativity. The feature must always be a nonzero positive integer. Only the relative values matter; they are used to derive an ordering on the tokens. Larger values imply a greater binding strength of the operator. For the algorithm used to resolve conflicts using operator precedence information, refer to the bison manual [DS95].
Syntax rules are parser descriptors. They are composed of a head and a body. The head specifies the name of the defined nonterminal, where atoms are considered start symbols, as well as the formal parameters of the nonterminal. Only one syntax rule per nonterminal name is allowed.
<parser descriptor> ::= <syn clause>
<syn clause> ::= syn
<syn head> <syn alt>end
<syn head> ::= <atom> | <atom label> <syn formals> | <variable> | <variable label> <syn formals>
<syn formals> ::= " (
" { <syn formal> } ")
"
The body of a syntax rule is an EBNF phrase. It is distinguished between EBNF statements and EBNF expressions: EBNF expressions carry an additional value. In the following, it is always specified where EBNF statements or expressions are expected and which constructs yield a value.
Formal parameters are denoted by variables. At most one parameter may be the nesting marker; in this case the body of the syntax rule must be an EBNF expression. Its value is unified with the corresponding actual parameter upon application of the nonterminal.
<syn formal> ::= <variable> | " _
"| " $
"
An alternation specifies several sequences (called alternatives), separated by the choice operator []
. Either all sequences must be EBNF expressions or all sequences must be EBNF statements. If all alternatives are expressions, the alternation is an expression and yields at runtime the value of the selected sequence at runtime.
<syn alt> ::= <syn seq> { " []
" <syn seq> }
At the beginning of an sequence, local variables may be declared. These are visible only inside the sequence. The sequence itself is composed of n >= 0 EBNF factors, optionally followed by a semantic action. If an EBNF expression is expected at the place the sequence stands, then a semantic action must either be an expression or be omitted. In the latter case, the last EBNF phrase must be an EBNF expression, the value of the sequence then is the value of this EBNF expression. All other EBNF factors must be statements. If n = 0, then the sequence may be written as skip
.
<syn seq> ::= [ { <variable> }+ in
] { <syn factor> } [ <syn action> ]| skip
[ <syn action> ]
<syn action> ::= " =>
" ( <in statement> | <in expression> )
An EBNF factor is either an application or an assignment. An application is denoted by the name of either a terminal or a nonterminal, optionally followed by the actual parameters in parentheses. Terminals may either have a single (variable) parameter or no parameter at all; if a parameter is specified then it is unified with the actual token value at runtime. In the application of a nonterminal, the number of actual parameters must correspond to the number of formal parameters in the nonterminal's definition. Non-escaped variables as actual parameters are implicitly declared local to the innermost sequence that contains the application. At most one actual parameter may be the nesting marker. In this case, the application is an expression yielding the value of the corresponding actual parameter; else it is a statement.
<syn factor> ::= <syn application> | <syn assignment>
<syn application> ::= <atom> | <atom label> <syn actuals> | <variable> | <variable label> <syn actuals>
<syn actuals> ::= " (
" { <expression> } ")
"
Two grammar symbols are predefined which receive a special treatment:
'prec'(
A)
By inserting an application of prec
into a sequence, the latter is assigned an associativity and a precedence. These are taken from the token A. Sequences that contain no application of prec
inherit the values of the last token used in the sequence if there is one, and have no associated associativity and precedence otherwise.
'error'
The application of the predefined terminal 'error'
defines a restart point for error recovery. Consult the bison manual [DS95] for additional information.
An assignment equates a variable with the value of an EBNF expression. Unless the variable is escaped, it is implicitly declared local to the sequence the assignment appears in, else it must have been declared local within the current syntax rule (or be a formal parameter). An assignment is always a statement.
<syn assignment> ::= [ " !
" ] <variable> "=
" <syn factor>
This section and the next augment the syntax rules defined above by the concept of production templates. These provide for, e. g., the repetition constructs used in the example in Section 3.1.
The definition of a production template is another parser descriptor. Production templates are local to the parser specification they are defined in, and may be used only textually after their definition. (This is to avoid cyclic production template expansions.) Production templates may be redefined.
<parser descriptor> += <prod clause>
A production template definition consists of a head and a body. The body specifies the EBNF phrase the production template is to be replaced with when instantiated. The body may introduce optional local syntax rules which are always newly created when instantiated. These must be denoted by variables.
<prod clause> ::= prod
<prod head>[ <local rules> in
] <syn alt>end
<local rules> ::= { <syn clause> }+
The head of a production template provides - aside from the list of its formal parameters - the unique identification of the production template. This is composed of the following parts:
whether the production template is an expression or a statement when it is instantiated (expressions being denoted by V=
... or $=
...; in the head);
the optional identification name of the template, written before a colon;
the used parentheses, brackets or braces, if any;
the number of arguments, all being separated by //
; and
the used postfix operator, if any.
For example, you could define the commonly used notation [
X ]
as an EBNF option, or use {
X //
Y }+
for a separated list with at least one element. This construct could yield a value, such as a list of the Oz values produced by the expression X, which would be denoted by the production template Z={
X //
Y }+
. (Compare this to the template's instantiation in Program 3.1 in line 21.)
<prod head> ::= <template definition> | <variable> " =
" <template definition>| " $
" "=
" <template definition>
<template definition> ::= <prod formal list> | <atom> " :
" <prod formal list>
<prod formal list> ::= " (
" <prod formals> ")
" [ <prod postfix> ]| " [
" <prod formals> "]
" [ <prod postfix> ]| " {
" <prod formals> "}
" [ <prod postfix> ]| <variable> <prod postfix>
<prod formals> ::= <variable> { " //
" <variable> }
<prod postfix> ::= " +
" | "-
" | "*
" | "/
"
Production templates may be instantiated as EBNF factors.
<syn factor> += <template instantiation>
The instantiation of a production template is very similar to its definition, since it must specify the same unique identification. The difference is that instead of the formal parameter variables actual EBNF phrases are allowed.
<template instantiation> ::= <prod actual list> | <atom> " :
" <prod actual list>
<prod actual list> ::= " (
" <prod actuals> ")
" [ <prod postfix> ]| " [
" <prod actuals> "]
" [ <prod postfix> ]| " {
" <prod actuals> "}
" [ <prod postfix> ]| <syn application> <prod postfix>
<prod actuals> ::= <syn alt> { " //
" <syn alt> }
When a production template is expanded, name clashes must be avoided. This is why the expansion proceeds in several steps:
The local variables of the template are uniquely renamed, both in the body's EBNF phrase as well as in the local rules.
The local rules are uniquely renamed to avoid confusion with other rules in the parser specification.
The actual EBNF phrases are substituted for the parameter variables of the production template. The formal parameter variables may only occur as applications of grammar symbols and may either be applied with a single actual parameter or none at all. If the parameter is given, then the actual EBNF phrase must be an expression whose value is unified with the application's actual parameter.
The local rules are quantified over the local variables used in actual EBNF phrases of the instantiation by adding these as parameters.
The local rules are aded to the table of grammar symbols.
The template instantiation is replaced by the body's EBNF phrase from the production template's definition.
Table 3.1 shows the predefined production templates. For many operators several equivalent notations exist. All operators also have a form that yields a value: The grouping construct yields the value of its argument, as do options (or nil
if they are not chosen at runtime); the repetition constructs yield Oz lists of their first argument.
Grouping |
|
Option |
|
Mandatory Repetition | A
|
Optional Repetition | A
|
Mandatory Separated Repetition |
|
Optional Separated Repetition |
|
Due to the underlying LR(1) algorithm used, two different attribute types must be distinguished concerning parameters to nonterminals, namely synthesized and inherited attributes. This is in contrary to Oz, where input and output arguments need not be distinguished due to the concept of logical variables and unification. However, things are simplified by an algorithm determining the attribute types automatically.
Before this algorithm is explained in the following, we need to introduce a definition.
Definition
Let S be an expanded sequence (i. e., template instantiations and assignments have been expanded) with EBNF factors 0, ..., n. Let i be the index of the first EBNF factor (application or semantic action) in which a local Variable V (which is not a formal parameter) of the sequence occurs. Then we say that V is initialized in all EBNF factors with indices j, j >= i, and uninitialized in all others.
The following rules describe how attribute types are derived from their uses in applications of grammar symbols:
The (optional) parameter of a terminal always is a synthesized attribute (since the scanner always produces the token value).
Let the ith actual parameter of an application of a grammar symbol B be either an uninitialized local variable V or a nesting marker. Then the ith formal parameter of B is a synthesized attribute. Furthermore, V may not occur in any other actual parameter of the application.
Let the ith actual parameter of an application of a grammar symbol B be either an initialized local variable V or a complex Oz expression (i. e., neither a variable nor a nesting marker). Then the ith formal parameter of B is an inherited attribute. Furthermore, no uninitialized variable may occur in said actual parameter.
If a formal parameter of the syntax rule for a nonterminal A is used as actual parameter of an application of a nonterminal B, then the corresponding formal parameters of A and B are attributes of the same type, i. e., either both synthesized or both inherited.
Note that nothing can be concluded from the use of a formal parameter variable in a semantic action, since Oz does not distinguish between access of and assignment to a variable: both are realized by unification.
If contradicting attribute types are derived for any formal parameter variable of a nonterminal, then this is an error. If no attribute type can be derived for a formal parameter variable, then it is realized as a synthesized attribute.
Macro Directives
The following macro directive tells the bison parse table generator to expect a certain number of shift/reduce conflicts:
\gumpparserexpect
<int>
Switches
Table 3.2 summarizes the options that the Gump Parser Generator understands. They may be given as compiler switches before a parser specification.
Switch | Effect |
---|---|
create the | |
create the |
GumpParser.'class'
The mixin class GumpParser.'class'
, defined in the module GumpParser
, is required to make Gump parser specifications executable. It requires some features to be present in derived classes; these are automatically inserted by the Gump Parser Generator and contain the generated parse tables. They all begin with syn
...; thus it is a good idea not to define any such named class components in order to avoid conflicts with Gump internals. Likewise, you should not define any variables beginning with Syn
..., since such variable names are generated by the tool.
Abstract Members
Furthermore, the following method must be defined:
meth synExecuteAction(
+I
)
This method is invoked each time a reduction takes place. The parameter I
is the number of the rule reduced.
Provided Members
GumpParser.'class'
defines several attributes and methods that may be called by users of the generated parser or from inside semantic actions:
attr lookaheadSymbol
This contains the token class of the current lookahead symbol.
attr lookaheadValue
This contains the token value of the current lookahead symbol.
feat noLookahead
This is the value lookaheadSymbol
should be set to if you want to skip a token from inside a semantic action.
meth init(
+P
)
This initializes the internal structures of the GumpParser.'class'
and connects it to a scanner P
. P
must at least understand the messages putToken
and getToken
as described in Section 2.2.3.
meth parse(
+T
?B
)
This methods initates a parse. The label of tuple T
denotes the start symbol to use (which must be a declared nonterminal named by an atom); its features correspond to the parameters of the corresponding syntax rule. Values of inherited attributes are extracted from this tuple, values of synthesized attributes are unified with the corresponding features after the parse is finished (successfully). The parameter B
is unified with true
if the parse was successful and with false
otherwise.
meth accept()
By calling this method the parse is interrupted and success reported. (Note that the values of synthesized attributes of the start symbol given to parse
are not influenced by this.)
meth abort()
By calling this method the parse is interrupted and failure reported. (Note that the error
method is not called.)
meth raiseError()
This method places the parser in the same state as if a syntax error had been found in the input. Normal error recovery is attempted. The method error
is not called.
meth errorOK()
When a production with a restart point (token error
) is reduced, this method may be called to tell the parser that the error recovery process is finished and normal parsing may be resumed.
meth clearLookahead()
When a production with a restart point (token error
) is reduced, this method may be called to clear the lookahead token (if, for example, it was used to synchronize to the restart point and is not legal thereafter).
meth error(
+V
)
This method is always invoked when (during normal parsing) an error in the input is recognized. It is handed a diagnostic message in V
. This method may be overridden in derived classes.
meth getScanner(
?P
)
Returns the scanner object or procedure P
currently used as the token source.
<< Prev | - Up - |