wisent: Grammar format
2.1 Grammar format
==================
To be accepted by Wisent, a context-free grammar must follow a
particular format: it must be represented as an Emacs Lisp list
of the form:
‘(TERMINALS ASSOCS . NON-TERMINALS)’
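As a rough illustration of that shape, the sketch below builds a tiny grammar as quoted data and takes it apart with ordinary list functions. The variable name ‘my-grammar’ and the token names NUM, PLUS and TIMES are made up for this example, and nothing here is passed to Wisent.

```elisp
;; A hypothetical grammar in the (TERMINALS ASSOCS . NON-TERMINALS)
;; shape.  NUM, PLUS and TIMES are made-up token names.
(defvar my-grammar
  '((NUM PLUS TIMES)                 ; TERMINALS
    ((left PLUS) (left TIMES))       ; ASSOCS
    (exp ((NUM))                     ; NON-TERMINALS: everything after ASSOCS
         ((exp PLUS exp))
         ((exp TIMES exp))))
  "Example grammar data; it is not compiled by Wisent here.")

(car my-grammar)    ; TERMINALS
(cadr my-grammar)   ; ASSOCS
(cddr my-grammar)   ; NON-TERMINALS, a list of (NONTERM . RULES)
```

Because of the dotted form, NON-TERMINALS is simply the tail of the list after its first two elements. The three parts are described in turn below.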
TERMINALS
Is the list of terminal symbols used in the grammar.
ASSOCS
Specifies the associativity of TERMINALS. It is ‘nil’ when no
associativity is defined, or an alist of
‘(ASSOC-TYPE . ASSOC-VALUE)’ elements.
ASSOC-TYPE must be one of the ‘default-prec’, ‘nonassoc’, ‘left’ or
‘right’ symbols. When ASSOC-TYPE is ‘default-prec’, ASSOC-VALUE
must be ‘nil’ or ‘t’ (the default). Otherwise it is a list of
tokens which must have been previously declared in TERMINALS.
For details, see (bison)Contextual Precedence, in the Bison
manual.
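An ASSOCS value can be inspected with ordinary list functions. The following is a minimal sketch in plain Emacs Lisp; the names ‘my-assocs’ and ‘my-assoc-type’ and the tokens EQ, PLUS, MINUS and TIMES are hypothetical, and the helper is not part of Wisent.

```elisp
;; A hypothetical ASSOCS value.  EQ, PLUS, MINUS and TIMES are made-up
;; token names, assumed to be declared in TERMINALS.
(defvar my-assocs
  '((nonassoc EQ)
    (left PLUS MINUS)
    (left TIMES))
  "Example ASSOCS alist of (ASSOC-TYPE . ASSOC-VALUE) elements.")

(defun my-assoc-type (token assocs)
  "Return the ASSOC-TYPE under which TOKEN is declared in ASSOCS, or nil."
  (let (found)
    (dolist (entry assocs found)
      (when (and (not found)
                 (memq (car entry) '(nonassoc left right))
                 (memq token (cdr entry)))
        (setq found (car entry))))))

(my-assoc-type 'MINUS my-assocs)   ; evaluates to `left'
(my-assoc-type 'EQ my-assocs)      ; evaluates to `nonassoc'
(my-assoc-type 'IF my-assocs)      ; evaluates to nil: IF is not declared
```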
NON-TERMINALS
Is the list of nonterminal definitions. Each definition has the
form:
‘(NONTERM . RULES)’
Where NONTERM is the nonterminal symbol defined and RULES the list
of rules that describe this nonterminal. Each rule is a list:
‘(COMPONENTS [PRECEDENCE] [ACTION])’
Where:
COMPONENTS
Is a list of various terminals and nonterminals that are put
together by this rule.
For example,
(exp ((exp ?+ exp)) ;; exp: exp '+' exp
) ;; ;
Says that two groupings of type ‘exp’, with a ‘+’ token in
between, can be combined into a larger grouping of type ‘exp’.
By convention, a nonterminal symbol should be in lower case,
such as ‘exp’, ‘stmt’ or ‘declaration’. Terminal symbols
should be upper case to distinguish them from nonterminals:
for example, ‘INTEGER’, ‘IDENTIFIER’, ‘IF’ or ‘RETURN’. A
terminal symbol that represents a particular keyword in the
language is conventionally the same as that keyword converted
to upper case. The terminal symbol ‘error’ is reserved for
error recovery.
Scattered among the components can be “middle-rule” actions.
Usually only ACTION is provided (see ACTION below).
If COMPONENTS in a rule is ‘nil’, it means that the rule can
match the empty string. For example, here is how to define a
comma-separated sequence of zero or more ‘exp’ groupings:
(expseq (nil) ;; expseq: ;; empty
((expseq1)) ;; | expseq1
) ;; ;
(expseq1 ((exp)) ;; expseq1: exp
((expseq1 ?, exp)) ;; | expseq1 ',' exp
) ;; ;
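The doubled parentheses in these definitions come from the fact that each rule is a list whose first element is COMPONENTS. The sketch below treats the ‘expseq’ definitions as plain quoted data and checks which nonterminals have an empty rule. The names ‘my-nonterminals’ and ‘my-empty-rule-p’ are hypothetical and are not Wisent functions.

```elisp
;; The expseq/expseq1 definitions from above, as quoted data.
(defvar my-nonterminals
  '((expseq  (nil)                  ; COMPONENTS is nil: the empty rule
             ((expseq1)))
    (expseq1 ((exp))
             ((expseq1 ?, exp))))
  "Example NON-TERMINALS list of (NONTERM . RULES) definitions.")

(defun my-empty-rule-p (nonterm nonterminals)
  "Return non-nil if NONTERM has a rule whose COMPONENTS is nil."
  (let ((rules (cdr (assq nonterm nonterminals)))
        (found nil))
    (dolist (rule rules found)
      (when (null (car rule))
        (setq found t)))))

(my-empty-rule-p 'expseq my-nonterminals)    ; evaluates to t
(my-empty-rule-p 'expseq1 my-nonterminals)   ; evaluates to nil
```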
PRECEDENCE
Assigns the rule the precedence of the given terminal item,
overriding the precedence that would otherwise be deduced for it,
namely that of the last terminal in the rule. Note that only
terminals declared in ASSOCS have a precedence level. The
altered rule precedence then affects how conflicts involving
that rule are resolved.
PRECEDENCE is an optional vector of one terminal item.
Here is how PRECEDENCE solves the problem of unary minus.
First, declare a precedence for a fictitious terminal symbol
named ‘UMINUS’. There are no tokens of this type, but the
symbol serves to stand for its precedence:
...
((default-prec t) ;; This is the default
(left ?+ ?-)
(left ?*)
(left UMINUS))
Now the precedence of ‘UMINUS’ can be used in specific rules:
(exp ... ;; exp: ...
((exp ?- exp)) ;; | exp '-' exp
... ;; ...
((?- exp) [UMINUS]) ;; | '-' exp %prec UMINUS
... ;; ...
) ;; ;
If you forget to append ‘[UMINUS]’ to the rule for unary
minus, Wisent silently assumes that minus has its usual
precedence. This kind of problem can be tricky to debug,
since one typically discovers the mistake only by testing the
code.
Using a ‘(default-prec nil)’ declaration makes it easier to
discover this kind of problem systematically. It causes rules
that lack a PRECEDENCE modifier to have no precedence, even if
the last terminal symbol mentioned in their components has a
declared precedence.
If ‘(default-prec nil)’ is in effect, you must specify
PRECEDENCE for all rules that participate in precedence
conflict resolution. Every shift/reduce conflict then stays
visible until you tell Wisent how to resolve it, either by
changing your grammar or by adding an explicit precedence.
This will probably require extra declarations in the grammar, but it
helps to protect against incorrect rule precedences.
The effect of ‘(default-prec nil)’ can be reversed by giving
‘(default-prec t)’, which is the default.
For more details, see (bison)Contextual Precedence, in
the Bison manual.
It is important to understand that ASSOCS declarations not only define
associativity but also assign a precedence level to terminals.
All terminals declared in the same ‘left’, ‘right’ or
‘nonassoc’ association get the same precedence level. The
precedence level increases with each new association.
PRECEDENCE, on the other hand, explicitly assigns the precedence
level of the given terminal to a rule.
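The relationship between ASSOCS levels and rule precedence can be sketched in plain Emacs Lisp. The helpers below are hypothetical illustrations of the description above, not Wisent's implementation. They model only the ‘(default-prec t)’ case, and the terminal list (including NUM) is an assumption made for this example.

```elisp
;; The unary minus ASSOCS from above, written with character literals.
(defvar my-prec-assocs
  '((default-prec t)
    (left ?+ ?-)
    (left ?*)
    (left UMINUS))
  "Example ASSOCS value.")

;; Assumed TERMINALS for this example; NUM is a made-up token.
(defvar my-prec-terminals '(NUM ?+ ?- ?* UMINUS))

(defun my-precedence-levels (assocs)
  "Return an alist (TERMINAL . LEVEL) computed from ASSOCS.
Terminals in the same association share a level; each new `left',
`right' or `nonassoc' association gets the next higher level."
  (let ((level 0) result)
    (dolist (entry assocs (nreverse result))
      (when (memq (car entry) '(left right nonassoc))
        (setq level (1+ level))
        (dolist (terminal (cdr entry))
          (push (cons terminal level) result))))))

(defun my-rule-precedence (rule terminals levels)
  "Return the precedence level of RULE, or nil if it has none.
Use the terminal in the PRECEDENCE vector when RULE has one;
otherwise use the last element of COMPONENTS found in TERMINALS."
  (let ((prec (cadr rule)))
    (if (vectorp prec)
        (cdr (assq (aref prec 0) levels))
      (let (last-terminal)
        (dolist (item (car rule))
          (when (memq item terminals)
            (setq last-terminal item)))
        (cdr (assq last-terminal levels))))))

(let ((levels (my-precedence-levels my-prec-assocs)))
  (list (my-rule-precedence '((exp ?- exp)) my-prec-terminals levels)
        (my-rule-precedence '((?- exp) [UMINUS]) my-prec-terminals levels)))
;; evaluates to (1 3)
```

In this model binary minus takes level 1 from ‘?-’, while the unary minus rule takes level 3 from ‘UMINUS’, above ‘?*’ at level 2.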
ACTION
An action is an optional Emacs Lisp function call, like this:
‘(identity $1)’
The result of an action determines the semantic value of a
rule.
From an implementation standpoint, the function call will be
embedded in a lambda expression, and several useful local
variables will be defined:
‘$N’
Where N is a positive integer. As in Bison, the value
of ‘$N’ is the semantic value of the Nth element of
COMPONENTS, counting from 1. It can be of any Lisp data
type.
‘$regionN’
Where N is a positive integer. For each ‘$N’ variable
defined there is a corresponding ‘$regionN’ variable.
Its value is a pair ‘(START-POS . END-POS)’ that
represents the start and end positions (in the lexical
input stream) of the ‘$N’ value. It can be ‘nil’ when
the component positions are not available, for example
for an empty string component.
‘$region’
Its value is the leftmost and rightmost positions of
input data matched by all COMPONENTS in the rule. This
is a pair ‘(LEFTMOST-POS . RIGHTMOST-POS)’. It can be
‘nil’ when component positions are not available.
‘$nterm’
This variable is initialized with the nonterminal symbol
(NONTERM) the rule belongs to. It can be useful for
improving error reporting or debugging. It is also used to
automatically provide incremental re-parse entry points
for Semantic tags (see Wisent Semantic).
‘$action’
The value of ‘$action’ is the symbolic name of the
current semantic action (see Debugging actions).
When no action is specified, a default is supplied:
‘(identity $1)’. This means that the default semantic
value of a rule is the value of its first component. The exception
is a rule matching the empty string, for which the default
action is to return ‘nil’.
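To make the role of the ‘$N’ variables and the default action concrete, here is a minimal sketch that evaluates an action form with ‘$1’, ‘$2’, and so on bound to a list of component values. The function ‘my-run-action’ is hypothetical. It only mimics the behavior described above and is not how Wisent itself builds or calls its lambda expressions; the region variables, ‘$nterm’ and ‘$action’ are left out.

```elisp
(defun my-run-action (action values)
  "Evaluate ACTION with $1, $2, ... bound to the elements of VALUES.
When ACTION is nil, mimic the default action: return the first
value, or nil when VALUES is empty (a rule matching the empty string)."
  (if (null action)
      (car values)
    (let ((n 0) bindings)
      (dolist (value values)
        (setq n (1+ n))
        ;; Build a binding such as ($1 'VALUE) for each component.
        (push (list (intern (format "$%d" n)) (list 'quote value))
              bindings))
      ;; Evaluate the action inside a `let' that binds $1..$N.
      (eval `(let ,(nreverse bindings) ,action) t))))

(my-run-action '(+ $1 $3) '(1 ?+ 2))   ; evaluates to 3
(my-run-action '(identity $1) '(7))    ; evaluates to 7
(my-run-action nil '(7 ?* 6))          ; evaluates to 7, the default
(my-run-action nil '())                ; evaluates to nil, empty rule
```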