wisent: Understanding the automaton
2.4.2 Understanding the automaton
---------------------------------
This section (took from the manual of Bison 1.49) describes how to use
the verbose report printed by ‘wisent-compile-grammar’ to understand the
generated automaton, to tune or fix a grammar.
We will use the following example:
(let ((wisent-verbose-flag t)) ;; Print a verbose report!
(wisent-compile-grammar
'((NUM STR) ; %token NUM STR
((left ?+ ?-) ; %left '+' '-';
(left ?*)) ; %left '*'
(exp ; exp:
((exp ?+ exp)) ; exp '+' exp
((exp ?- exp)) ; | exp '-' exp
((exp ?* exp)) ; | exp '*' exp
((exp ?/ exp)) ; | exp '/' exp
((NUM)) ; | NUM
) ; ;
(useless ; useless:
((STR)) ; STR
) ; ;
)
'nil) ; no %start declarations
)
When evaluating the above expression, grammar compilation first
issues the following two clear messages:
Grammar contains 1 useless nonterminals and 1 useless rules
Grammar contains 7 shift/reduce conflicts
The ‘*wisent-log*’ buffer details things!
The first section reports conflicts that were solved using precedence
and/or associativity:
Conflict in state 7 between rule 1 and token '+' resolved as reduce.
Conflict in state 7 between rule 1 and token '-' resolved as reduce.
Conflict in state 7 between rule 1 and token '*' resolved as shift.
Conflict in state 8 between rule 2 and token '+' resolved as reduce.
Conflict in state 8 between rule 2 and token '-' resolved as reduce.
Conflict in state 8 between rule 2 and token '*' resolved as shift.
Conflict in state 9 between rule 3 and token '+' resolved as reduce.
Conflict in state 9 between rule 3 and token '-' resolved as reduce.
Conflict in state 9 between rule 3 and token '*' resolved as reduce.
The next section reports useless tokens, nonterminal and rules (note
that useless tokens might be used by the scanner):
Useless nonterminals:
useless
Terminals which are not used:
STR
Useless rules:
#6 useless: STR;
The next section lists states that still have conflicts:
State 7 contains 1 shift/reduce conflict.
State 8 contains 1 shift/reduce conflict.
State 9 contains 1 shift/reduce conflict.
State 10 contains 4 shift/reduce conflicts.
The next section reproduces the grammar used:
Grammar
Number, Rule
1 exp -> exp '+' exp
2 exp -> exp '-' exp
3 exp -> exp '*' exp
4 exp -> exp '/' exp
5 exp -> NUM
And reports the uses of the symbols:
Terminals, with rules where they appear
$EOI (-1)
error (1)
NUM (2) 5
STR (3) 6
'+' (4) 1
'-' (5) 2
'*' (6) 3
'/' (7) 4
Nonterminals, with rules where they appear
exp (8)
on left: 1 2 3 4 5, on right: 1 2 3 4
The report then details the automaton itself, describing each state
with it set of “items”, also known as “pointed rules”. Each item is a
production rule together with a point (marked by ‘.’) that the input
cursor.
state 0
NUM shift, and go to state 1
exp go to state 2
State 0 corresponds to being at the very beginning of the parsing, in
the initial rule, right before the start symbol (‘exp’). When the
parser returns to this state right after having reduced a rule that
produced an ‘exp’, it jumps to state 2. If there is no such transition
on a nonterminal symbol, and the lookahead is a ‘NUM’, then this token
is shifted on the parse stack, and the control flow jumps to state 1.
Any other lookahead triggers a parse error.
In the state 1...
state 1
exp -> NUM . (rule 5)
$default reduce using rule 5 (exp)
the rule 5, ‘exp: NUM;’, is completed. Whatever the lookahead
(‘$default’), the parser will reduce it. If it was coming from state 0,
then, after this reduction it will return to state 0, and will jump to
state 2 (‘exp: go to state 2’).
state 2
exp -> exp . '+' exp (rule 1)
exp -> exp . '-' exp (rule 2)
exp -> exp . '*' exp (rule 3)
exp -> exp . '/' exp (rule 4)
$EOI shift, and go to state 11
'+' shift, and go to state 3
'-' shift, and go to state 4
'*' shift, and go to state 5
'/' shift, and go to state 6
In state 2, the automaton can only shift a symbol. For instance,
because of the item ‘exp -> exp . '+' exp’, if the lookahead if ‘+’, it
will be shifted on the parse stack, and the automaton control will jump
to state 3, corresponding to the item ‘exp -> exp . '+' exp’:
state 3
exp -> exp '+' . exp (rule 1)
NUM shift, and go to state 1
exp go to state 7
Since there is no default action, any other token than those listed
above will trigger a parse error.
The interpretation of states 4 to 6 is straightforward:
state 4
exp -> exp '-' . exp (rule 2)
NUM shift, and go to state 1
exp go to state 8
state 5
exp -> exp '*' . exp (rule 3)
NUM shift, and go to state 1
exp go to state 9
state 6
exp -> exp '/' . exp (rule 4)
NUM shift, and go to state 1
exp go to state 10
As was announced in beginning of the report, ‘State 7 contains 1
shift/reduce conflict.’:
state 7
exp -> exp . '+' exp (rule 1)
exp -> exp '+' exp . (rule 1)
exp -> exp . '-' exp (rule 2)
exp -> exp . '*' exp (rule 3)
exp -> exp . '/' exp (rule 4)
'*' shift, and go to state 5
'/' shift, and go to state 6
'/' [reduce using rule 1 (exp)]
$default reduce using rule 1 (exp)
Indeed, there are two actions associated to the lookahead ‘/’: either
shifting (and going to state 6), or reducing rule 1. The conflict means
that either the grammar is ambiguous, or the parser lacks information to
make the right decision. Indeed the grammar is ambiguous, as, since we
did not specify the precedence of ‘/’, the sentence ‘NUM + NUM / NUM’
can be parsed as ‘NUM + (NUM / NUM)’, which corresponds to shifting ‘/’,
or as ‘(NUM + NUM) / NUM’, which corresponds to reducing rule 1.
Because in LALR(1) parsing a single decision can be made, Wisent
arbitrarily chose to disable the reduction, see Conflicts.
Discarded actions are reported in between square brackets.
Note that all the previous states had a single possible action:
either shifting the next token and going to the corresponding state, or
reducing a single rule. In the other cases, i.e., when shifting _and_
reducing is possible or when _several_ reductions are possible, the
lookahead is required to select the action. State 7 is one such state:
if the lookahead is ‘*’ or ‘/’ then the action is shifting, otherwise
the action is reducing rule 1. In other words, the first two items,
corresponding to rule 1, are not eligible when the lookahead is ‘*’,
since we specified that ‘*’ has higher precedence that ‘+’. More
generally, some items are eligible only with some set of possible
lookaheads.
States 8 to 10 are similar:
state 8
exp -> exp . '+' exp (rule 1)
exp -> exp . '-' exp (rule 2)
exp -> exp '-' exp . (rule 2)
exp -> exp . '*' exp (rule 3)
exp -> exp . '/' exp (rule 4)
'*' shift, and go to state 5
'/' shift, and go to state 6
'/' [reduce using rule 2 (exp)]
$default reduce using rule 2 (exp)
state 9
exp -> exp . '+' exp (rule 1)
exp -> exp . '-' exp (rule 2)
exp -> exp . '*' exp (rule 3)
exp -> exp '*' exp . (rule 3)
exp -> exp . '/' exp (rule 4)
'/' shift, and go to state 6
'/' [reduce using rule 3 (exp)]
$default reduce using rule 3 (exp)
state 10
exp -> exp . '+' exp (rule 1)
exp -> exp . '-' exp (rule 2)
exp -> exp . '*' exp (rule 3)
exp -> exp . '/' exp (rule 4)
exp -> exp '/' exp . (rule 4)
'+' shift, and go to state 3
'-' shift, and go to state 4
'*' shift, and go to state 5
'/' shift, and go to state 6
'+' [reduce using rule 4 (exp)]
'-' [reduce using rule 4 (exp)]
'*' [reduce using rule 4 (exp)]
'/' [reduce using rule 4 (exp)]
$default reduce using rule 4 (exp)
Observe that state 10 contains conflicts due to the lack of
precedence of ‘/’ wrt ‘+’, ‘-’, and ‘*’, but also because the
associativity of ‘/’ is not specified.
Finally, the state 11 (plus 12) is named the “final state”, or the
“accepting state”:
state 11
$EOI shift, and go to state 12
state 12
$default accept
The end of input is shifted ‘$EOI shift,’ and the parser exits
successfully (‘go to state 12’, that terminates).