wisent: Understanding the automaton

 
 2.4.2 Understanding the automaton
 ---------------------------------
 
 This section (took from the manual of Bison 1.49) describes how to use
 the verbose report printed by ‘wisent-compile-grammar’ to understand the
 generated automaton, to tune or fix a grammar.
 
    We will use the following example:
 
      (let ((wisent-verbose-flag t)) ;; Print a verbose report!
        (wisent-compile-grammar
         '((NUM STR)                          ; %token NUM STR
 
           ((left ?+ ?-)                      ; %left '+' '-';
            (left ?*))                        ; %left '*'
 
           (exp                               ; exp:
            ((exp ?+ exp))                    ;    exp '+' exp
            ((exp ?- exp))                    ;  | exp '-' exp
            ((exp ?* exp))                    ;  | exp '*' exp
            ((exp ?/ exp))                    ;  | exp '/' exp
            ((NUM))                           ;  | NUM
            )                                 ;  ;
 
           (useless                           ; useless:
            ((STR))                           ;    STR
            )                                 ;  ;
           )
         'nil)                                ; no %start declarations
        )
 
    When evaluating the above expression, grammar compilation first
 issues the following two clear messages:
 
      Grammar contains 1 useless nonterminals and 1 useless rules
      Grammar contains 7 shift/reduce conflicts
 
    The ‘*wisent-log*’ buffer details things!
 
    The first section reports conflicts that were solved using precedence
 and/or associativity:
 
      Conflict in state 7 between rule 1 and token '+' resolved as reduce.
      Conflict in state 7 between rule 1 and token '-' resolved as reduce.
      Conflict in state 7 between rule 1 and token '*' resolved as shift.
      Conflict in state 8 between rule 2 and token '+' resolved as reduce.
      Conflict in state 8 between rule 2 and token '-' resolved as reduce.
      Conflict in state 8 between rule 2 and token '*' resolved as shift.
      Conflict in state 9 between rule 3 and token '+' resolved as reduce.
      Conflict in state 9 between rule 3 and token '-' resolved as reduce.
      Conflict in state 9 between rule 3 and token '*' resolved as reduce.
 
    The next section reports useless tokens, nonterminal and rules (note
 that useless tokens might be used by the scanner):
 
      Useless nonterminals:
 
         useless
 
 
      Terminals which are not used:
 
         STR
 
 
      Useless rules:
 
      #6     useless: STR;
 
    The next section lists states that still have conflicts:
 
      State 7 contains 1 shift/reduce conflict.
      State 8 contains 1 shift/reduce conflict.
      State 9 contains 1 shift/reduce conflict.
      State 10 contains 4 shift/reduce conflicts.
 
    The next section reproduces the grammar used:
 
      Grammar
 
        Number, Rule
        1       exp -> exp '+' exp
        2       exp -> exp '-' exp
        3       exp -> exp '*' exp
        4       exp -> exp '/' exp
        5       exp -> NUM
 
    And reports the uses of the symbols:
 
      Terminals, with rules where they appear
 
      $EOI (-1)
      error (1)
      NUM (2) 5
      STR (3) 6
      '+' (4) 1
      '-' (5) 2
      '*' (6) 3
      '/' (7) 4
 
 
      Nonterminals, with rules where they appear
 
      exp (8)
          on left: 1 2 3 4 5, on right: 1 2 3 4
 
    The report then details the automaton itself, describing each state
 with it set of “items”, also known as “pointed rules”.  Each item is a
 production rule together with a point (marked by ‘.’) that the input
 cursor.
 
      state 0
 
          NUM shift, and go to state 1
 
          exp go to state 2
 
    State 0 corresponds to being at the very beginning of the parsing, in
 the initial rule, right before the start symbol (‘exp’).  When the
 parser returns to this state right after having reduced a rule that
 produced an ‘exp’, it jumps to state 2.  If there is no such transition
 on a nonterminal symbol, and the lookahead is a ‘NUM’, then this token
 is shifted on the parse stack, and the control flow jumps to state 1.
 Any other lookahead triggers a parse error.
 
    In the state 1...
 
      state 1
 
          exp  ->  NUM .   (rule 5)
 
          $default    reduce using rule 5 (exp)
 
    the rule 5, ‘exp: NUM;’, is completed.  Whatever the lookahead
 (‘$default’), the parser will reduce it.  If it was coming from state 0,
 then, after this reduction it will return to state 0, and will jump to
 state 2 (‘exp: go to state 2’).
 
      state 2
 
          exp  ->  exp . '+' exp   (rule 1)
          exp  ->  exp . '-' exp   (rule 2)
          exp  ->  exp . '*' exp   (rule 3)
          exp  ->  exp . '/' exp   (rule 4)
 
          $EOI        shift, and go to state 11
          '+' shift, and go to state 3
          '-' shift, and go to state 4
          '*' shift, and go to state 5
          '/' shift, and go to state 6
 
    In state 2, the automaton can only shift a symbol.  For instance,
 because of the item ‘exp -> exp . '+' exp’, if the lookahead if ‘+’, it
 will be shifted on the parse stack, and the automaton control will jump
 to state 3, corresponding to the item ‘exp -> exp . '+' exp’:
 
      state 3
 
          exp  ->  exp '+' . exp   (rule 1)
 
          NUM shift, and go to state 1
 
          exp go to state 7
 
    Since there is no default action, any other token than those listed
 above will trigger a parse error.
 
    The interpretation of states 4 to 6 is straightforward:
 
      state 4
 
          exp  ->  exp '-' . exp   (rule 2)
 
          NUM shift, and go to state 1
 
          exp go to state 8
 
 
 
      state 5
 
          exp  ->  exp '*' . exp   (rule 3)
 
          NUM shift, and go to state 1
 
          exp go to state 9
 
 
 
      state 6
 
          exp  ->  exp '/' . exp   (rule 4)
 
          NUM shift, and go to state 1
 
          exp go to state 10
 
    As was announced in beginning of the report, ‘State 7 contains 1
 shift/reduce conflict.’:
 
      state 7
 
          exp  ->  exp . '+' exp   (rule 1)
          exp  ->  exp '+' exp .   (rule 1)
          exp  ->  exp . '-' exp   (rule 2)
          exp  ->  exp . '*' exp   (rule 3)
          exp  ->  exp . '/' exp   (rule 4)
 
          '*' shift, and go to state 5
          '/' shift, and go to state 6
 
          '/' [reduce using rule 1 (exp)]
          $default    reduce using rule 1 (exp)
 
    Indeed, there are two actions associated to the lookahead ‘/’: either
 shifting (and going to state 6), or reducing rule 1.  The conflict means
 that either the grammar is ambiguous, or the parser lacks information to
 make the right decision.  Indeed the grammar is ambiguous, as, since we
 did not specify the precedence of ‘/’, the sentence ‘NUM + NUM / NUM’
 can be parsed as ‘NUM + (NUM / NUM)’, which corresponds to shifting ‘/’,
 or as ‘(NUM + NUM) / NUM’, which corresponds to reducing rule 1.
 
    Because in LALR(1) parsing a single decision can be made, Wisent
 arbitrarily chose to disable the reduction, see SeeConflicts.
 Discarded actions are reported in between square brackets.
 
    Note that all the previous states had a single possible action:
 either shifting the next token and going to the corresponding state, or
 reducing a single rule.  In the other cases, i.e., when shifting _and_
 reducing is possible or when _several_ reductions are possible, the
 lookahead is required to select the action.  State 7 is one such state:
 if the lookahead is ‘*’ or ‘/’ then the action is shifting, otherwise
 the action is reducing rule 1.  In other words, the first two items,
 corresponding to rule 1, are not eligible when the lookahead is ‘*’,
 since we specified that ‘*’ has higher precedence that ‘+’.  More
 generally, some items are eligible only with some set of possible
 lookaheads.
 
    States 8 to 10 are similar:
 
      state 8
 
          exp  ->  exp . '+' exp   (rule 1)
          exp  ->  exp . '-' exp   (rule 2)
          exp  ->  exp '-' exp .   (rule 2)
          exp  ->  exp . '*' exp   (rule 3)
          exp  ->  exp . '/' exp   (rule 4)
 
          '*' shift, and go to state 5
          '/' shift, and go to state 6
 
          '/' [reduce using rule 2 (exp)]
          $default    reduce using rule 2 (exp)
 
 
      state 9
 
          exp  ->  exp . '+' exp   (rule 1)
          exp  ->  exp . '-' exp   (rule 2)
          exp  ->  exp . '*' exp   (rule 3)
          exp  ->  exp '*' exp .   (rule 3)
          exp  ->  exp . '/' exp   (rule 4)
 
          '/' shift, and go to state 6
 
          '/' [reduce using rule 3 (exp)]
          $default    reduce using rule 3 (exp)
 
 
      state 10
 
          exp  ->  exp . '+' exp   (rule 1)
          exp  ->  exp . '-' exp   (rule 2)
          exp  ->  exp . '*' exp   (rule 3)
          exp  ->  exp . '/' exp   (rule 4)
          exp  ->  exp '/' exp .   (rule 4)
 
          '+' shift, and go to state 3
          '-' shift, and go to state 4
          '*' shift, and go to state 5
          '/' shift, and go to state 6
 
          '+' [reduce using rule 4 (exp)]
          '-' [reduce using rule 4 (exp)]
          '*' [reduce using rule 4 (exp)]
          '/' [reduce using rule 4 (exp)]
          $default    reduce using rule 4 (exp)
 
    Observe that state 10 contains conflicts due to the lack of
 precedence of ‘/’ wrt ‘+’, ‘-’, and ‘*’, but also because the
 associativity of ‘/’ is not specified.
 
    Finally, the state 11 (plus 12) is named the “final state”, or the
 “accepting state”:
 
      state 11
 
          $EOI        shift, and go to state 12
 
 
 
      state 12
 
          $default    accept
 
    The end of input is shifted ‘$EOI shift,’ and the parser exits
 successfully (‘go to state 12’, that terminates).