elisp: SMIE Tricks

 
 22.7.1.5 Living With a Weak Parser
 ..................................
 
 The parsing technique used by SMIE does not allow tokens to behave
 differently in different contexts.  For most programming languages, this
 manifests itself by precedence conflicts when converting the BNF
 grammar.
 
    Sometimes, those conflicts can be worked around by expressing the
 grammar slightly differently.  For example, for Modula-2 it might seem
 natural to have a BNF grammar that looks like this:
 
        ...
        (inst ("IF" exp "THEN" insts "ELSE" insts "END")
              ("CASE" exp "OF" cases "END")
              ...)
        (cases (cases "|" cases)
               (caselabel ":" insts)
               ("ELSE" insts))
        ...
 
    But this will create conflicts for ‘"ELSE"’: on the one hand, the IF
 rule implies (among many other things) that ‘"ELSE" = "END"’; but on the
 other hand, since ‘"ELSE"’ appears within ‘cases’, which appears left of
 ‘"END"’, we also have ‘"ELSE" > "END"’.  We can solve the conflict
 either by using:
        ...
        (inst ("IF" exp "THEN" insts "ELSE" insts "END")
              ("CASE" exp "OF" cases "END")
              ("CASE" exp "OF" cases "ELSE" insts "END")
              ...)
        (cases (cases "|" cases) (caselabel ":" insts))
        ...
    or
        ...
        (inst ("IF" exp "THEN" else "END")
              ("CASE" exp "OF" cases "END")
              ...)
        (else (insts "ELSE" insts))
        (cases (cases "|" cases) (caselabel ":" insts) (else))
        ...
 
    Reworking the grammar to try and solve conflicts has its downsides,
 tho, because SMIE assumes that the grammar reflects the logical
 structure of the code, so it is preferable to keep the BNF closer to the
 intended abstract syntax tree.
 
    Other times, after careful consideration you may conclude that those
 conflicts are not serious and simply resolve them via the RESOLVERS
 argument of ‘smie-bnf->prec2’.  Usually this is because the grammar is
 simply ambiguous: the conflict does not affect the set of programs
 described by the grammar, but only the way those programs are parsed.
 This is typically the case for separators and associative infix
 operators, where you want to add a resolver like ‘'((assoc "|"))’.
 Another case where this can happen is for the classic _dangling else_
 problem, where you will use ‘'((assoc "else" "then"))’.  It can also
 happen for cases where the conflict is real and cannot really be
 resolved, but it is unlikely to pose a problem in practice.
 
    Finally, in many cases some conflicts will remain despite all efforts
 to restructure the grammar.  Do not despair: while the parser cannot be
 made more clever, you can make the lexer as smart as you want.  So, the
 solution is then to look at the tokens involved in the conflict and to
 split one of those tokens into 2 (or more) different tokens.  E.g., if
 the grammar needs to distinguish between two incompatible uses of the
 token ‘"begin"’, make the lexer return different tokens (say
 ‘"begin-fun"’ and ‘"begin-plain"’) depending on which kind of ‘"begin"’
 it finds.  This pushes the work of distinguishing the different cases to
 the lexer, which will thus have to look at the surrounding text to find
 ad-hoc clues.