elisp: SMIE Grammar
22.7.1.3 Defining the Grammar of a Language
...........................................
The usual way to define the SMIE grammar of a language is by defining a
new global variable that holds the precedence table by giving a set of
BNF rules. For example, the grammar definition for a small Pascal-like
language could look like:
(require 'smie)
(defvar sample-smie-grammar
(smie-prec2->grammar
(smie-bnf->prec2
'((id)
(inst ("begin" insts "end")
("if" exp "then" inst "else" inst)
(id ":=" exp)
(exp))
(insts (insts ";" insts) (inst))
(exp (exp "+" exp)
(exp "*" exp)
("(" exps ")"))
(exps (exps "," exps) (exp)))
'((assoc ";"))
'((assoc ","))
'((assoc "+") (assoc "*")))))
A few things to note:
• The above grammar does not explicitly mention the syntax of
function calls: SMIE will automatically allow any sequence of
sexps, such as identifiers, balanced parentheses, or ‘begin ...
end’ blocks to appear anywhere anyway.
• The grammar category ‘id’ has no right hand side: this does not
mean that it can match only the empty string, since as mentioned
any sequence of sexps can appear anywhere anyway.
• Because non terminals cannot appear consecutively in the BNF
grammar, it is difficult to correctly handle tokens that act as
terminators, so the above grammar treats ‘";"’ as a statement
_separator_ instead, which SMIE can handle very well.
• Separators used in sequences (such as ‘","’ and ‘";"’ above) are
best defined with BNF rules such as ‘(foo (foo "separator" foo)
...)’ which generate precedence conflicts which are then resolved
by giving them an explicit ‘(assoc "separator")’.
• The ‘("(" exps ")")’ rule was not needed to pair up parens, since
SMIE will pair up any characters that are marked as having paren
syntax in the syntax table. What this rule does instead (together
with the definition of ‘exps’) is to make it clear that ‘","’
should not appear outside of parentheses.
• Rather than have a single _precs_ table to resolve conflicts, it is
preferable to have several tables, so as to let the BNF part of the
grammar specify relative precedences where possible.
• Unless there is a very good reason to prefer ‘left’ or ‘right’, it
is usually preferable to mark operators as associative, using
‘assoc’. For that reason ‘"+"’ and ‘"*"’ are defined above as
‘assoc’, although the language defines them formally as left
associative.