wisent: Writing a lexer
3.1 What the parser must receive
================================
It is important to understand that the parser does not parse characters,
but lexical tokens; it knows nothing about the characters in a text
stream.
Reading input data to produce lexical tokens is performed by a lexer
(also called a scanner) in a lexical analysis step, before the syntax
analysis step performed by the parser. The parser automatically calls
the lexer when it needs the next token to parse.
A Wisent lexer is an Emacs Lisp function that takes no arguments. It
must return a valid lexical token of the form:
‘(TOKEN-CLASS VALUE [START . END])’
TOKEN-CLASS
     Is a category of lexical token identifying a terminal as specified
     in the grammar (see Wisent Grammar). It can be a symbol or a
     character literal.
VALUE
     Is the value of the lexical token. It can be of any valid Emacs
     Lisp data type.
START
END
     Are the optional beginning and ending positions of VALUE in the
     input stream.
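The token shape described above can be sketched in Emacs Lisp as
follows. This is only an illustration: ‘my-make-token’ and the token
classes ‘NUM’ and ‘ID’ are hypothetical names, not part of Wisent, and
the sketch assumes that the optional positions are stored as a dotted
pair forming the tail of the token.

```elisp
;; Hypothetical helper, not part of Wisent: build a lexical token.
(defun my-make-token (class value &optional start end)
  "Return the token (CLASS VALUE), or (CLASS VALUE START . END).
The positions are included only when both START and END are given."
  (if (and start end)
      (cons class (cons value (cons start end)))
    (list class value)))

;; NUM and ID are made-up token classes for illustration.
(my-make-token 'NUM 42)        ; => (NUM 42)
(my-make-token 'ID "foo" 5 8)  ; => (ID "foo" 5 . 8)
;; A character literal as token class; ?+ reads as the integer 43.
(my-make-token ?+ "+" 9 10)    ; => (43 "+" 9 . 10)
```

A real grammar would use the terminals it declares in place of ‘NUM’
and ‘ID’.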
When there are no more tokens to read, the lexer must return the token
‘(list wisent-eoi-term)’ on every subsequent request.
-- Variable: wisent-eoi-term
Predefined constant, End-Of-Input terminal symbol.
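As a minimal sketch of the contract described above, the following lexer
hands out tokens from a list prepared in advance and then keeps
returning the end-of-input token. The names ‘my-token-stream’ and
‘my-lexer’ and the token class ‘NUM’ are hypothetical, and the sketch
assumes that ‘wisent-eoi-term’ is provided by the
‘semantic/wisent/wisent’ library. A real lexer would typically scan a
buffer instead of popping a list.

```elisp
(require 'semantic/wisent/wisent)  ; assumed to define `wisent-eoi-term'

;; Hypothetical variable: the tokens that remain to be delivered.
(defvar my-token-stream nil
  "List of pending lexical tokens, each (CLASS VALUE [START . END]).")

;; Hypothetical lexer: a function of no arguments, as Wisent expects.
(defun my-lexer ()
  "Return the next token from `my-token-stream'.
Once the stream is exhausted, return the end-of-input token on
every call."
  (if my-token-stream
      (pop my-token-stream)
    (list wisent-eoi-term)))

;; Example input for "1 + 2"; NUM is a made-up token class.
(setq my-token-stream
      (list '(NUM 1 1 . 2) '(?+ "+" 3 . 4) '(NUM 2 5 . 6)))

(my-lexer)  ; => (NUM 1 1 . 2)
(my-lexer)  ; => (43 "+" 3 . 4)
(my-lexer)  ; => (NUM 2 5 . 6)
(my-lexer)  ; the end-of-input token, now and on every later call
```

Keeping the pending tokens in a variable lets the function take no
arguments, which is the calling convention the parser relies on.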
‘wisent-lex’ is an example of a lexer that reads lexical tokens
produced by a Semantic lexer and translates them into lexical tokens
suitable for the Wisent parser. See also Wisent Lex.
To call the lexer from a semantic action, use the function
‘wisent-lexer’. See also Actions goodies.