Info: (wisent) Writing a lexer

 
 3.1 What the parser must receive
 ================================
 
 It is important to understand that the parser does not parse characters,
 but lexical tokens, and does not know anything about characters in text
 streams!
 
    Reading input data to produce lexical tokens is performed by a lexer
 (also called a scanner) in a lexical analysis step, before the syntax
 analysis step performed by the parser.  The parser automatically calls
 the lexer when it needs the next token to parse.
 
    A Wisent lexer is an Emacs Lisp function that takes no arguments.  It
 must return a valid lexical token of the form:
 
    ‘(TOKEN-CLASS VALUE [START . END])’
 
 TOKEN-CLASS
      Is the category of the lexical token.  It identifies a terminal
      specified in the grammar (see Wisent Grammar).  It can be a symbol
      or a character literal.
 
 VALUE
      Is the value of the lexical token.  It can be of any valid Emacs
      Lisp data type.
 
 START
 END
      Are the optional beginning and ending positions of VALUE in the
      input stream.
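 
    As an illustration of the token form described above, here is a
 minimal sketch of a helper that builds such tokens.  The name
 ‘my-make-token’ and the terminal ‘NUM’ are hypothetical and are not
 part of Wisent.  The sketch also assumes that ‘[START . END]’ denotes
 an optional dotted tail, so that a token with positions looks like
 ‘(NUM 42 10 . 12)’.

```elisp
;; Hypothetical helper, not part of Wisent: build a lexical token.
(defun my-make-token (class value &optional start end)
  "Return a token of CLASS carrying VALUE.
If START and END are both given, append them as a dotted tail,
which is this sketch's reading of the [START . END] notation."
  (if (and start end)
      ;; (CLASS VALUE START . END)
      (cons class (cons value (cons start end)))
    ;; (CLASS VALUE), without position information
    (list class value)))

;; A token whose class is a symbol, with positions 10 to 12.
(my-make-token 'NUM 42 10 12)

;; A token whose class is a character literal, without positions.
(my-make-token ?+ "+")
```

 The value slot is unconstrained, so a lexer is free to store a
 string, a number, or any other Lisp object there.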
 
    When there are no more tokens to read, the lexer must return the
 token ‘(list wisent-eoi-term)’ in response to every further request.
 
  -- Variable: wisent-eoi-term
      Predefined constant, End-Of-Input terminal symbol.
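 
    The following sketch shows one possible shape for a lexer that
 honors the end-of-input rule.  It does not read a buffer.  Instead it
 hands out tokens from a prepared list, which keeps the example short.
 The names ‘my-pending-tokens’ and ‘my-list-lexer’ are hypothetical,
 and the ‘require’ form assumes the copy of Wisent bundled with Emacs
 as part of CEDET.

```elisp
;; Assumption: Wisent as bundled with Emacs; this feature defines
;; the constant `wisent-eoi-term'.
(require 'semantic/wisent/wisent)

(defvar my-pending-tokens nil
  "Hypothetical queue of tokens not yet handed to the parser.")

(defun my-list-lexer ()
  "Return the next token from `my-pending-tokens'.
Once the queue is empty, return the end-of-input token on every call."
  (if my-pending-tokens
      (pop my-pending-tokens)
    (list wisent-eoi-term)))

;; Usage: queue the tokens for the input "1+2", then drain the queue.
(setq my-pending-tokens
      (list (list 'NUM "1") (list ?+ "+") (list 'NUM "2")))
(my-list-lexer)   ; first call returns the token (NUM "1")
```

 Because the function takes no arguments, it keeps its state in a
 variable.  A lexer that scans a buffer would typically use point for
 the same purpose.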
 
    ‘wisent-lex’ is an example of a lexer that reads lexical tokens
 produced by a Semantic lexer and translates them into lexical tokens
 suitable for the Wisent parser.  See also Wisent Lex.
 
    To call the lexer from within a semantic action, use the function
 ‘wisent-lexer’.  See also Actions goodies.