eintr: Words and Symbols

 
 14.1 What to Count?
 ===================
 
 When we first start thinking about how to count the words in a function
 definition, the first question is (or ought to be) what are we going to
 count?  When we speak of “words” with respect to a Lisp function
 definition, we are actually speaking, in large part, of symbols.  For
 example, the following ‘multiply-by-seven’ function contains five such
 items: the symbols ‘defun’, ‘multiply-by-seven’, ‘number’, and ‘*’,
 and the number ‘7’ (strictly an integer rather than a symbol, but we
 count it with them).  In addition, the documentation string contains
 the four words ‘Multiply’, ‘NUMBER’, ‘by’, and ‘seven’.  The symbol
 ‘number’ appears twice, so the definition contains a total of ten
 words and symbols.
 
      (defun multiply-by-seven (number)
        "Multiply NUMBER by seven."
        (* 7 number))
 
 However, if we mark the ‘multiply-by-seven’ definition with ‘C-M-h’
 (‘mark-defun’), and then call ‘count-words-example’ on it, we will find
 that ‘count-words-example’ claims the definition has eleven words, not
 ten!  Something is wrong!
 
    The problem is twofold: ‘count-words-example’ does not count the ‘*’
 as a word, and it counts the single symbol, ‘multiply-by-seven’, as
 containing three words.  The hyphens are treated as if they were
 interword spaces rather than intraword connectors: ‘multiply-by-seven’
 is counted as if it were written ‘multiply by seven’.  Losing one item
 and gaining two turns our count of ten into eleven.
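
    The miscount can be reproduced without marking a region.  The
 following is a minimal sketch, not the ‘count-words-example’
 definition itself: the helper name ‘my-count-regexp-words’ is
 hypothetical, and it assumes a temporary buffer with the default
 syntax table rather than an Emacs Lisp mode buffer.  It applies the
 same kind of word-by-word regexp search to a string and returns the
 number of matches.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or of
;; `count-words-example'.
(defun my-count-regexp-words (string)
  "Return how many times the word regexp matches in STRING."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((count 0))
      ;; Each successful search moves point past one "word"
      ;; and whatever non-word characters follow it.
      (while (re-search-forward "\\w+\\W*" nil t)
        (setq count (1+ count)))
      count)))

(my-count-regexp-words
 "(defun multiply-by-seven (number)
  \"Multiply NUMBER by seven.\"
  (* 7 number))")
;; → 11

(my-count-regexp-words "multiply-by-seven")
;; → 3

(my-count-regexp-words "*")
;; → 0
```

 The last two calls isolate the two halves of the problem: the
 hyphenated symbol is counted three times, and the ‘*’ is not counted
 at all.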
 
    The cause of this confusion is the regular expression search within
 the ‘count-words-example’ definition that moves point forward word by
 word.  In the canonical version of ‘count-words-example’, the regexp is:
 
      "\\w+\\W*"
 
 This regular expression matches one or more word-constituent
 characters, optionally followed by any number of characters that are
 not word constituents.  What is meant by “word-constituent characters”
 brings us to the issue of syntax, which is worth a section of its own.
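
    To see what each search actually consumes, it can help to collect
 the matched text.  This is only an illustrative sketch: the helper
 name ‘my-word-regexp-matches’ is hypothetical, and it assumes the
 default syntax table of a temporary buffer, in which a hyphen is not
 a word constituent.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-word-regexp-matches (string)
  "Return the pieces of STRING matched by successive regexp searches."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (matches)
      (while (re-search-forward "\\w+\\W*" nil t)
        ;; `match-string' with argument 0 returns the whole match.
        (push (match-string 0) matches))
      (nreverse matches))))

(my-word-regexp-matches "multiply-by-seven (number)")
;; → ("multiply-" "by-" "seven (" "number)")
```

 Each hyphen is swallowed by the ‘\\W*’ part of the pattern, so the
 search stops three times inside a single symbol.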