14.1 What to Count?
===================
When we first start thinking about how to count the words in a function
definition, the first question is (or ought to be) what are we going to
count? When we speak of “words” with respect to a Lisp function
definition, we are actually speaking, in large part, of symbols. For
example, the following ‘multiply-by-seven’ function contains the five
symbols ‘defun’, ‘multiply-by-seven’, ‘number’, ‘*’, and ‘7’. In
addition, in the documentation string, it contains the four words
‘Multiply’, ‘NUMBER’, ‘by’, and ‘seven’. The symbol ‘number’ is
repeated, so the definition contains a total of ten words and symbols.
     (defun multiply-by-seven (number)
       "Multiply NUMBER by seven."
       (* 7 number))
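The tally of ten can be checked mechanically. The following is a minimal sketch that rests on a few assumptions: the definition is supplied as a quoted list, every symbol or number counts as one item, and any string it meets is split into words at non-alphanumeric characters, so every string is treated as if it were a documentation string. The name ‘example-tally-definition’ is hypothetical.

```elisp
;; Hypothetical helper: walk a definition given as Lisp data and
;; count symbols and numbers, plus the words inside any string.
(defun example-tally-definition (form)
  "Return the number of symbols, numbers, and string words in FORM."
  (cond ((stringp form)
         ;; Split the string at runs of non-alphanumeric
         ;; characters, dropping empty pieces.
         (length (split-string form "[^[:alnum:]]+" t)))
        ((consp form)
         ;; Count the first element and the rest of the list.
         (+ (example-tally-definition (car form))
            (example-tally-definition (cdr form))))
        ((null form) 0)          ; end of a list contributes nothing
        (t 1)))                  ; a symbol or a number

(example-tally-definition
 '(defun multiply-by-seven (number)
    "Multiply NUMBER by seven."
    (* 7 number)))
;; ⇒ 10
```

Reading the definition as data keeps ‘multiply-by-seven’ and ‘*’ intact as single symbols, which is exactly what a regexp search over the text does not do.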
However, if we mark the ‘multiply-by-seven’ definition with ‘C-M-h’
(‘mark-defun’), and then call ‘count-words-example’ on it, we will find
that ‘count-words-example’ claims the definition has eleven words, not
ten! Something is wrong!
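The miscount can be reproduced without marking a region. This sketch inserts the same definition into a temporary buffer, installs the Emacs Lisp syntax table, and counts successive matches of "\\w+\\W*" in a loop modeled on the search in ‘count-words-example’. The name ‘example-regexp-word-count’ is hypothetical; it is an illustration, not a replacement for ‘count-words-example’.

```elisp
;; Hypothetical helper: count matches of the word-by-word regexp
;; in TEXT, the way a region-based word counter would.
(defun example-regexp-word-count (text)
  "Count matches of the word-by-word regexp in TEXT."
  (with-temp-buffer
    ;; Use the syntax table of Emacs Lisp mode, so that `\w'
    ;; means what it would mean in a Lisp source buffer.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert text)
      (goto-char (point-min))
      (let ((count 0))
        ;; Each successful search moves point past one "word"
        ;; and any non-word characters that follow it.
        (while (and (< (point) (point-max))
                    (re-search-forward "\\w+\\W*" (point-max) t))
          (setq count (1+ count)))
        count))))

(example-regexp-word-count
 "(defun multiply-by-seven (number)
  \"Multiply NUMBER by seven.\"
  (* 7 number))")
;; ⇒ 11
```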
The problem is twofold: ‘count-words-example’ does not count the ‘*’
as a word, and it counts the single symbol, ‘multiply-by-seven’, as
containing three words. The hyphens are treated as if they were
interword spaces rather than intraword connectors: ‘multiply-by-seven’
is counted as if it were written ‘multiply by seven’.
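Collecting the matched text, rather than merely counting it, makes both problems visible: each hyphen ends a match, and the ‘*’ is never the start of one. The helper ‘example-regexp-word-matches’ is hypothetical and is only a sketch built on the same search.

```elisp
;; Hypothetical helper: return every match of the word-by-word
;; regexp in TEXT, in order, using the Emacs Lisp syntax table.
(defun example-regexp-word-matches (text)
  "Return the successive matches of the word-by-word regexp in TEXT."
  (with-temp-buffer
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert text)
      (goto-char (point-min))
      (let (matches)
        ;; `\w+' needs at least one character, so every match is
        ;; non-empty and the loop always makes progress.
        (while (re-search-forward "\\w+\\W*" nil t)
          (push (match-string-no-properties 0) matches))
        (nreverse matches)))))

(example-regexp-word-matches "multiply-by-seven")
;; ⇒ ("multiply-" "by-" "seven")

(example-regexp-word-matches "(* 7 number)")
;; ⇒ ("7 " "number)")
```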
The cause of this confusion is the regular expression search within
the ‘count-words-example’ definition that moves point forward word by
word. In the canonical version of ‘count-words-example’, the regexp is:
     "\\w+\\W*"
This regular expression is a pattern defining one or more word
constituent characters followed by zero or more characters that are
not word constituents. (In a Lisp string each ‘\\’ is read as a single
backslash, so the regexp itself is ‘\w+\W*’.) What is meant by “word
constituent characters” brings us to the issue of syntax, which is
worth a section of its own.
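The \W* half of the pattern may match no characters at all, for instance at the end of a string. This sketch, using a hypothetical helper ‘example-first-word-match’, returns the first match in a string. It assumes that ‘string-match’ consults the current buffer's syntax table, so it binds the Emacs Lisp one with ‘with-syntax-table’.

```elisp
;; Hypothetical helper: show the first match of the word-by-word
;; regexp in STRING.
(defun example-first-word-match (string)
  "Return the first match of the word-by-word regexp in STRING."
  ;; `\w' is resolved through the current buffer's syntax table,
  ;; so make that the Emacs Lisp table for the duration.
  (with-syntax-table emacs-lisp-mode-syntax-table
    (when (string-match "\\w+\\W*" string)
      (match-string 0 string))))

(example-first-word-match "by-seven")  ; ⇒ "by-"
(example-first-word-match "seven")     ; ⇒ "seven"  (`\W*' matched nothing)
(example-first-word-match "* 7")       ; ⇒ "7"      (the `*' is skipped)
```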