eintr: Design count-words-example

 
 Designing ‘count-words-example’
 -------------------------------
 
 First, we will implement the word count command with a ‘while’ loop,
 then with recursion.  The command will, of course, be interactive.
 
    The template for an interactive function definition is, as always:
 
      (defun NAME-OF-FUNCTION (ARGUMENT-LIST)
        "DOCUMENTATION..."
        (INTERACTIVE-EXPRESSION...)
        BODY...)
 
    What we need to do is fill in the slots.
 
    The name of the function should be self-explanatory and similar to
 the existing ‘count-lines-region’ name.  This makes the name easier to
 remember.  ‘count-words-region’ is the obvious choice.  Since that name
 is now used for the standard Emacs command to count words, we will name
 our implementation ‘count-words-example’.
 
    The function counts words within a region.  This means that the
 argument list must contain symbols that are bound to the two positions,
 the beginning and end of the region.  These two positions can be called
 ‘beginning’ and ‘end’ respectively.  The first line of the documentation
 should be a single sentence, since that is all that is printed as
 documentation by a command such as ‘apropos’.  The interactive
 expression will be of the form ‘(interactive "r")’, since that will
 cause Emacs to pass the beginning and end of the region to the
 function’s argument list.  All this is routine.
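
    With those slots filled in, the definition might start out as the
 following skeleton.  This is only a sketch: the body shown here is a
 placeholder we have assumed for illustration, and it merely reports the
 two positions that ‘(interactive "r")’ passes in.

```elisp
;; Skeleton only.  The real body is worked out in the rest of the section.
(defun count-words-example (beginning end)
  "Print number of words in the region."
  (interactive "r")
  ;; Placeholder body (an assumption, not the final code): show the
  ;; two region positions that were bound to the arguments.
  (message "Region runs from %d to %d" beginning end))
```

 Called from Lisp with explicit positions, as in ‘(count-words-example 1
 10)’, the skeleton echoes those two numbers.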
 
    The body of the function needs to be written to do three tasks:
 first, to set up conditions under which the ‘while’ loop can count
 words; second, to run the ‘while’ loop; and third, to send a message to
 the user.
 
    When a user calls ‘count-words-example’, point may be at the
 beginning or the end of the region.  However, the counting process must
 start at the beginning of the region.  This means we will want to put
 point there if it is not already there.  Executing ‘(goto-char
 beginning)’ ensures this.  Of course, we will want to return point to
 its expected position when the function finishes its work.  For this
 reason, the body must be enclosed in a ‘save-excursion’ expression.
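
    The effect of wrapping ‘goto-char’ in ‘save-excursion’ can be seen
 in a small experiment.  The helper name ‘sketch-point-round-trip’ is
 hypothetical, and the temporary buffer stands in for whatever buffer
 the user happens to be editing.

```elisp
(defun sketch-point-round-trip (text)
  "Insert TEXT in a temporary buffer and record where point goes.
Return a list (BEFORE DURING AFTER) of buffer positions."
  (with-temp-buffer
    (insert text)                 ; point is now at the end of TEXT
    (let ((before (point))
          during)
      (save-excursion
        (goto-char (point-min))   ; move to the beginning to start work
        (setq during (point)))
      ;; `save-excursion' has restored point on the way out.
      (list before during (point)))))

(sketch-point-round-trip "one two three")
;; ⇒ (14 1 14)
```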
 
    The central part of the body of the function consists of a ‘while’
 loop in which one expression jumps point forward word by word, and
 another expression counts those jumps.  The true-or-false-test of the
 ‘while’ loop should test true so long as point should jump forward, and
 false when point is at the end of the region.
 
    We could use ‘(forward-word 1)’ as the expression for moving point
 forward word by word, but it is easier to see what Emacs identifies as a
 “word” if we use a regular expression search.
 
    A regular expression search that finds the pattern for which it is
 searching leaves point after the last character matched.  This means
 that a succession of successful word searches will move point forward
 word by word.
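
    A quick way to check where a search leaves point is to run one in a
 temporary buffer.  In this sketch the helper name
 ‘sketch-point-after-search’ is hypothetical, and a literal pattern is
 used only to keep the example simple.

```elisp
(defun sketch-point-after-search (text regexp)
  "Search for REGEXP in TEXT from the start; return point afterwards."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (re-search-forward regexp)    ; signals an error if nothing matches
    (point)))

(sketch-point-after-search "one two three" "two")
;; ⇒ 8
```

 The match occupies positions 5 through 7, so point ends up at 8, just
 after the last character matched.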
 
    As a practical matter, we want the regular expression search to jump
 over whitespace and punctuation between words as well as over the words
 themselves.  A regexp that refuses to jump over interword whitespace
 would never jump more than one word!  This means that the regexp should
 include the whitespace and punctuation that follows a word, if any, as
 well as the word itself.  (A word may end a buffer and not have any
 following whitespace or punctuation, so that part of the regexp must be
 optional.)
 
    Thus, what we want for the regexp is a pattern defining one or more
 word constituent characters followed, optionally, by one or more
 characters that are not word constituents.  The regular expression for
 this is:
 
      \w+\W*
 
 The buffer’s syntax table determines which characters are and are not
 word constituents.  For more information about syntax, see “Syntax
 Tables” in the GNU Emacs Lisp Reference Manual, (elisp)Syntax Tables.
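
    To see how a syntax table classifies individual characters, we can
 ask ‘char-syntax’.  The sketch below assumes the standard syntax table
 rather than the table of any particular mode, and the helper name
 ‘sketch-word-constituent-p’ is hypothetical.

```elisp
(defun sketch-word-constituent-p (char)
  "Return non-nil if CHAR is a word constituent in the standard syntax table."
  (with-syntax-table (standard-syntax-table)
    ;; `char-syntax' returns the designator character of the syntax
    ;; class; ?w designates word constituents.
    (= (char-syntax char) ?w)))

;; A letter, a digit, a comma, and a space.
(mapcar #'sketch-word-constituent-p '(?a ?7 ?, ?\s))
;; ⇒ (t t nil nil)
```

 A major mode may install a different syntax table, in which case the
 same characters can be classified differently.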
 
    The search expression looks like this:
 
      (re-search-forward "\\w+\\W*")
 
 (Note that paired backslashes precede the ‘w’ and ‘W’.  A single
 backslash has special meaning to the Emacs Lisp interpreter.  It
 indicates that the following character is interpreted differently than
 usual.  For example, the two characters, ‘\n’, stand for ‘newline’,
 rather than for a backslash followed by ‘n’.  Two backslashes in a row
 stand for an ordinary, unspecial backslash, so the regular expression
 search ends up seeing a single backslash followed by a letter, and it
 therefore treats that letter as special.)
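
    Both points can be checked directly.  The following sketch first
 measures the strings as Lisp reads them, and then collects the pieces
 of a sample string that the regexp matches.  It uses ‘string-match’ on
 a string instead of ‘re-search-forward’ in a buffer, purely for
 convenience; the helper name ‘sketch-word-matches’ is hypothetical and
 the standard syntax table is assumed.

```elisp
;; The Lisp reader collapses each doubled backslash into one, so the
;; search function receives the six characters \w+\W* .
(length "\\w+\\W*")    ; ⇒ 6
(length "\n")          ; ⇒ 1, a single newline character

(defun sketch-word-matches (text)
  "Return the pieces of TEXT matched by the word regexp, in order."
  (with-syntax-table (standard-syntax-table)
    (let ((start 0)
          matches)
      ;; Each successful match resumes where the previous one ended.
      (while (string-match "\\w+\\W*" text start)
        (push (match-string 0 text) matches)
        (setq start (match-end 0)))
      (nreverse matches))))

(sketch-word-matches "one two, three")
;; ⇒ ("one " "two, " "three")
```

 Each match takes in a word together with the whitespace and punctuation
 that follow it; the last word has nothing after it, which is why that
 part of the regexp has to be optional.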
 
    We need a counter to count how many words there are; this variable
 must first be set to 0 and then incremented each time Emacs goes around
 the ‘while’ loop.  The incrementing expression is simply:
 
      (setq count (1+ count))
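
    The pattern of binding a counter to 0 and incrementing it once per
 trip around a ‘while’ loop can be tried on its own.  In this sketch
 the loop walks a list rather than a buffer, and the helper name
 ‘sketch-count-items’ is hypothetical.

```elisp
(defun sketch-count-items (items)
  "Count ITEMS by going around a `while' loop once per element."
  (let ((count 0))                   ; the counter starts at 0
    (while items
      (setq count (1+ count))        ; one more trip around the loop
      (setq items (cdr items)))      ; move on, so the loop can end
    count))

(sketch-count-items '("one" "two" "three"))
;; ⇒ 3
```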
 
    Finally, we want to tell the user how many words there are in the
 region.  The ‘message’ function is intended for presenting this kind of
 information to the user.  The message has to be phrased so that it reads
 properly regardless of how many words there are in the region: we don’t
 want to say that “there are 1 words in the region”.  The conflict
 between singular and plural is ungrammatical.  We can solve this problem
 by using a conditional expression that evaluates different messages
 depending on the number of words in the region.  There are three
 possibilities: no words in the region, one word in the region, and more
 than one word.  This means that the ‘cond’ special form is appropriate.
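
    The three cases can be sketched as a small helper that returns the
 sentence instead of displaying it, which makes the wording easy to
 inspect.  The name ‘sketch-word-count-message’ is hypothetical, and
 ‘format’ is used here in place of ‘message’ only for that reason.

```elisp
(defun sketch-word-count-message (count)
  "Return a grammatical sentence reporting COUNT words."
  (cond ((zerop count)
         "The region does NOT have any words.")
        ((= 1 count)
         "The region has 1 word.")
        (t
         (format "The region has %d words." count))))

(mapcar #'sketch-word-count-message '(0 1 2))
;; ⇒ ("The region does NOT have any words."
;;    "The region has 1 word."
;;    "The region has 2 words.")
```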
 
    All this leads to the following function definition:
 
      ;;; First version; has bugs!
      (defun count-words-example (beginning end)
        "Print number of words in the region.
      Words are defined as at least one word-constituent
      character followed by zero or more characters that
      are not word-constituents.  The buffer's syntax
      table determines which characters these are."
        (interactive "r")
        (message "Counting words in region ... ")
 
      ;;; 1. Set up appropriate conditions.
        (save-excursion
          (goto-char beginning)
          (let ((count 0))
 
      ;;; 2. Run the while loop.
            (while (< (point) end)
              (re-search-forward "\\w+\\W*")
              (setq count (1+ count)))
 
      ;;; 3. Send a message to the user.
            (cond ((zerop count)
                   (message
                    "The region does NOT have any words."))
                  ((= 1 count)
                   (message
                    "The region has 1 word."))
                  (t
                   (message
                    "The region has %d words." count))))))
 
 As written, the function works, but not in all circumstances.