eintr: Whitespace Bug

 
 13.1.1 The Whitespace Bug in ‘count-words-example’
 --------------------------------------------------
 
 The ‘count-words-example’ command described in the preceding section has
 two bugs, or rather, one bug with two manifestations.  First, if you
 mark a region containing only whitespace in the middle of some text, the
 ‘count-words-example’ command tells you that the region contains one
 word!  Second, if you mark a region containing only whitespace at the
 end of the buffer or the accessible portion of a narrowed buffer, the
 command displays an error message that looks like this:
 
      Search failed: "\\w+\\W*"
 
    If you are reading this in Info in GNU Emacs, you can test for these
 bugs yourself.
 
    First, evaluate the function in the usual manner to install it.  Here
 is a copy of the definition.  Place your cursor after the closing
 parenthesis and type ‘C-x C-e’ to install it.
 
      ;; First version; has bugs!
      (defun count-words-example (beginning end)
        "Print number of words in the region.
      Words are defined as at least one word-constituent character followed
      by zero or more characters that are not word-constituents.  The buffer's
      syntax table determines which characters these are."
        (interactive "r")
        (message "Counting words in region ... ")
 
      ;;; 1. Set up appropriate conditions.
        (save-excursion
          (goto-char beginning)
          (let ((count 0))
 
      ;;; 2. Run the while loop.
            (while (< (point) end)
              (re-search-forward "\\w+\\W*")
              (setq count (1+ count)))
 
      ;;; 3. Send a message to the user.
            (cond ((zerop count)
                   (message "The region does NOT have any words."))
                  ((= 1 count) (message "The region has 1 word."))
                  (t (message "The region has %d words." count))))))
 
    If you wish, you can also install this keybinding by evaluating it:
 
      (global-set-key "\C-c=" 'count-words-example)
 
    To conduct the first test, set mark and point to the beginning and
 end of the following line and then type ‘C-c =’ (or ‘M-x
 count-words-example’ if you have not bound ‘C-c =’):
 
          one   two  three
 
 Emacs will tell you, correctly, that the region has three words.
 
    Repeat the test, but place mark at the beginning of the line and
 place point just _before_ the word ‘one’.  Again type the command ‘C-c
 =’ (or ‘M-x count-words-example’).  Emacs should tell you that the
 region has no words, since it is composed only of the whitespace at the
 beginning of the line.  But instead Emacs tells you that the region has
 one word!
 
    For the third test, copy the sample line to the end of the
 ‘*scratch*’ buffer and then type several spaces at the end of the line.
 Place mark right after the word ‘three’ and point at the end of line.
 (The end of the line will be the end of the buffer.)  Type ‘C-c =’ (or
 ‘M-x count-words-example’) as you did before.  Again, Emacs should tell
 you that the region has no words, since it is composed only of the
 whitespace at the end of the line.  Instead, Emacs displays an error
 message saying ‘Search failed’.
 
    The two bugs stem from the same problem.
 
    Consider the first manifestation of the bug, in which the command
 tells you that the whitespace at the beginning of the line contains one
 word.  What happens is this: The ‘M-x count-words-example’ command moves
 point to the beginning of the region.  The ‘while’ tests whether the
 value of point is smaller than the value of ‘end’, which it is.
 Consequently, the regular expression search looks for and finds the
 first word.  It leaves point after the word.  ‘count’ is set to one.
 The ‘while’ loop repeats, but this time the value of point is larger
 than the value of ‘end’, so the loop exits and the function displays a
 message saying the number of words in the region is one.  In brief, the
 regular expression search looks for and finds the word even though it is
 outside the marked region.
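
    The behavior is easy to reproduce outside the command.  The
 following is only a sketch: the function name
 ‘demo-unbounded-search’, the buffer text, and the value given to ‘end’
 are assumptions made for illustration, not part of the original
 definition.  It runs the same unbounded search from the start of a
 region that holds nothing but whitespace:

```elisp
;; Illustration only: `demo-unbounded-search' is a hypothetical name,
;; and the buffer text and the value of `end' are made up.
(defun demo-unbounded-search ()
  "Search without a bound from the start of a whitespace-only region.
Return a list of the final value of point and whether it is past `end'."
  (with-temp-buffer
    (insert "   one two")
    (goto-char (point-min))
    (let ((end 4))                      ; region covers only the 3 spaces
      (re-search-forward "\\w+\\W*")    ; no bound: finds "one " anyway
      (list (point) (> (point) end)))))

(demo-unbounded-search)
```

    Evaluating ‘(demo-unbounded-search)’ returns ‘(8 t)’: point ends up
 after the first word, past the end of the region.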
 
    In the second manifestation of the bug, the region is whitespace at
 the end of the buffer.  Emacs says ‘Search failed’.  What happens is
 that the true-or-false-test in the ‘while’ loop tests true, so the
 search expression is executed.  But since there are no more words in the
 buffer, the search fails.
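
    The failing search can likewise be observed in isolation.  In this
 sketch, the name ‘demo-search-at-buffer-end’ and the buffer text are
 assumptions made for illustration.  The ‘condition-case’ form catches
 the error so that the result can be inspected instead of interrupting
 you:

```elisp
;; Illustration only: `demo-search-at-buffer-end' is a hypothetical name.
(defun demo-search-at-buffer-end ()
  "Search trailing whitespace; return the error symbol signaled."
  (with-temp-buffer
    (insert "three   ")
    (goto-char 6)                       ; just after the word "three"
    (condition-case err
        (re-search-forward "\\w+\\W*")  ; no word remains in the buffer
      ;; Catch the error instead of letting it reach the user.
      (search-failed (car err)))))

(demo-search-at-buffer-end)
```

    Evaluating ‘(demo-search-at-buffer-end)’ returns the symbol
 ‘search-failed’, which is the error that produces the ‘Search failed’
 message.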
 
    In both manifestations of the bug, the search extends or attempts to
 extend outside of the region.
 
    The solution is to limit the search to the region.  This is a fairly
 simple action, but as you may have come to expect, it is not quite as
 simple as you might think.
 
    As we have seen, the ‘re-search-forward’ function takes a search
 pattern as its first argument.  But in addition to this first, mandatory
 argument, it accepts three optional arguments.  The optional second
 argument bounds the search.  The optional third argument, if ‘t’, causes
 the function to return ‘nil’ rather than signal an error if the search
 fails.  The optional fourth argument is a repeat count.  (In Emacs, you
 can see a function’s documentation by typing ‘C-h f’, the name of the
 function, and then <RET>.)
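
    To see how the optional arguments behave, you can try them in a
 temporary buffer.  This is a sketch rather than part of the
 ‘count-words-example’ definition; the name ‘demo-search-arguments’ and
 the sample text are assumptions made for illustration:

```elisp
;; Illustration only: `demo-search-arguments' is a hypothetical name.
(defun demo-search-arguments ()
  "Return the results of three `re-search-forward' calls."
  (with-temp-buffer
    (insert "one two three")
    (list
     ;; Bound of 4: "two" ends beyond position 4, so it is not found;
     ;; the third argument t makes the failure return nil.
     (progn (goto-char (point-min))
            (re-search-forward "two" 4 t))
     ;; No bound (nil): the same search succeeds.
     (progn (goto-char (point-min))
            (re-search-forward "two" nil t))
     ;; Repeat count of 3: find the third run of word constituents.
     (progn (goto-char (point-min))
            (re-search-forward "\\w+" nil t 3)))))

(demo-search-arguments)
```

    Evaluating ‘(demo-search-arguments)’ returns ‘(nil 8 14)’.  The
 bounded search fails quietly, the unbounded search stops after ‘two’,
 and the search with a repeat count stops after ‘three’.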
 
    In the ‘count-words-example’ definition, the value of the end of the
 region is held by the variable ‘end’ which is passed as an argument to
 the function.  Thus, we can add ‘end’ as an argument to the regular
 expression search expression:
 
      (re-search-forward "\\w+\\W*" end)
 
    However, if you make only this change to the ‘count-words-example’
 definition and then test the new version of the definition on a stretch
 of whitespace, you will receive an error message saying ‘Search failed’.
 
    What happens is this: the search is limited to the region, and fails
 as you expect because there are no word-constituent characters in the
 region.  Since it fails, we receive an error message.  But we do not
 want to receive an error message in this case; we want to receive the
 message “The region does NOT have any words.”
 
    The solution to this problem is to provide ‘re-search-forward’ with a
 third argument of ‘t’, which causes the function to return ‘nil’ rather
 than signal an error if the search fails.
 
    However, if you make this change and try it, you will see the message
 “Counting words in region ...  ” and ... you will keep on seeing that
 message ..., until you type ‘C-g’ (‘keyboard-quit’).
 
    Here is what happens: the search is limited to the region, as before,
 and it fails because there are no word-constituent characters in the
 region, as expected.  Consequently, the ‘re-search-forward’ expression
 returns ‘nil’.  It does nothing else.  In particular, it does not move
 point, which it does as a side effect if it finds the search target.
 After the ‘re-search-forward’ expression returns ‘nil’, the next
 expression in the ‘while’ loop is evaluated.  This expression increments
 the count.  Then the loop repeats.  The true-or-false-test tests true
 because the value of point is still less than the value of end, since
 the ‘re-search-forward’ expression did not move point.  ... and the
 cycle repeats ...
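
    You can watch this happen safely with a loop that gives up after a
 few passes.  In the following sketch, the name ‘demo-stuck-loop’ and
 the ‘guard’ variable are assumptions made for illustration; ‘guard’ is
 there only to stop a loop that would otherwise never end:

```elisp
;; Illustration only: `demo-stuck-loop' and `guard' are hypothetical;
;; the guard exists solely to stop what would otherwise loop forever.
(defun demo-stuck-loop ()
  "Return a list of the count and point after five passes of the faulty loop."
  (with-temp-buffer
    (insert "     ")                    ; five spaces, no words
    (goto-char (point-min))
    (let ((end (point-max))
          (count 0)
          (guard 0))
      (while (and (< (point) end) (< guard 5))
        (re-search-forward "\\w+\\W*" end t) ; fails: nil, point stays put
        (setq count (1+ count))              ; yet the count still goes up
        (setq guard (1+ guard)))
      (list count (point)))))

(demo-stuck-loop)
```

    Evaluating ‘(demo-stuck-loop)’ returns ‘(5 1)’: the count has been
 incremented five times although no word was found, and point has not
 moved from the beginning of the buffer.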
 
    The ‘count-words-example’ definition requires yet another
 modification, to cause the true-or-false-test of the ‘while’ loop to
 test false if the search fails.  Put another way, there are two
 conditions that must be satisfied in the true-or-false-test before the
 word count variable is incremented: point must still be within the
 region and the search expression must have found a word to count.
 
    Since both the first condition and the second condition must be true
 together, the two expressions, the region test and the search
 expression, can be joined with an ‘and’ special form and embedded in the
 ‘while’ loop as the true-or-false-test, like this:
 
      (and (< (point) end) (re-search-forward "\\w+\\W*" end t))
 
 (See “The ‘kill-new’ function” for information about ‘and’.)
 
    The ‘re-search-forward’ expression returns a non-‘nil’ value (the new
 value of point) if the search succeeds, and as a side effect moves
 point.  Consequently, as words are found, point is moved through the
 region.  When the search expression fails to find another word, or when
 point reaches the end of the region, the true-or-false-test tests
 false, the ‘while’ loop exits, and the ‘count-words-example’ function
 displays one or another of its messages.
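
    The combined test can be tried on its own, without the messages.
 The helper below is a sketch, not part of the original text: the name
 ‘demo-count-words-in-string’ and its arguments are assumptions made
 for illustration.  It inserts a string into a temporary buffer and
 returns the count as a number, which makes the three tests described
 earlier easy to repeat:

```elisp
;; Illustration only: `demo-count-words-in-string' is a hypothetical
;; helper, not part of the original text.
(defun demo-count-words-in-string (string beginning end)
  "Insert STRING in a temp buffer; count words from BEGINNING to END."
  (with-temp-buffer
    (insert string)
    (goto-char beginning)
    (let ((count 0))
      ;; The combined test: stay in the region AND find a word.
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
      count)))

(demo-count-words-in-string "    one   two  three" 5 21)  ; the three words
(demo-count-words-in-string "    one   two  three" 1 5)   ; leading whitespace
(demo-count-words-in-string "one   two  three   " 17 20)  ; trailing whitespace
```

    The three calls return 3, 0, and 0 respectively.  Neither
 whitespace-only region produces a spurious word or an error.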
 
    After incorporating these final changes, the ‘count-words-example’
 function works without bugs (or at least, without bugs that I have
 found!).  Here is what it looks like:
 
      ;;; Final version: while
      (defun count-words-example (beginning end)
        "Print number of words in the region."
        (interactive "r")
        (message "Counting words in region ... ")
 
      ;;; 1. Set up appropriate conditions.
        (save-excursion
          (let ((count 0))
            (goto-char beginning)
 
      ;;; 2. Run the while loop.
            (while (and (< (point) end)
                        (re-search-forward "\\w+\\W*" end t))
              (setq count (1+ count)))
 
      ;;; 3. Send a message to the user.
            (cond ((zerop count)
                   (message
                    "The region does NOT have any words."))
                  ((= 1 count)
                   (message
                    "The region has 1 word."))
                  (t
                   (message
                    "The region has %d words." count))))))