elisp: Regexp Example

 
 33.3.2 Complex Regexp Example
 -----------------------------
 
 Here is a complicated regexp which was formerly used by Emacs to
 recognize the end of a sentence together with any whitespace that
 follows.  (Nowadays Emacs uses a similar but more complex default regexp
 constructed by the function ‘sentence-end’.  See Standard Regexps.)
 
    Below, we show first the regexp as a string in Lisp syntax (to
 distinguish spaces from tab characters), and then the result of
 evaluating it.  The string constant begins and ends with a double-quote.
 ‘\"’ stands for a double-quote as part of the string, ‘\\’ for a
 backslash as part of the string, ‘\t’ for a tab and ‘\n’ for a newline.
 
      "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
           ⇒ "[.?!][]\"')}]*\\($\\| $\\|  \\|  \\)[
      ]*"
 
 In the output, tab and newline appear as themselves.
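
    As a rough check of what the Lisp reader does with the escapes, the
 sketch below stores the same string constant and counts a few of its
 characters.  The names ‘my-old-sentence-end’ and ‘my-count-char’ are
 made up for this illustration and are not part of Emacs.

```elisp
;; Hypothetical name; the string constant is copied from the text above.
(defconst my-old-sentence-end
  "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
  "The old sentence-end regexp, kept here only for illustration.")

;; Hypothetical helper: count occurrences of CHAR in STRING.
(defun my-count-char (char string)
  "Return how many times CHAR occurs in STRING."
  (let ((n 0))
    (dolist (c (append string nil) n)
      (when (eq c char)
        (setq n (1+ n))))))

;; After reading, each `\\' is one backslash and `\t' and `\n' are
;; single characters, so the string is shorter than its printed form.
(length my-old-sentence-end)             ; ⇒ 35
(my-count-char ?\\ my-old-sentence-end)  ; ⇒ 5
(my-count-char ?\t my-old-sentence-end)  ; ⇒ 2
(my-count-char ?\n my-old-sentence-end)  ; ⇒ 1
```

 The five backslashes are the ones that introduce ‘\(’, ‘\)’ and the
 three ‘\|’ constructs; the regexp matcher never sees the doubled form.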
 
    This regular expression contains four parts in succession and can be
 deciphered as follows:
 
 ‘[.?!]’
      The first part of the pattern is a character alternative that
      matches any one of three characters: period, question mark, and
      exclamation mark.  The match must begin with one of these three
      characters.  (This is one point where the new default regexp used
      by Emacs differs from the old.  The new value also allows some
      non-ASCII characters that end a sentence without any following
      whitespace.)
 
 ‘[]\"')}]*’
      The second part of the pattern matches any closing brackets,
      parentheses, braces and quotation marks, zero or more of them,
      that may follow the period, question mark or exclamation mark.
      The ‘]’ comes first in the character alternative so that it is
      taken literally instead of ending the alternative.  The ‘\"’ is
      Lisp syntax for a double-quote in a string.  The ‘*’ at the end
      indicates that the immediately preceding regular expression (a
      character alternative, in this case) may be repeated zero or more
      times.
 
 ‘\\($\\| $\\|\t\\|  \\)’
      The third part of the pattern matches the whitespace that follows
      the end of a sentence: the end of a line (optionally with a space),
      or a tab, or two spaces.  The double backslashes mark the
      parentheses and vertical bars as regular expression syntax; the
      parentheses delimit a group and the vertical bars separate
      alternatives.  The dollar sign is used to match the end of a line.
 
 ‘[ \t\n]*’
      Finally, the last part of the pattern matches any additional
      whitespace beyond the minimum needed to end a sentence.
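
    The four parts described above can be tried out on a few strings
 with ‘string-match’.  This is only a sketch: ‘my-old-sentence-end’ and
 ‘my-sentence-end-span’ are hypothetical names chosen for the example,
 and when matching against a string, ‘$’ matches at the end of the
 string as well as before a newline.

```elisp
;; Hypothetical name; the regexp itself is copied from the text above.
(defconst my-old-sentence-end
  "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
  "The old sentence-end regexp, kept here only for illustration.")

;; Hypothetical helper built on `string-match'.
(defun my-sentence-end-span (text)
  "Return (START . END) of the first sentence end in TEXT, or nil."
  (save-match-data
    (when (string-match my-old-sentence-end text)
      (cons (match-beginning 0) (match-end 0)))))

;; `!', then a closing quote, then two spaces.
(my-sentence-end-span "He said \"Stop!\"  Then he left.")  ; ⇒ (13 . 17)

;; A period at the end of the string: `$' matches there.
(my-sentence-end-span "Done.")                             ; ⇒ (4 . 5)

;; End of line, followed by extra whitespace taken by the last part.
(my-sentence-end-span "Hi!\n\n  Next")                     ; ⇒ (2 . 7)

;; A period followed by a single space in mid-line does not qualify.
(my-sentence-end-span "e.g. this")                         ; ⇒ nil
```

 In a buffer, the same regexp could be passed to ‘re-search-forward’
 instead; the string version is used here so the positions are easy to
 check by hand.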