eintr: sentence-end

 
 12.1 The Regular Expression for ‘sentence-end’
 ==============================================
 
 The symbol ‘sentence-end’ is bound to the pattern that marks the end of
 a sentence.  What should this regular expression be?
 
    Clearly, a sentence may be ended by a period, a question mark, or an
 exclamation mark.  Indeed, in English, only a clause that ends with one
 of those three characters should be considered to end a sentence.  This
 means that the pattern should include the character set:
 
      [.?!]
 
    However, we do not want ‘forward-sentence’ merely to jump to a
 period, a question mark, or an exclamation mark, because such a
 character might be used in the middle of a sentence.  A period, for
 example, is used after abbreviations.  So other information is needed.
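
    As a minimal sketch of the problem, the following searches a string
 with the bare character set.  The name ‘my-sentence-chars’ and the
 sample text are assumptions made up for this illustration; only
 ‘string-match’ is a standard function.

```elisp
;; Hypothetical name: the bare set of sentence-ending characters.
(defconst my-sentence-chars "[.?!]"
  "A period, a question mark, or an exclamation mark.")

;; `string-match' returns the index of the first match, or nil.
;; Here the first match is the period inside the abbreviation
;; "e.g.", well before the sentence actually ends.
(string-match my-sentence-chars "See e.g. the manual.  Then stop.")
     ;; ⇒ 5
```

 The match lands in the middle of the sentence, which is why the
 character set alone cannot serve as the whole pattern.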
 
    According to convention, you type two spaces after every sentence,
 but only one space after a period, a question mark, or an exclamation
 mark in the body of a sentence.  So a period, a question mark, or an
 exclamation mark followed by two spaces is a good indicator of an end of
 sentence.  However, in a file, the two spaces may instead be a tab or
 the end of a line.  This means that the regular expression should
 include these three items as alternatives.
 
    This group of alternatives will look like this:
 
      \\($\\| \\|  \\)
             ^   ^^
            TAB  SPC
 
 Here, ‘$’ indicates the end of the line, and I have pointed out where
 the tab and two spaces are inserted in the expression.  Both are
 inserted by putting the actual characters into the expression.
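
    The group can be tried out on sample strings.  This is only a
 sketch: the names ‘my-sentence-gap’ and ‘my-ends-sentence’ are
 hypothetical, and the tab is written as the string escape ‘\t’ rather
 than as a literal tab so that it stays visible.  The character set
 ‘[.?!]’ is placed in front because ‘$’ on its own would also match at
 the very end of any string.

```elisp
;; Hypothetical names, used only for this illustration.
(defconst my-sentence-gap "\\($\\|\t\\|  \\)"
  "End of line, a tab, or two spaces.")

(defun my-ends-sentence (string)
  "Return the index in STRING where a sentence ends, or nil."
  (string-match (concat "[.?!]" my-sentence-gap) string))

(my-ends-sentence "end.  Next")      ; two spaces    ⇒ 3
(my-ends-sentence "end.\tNext")      ; a tab         ⇒ 3
(my-ends-sentence "end.\nNext")      ; end of line   ⇒ 3
(my-ends-sentence "etc. and so on")  ; one space     ⇒ nil
```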
 
    Two backslashes, ‘\\’, are required before the parentheses and
 vertical bars: the first backslash quotes the following backslash in
 the Lisp string; and the second indicates that the following character,
 the parenthesis or the vertical bar, is special in the regular
 expression.
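
    A short sketch may make the doubling easier to see.  The sample
 strings are invented for the illustration; the functions are standard
 ones.

```elisp
;; The Lisp reader turns the two typed backslashes into one, so the
;; string that reaches the regexp engine is two characters long.
(length "\\(")                        ; ⇒ 2
(string= "\\|" (string ?\\ ?|))       ; ⇒ t

;; With the backslashes, the parentheses group and the bar separates
;; alternatives.
(string-match "\\(a\\|b\\)" "xxb")    ; ⇒ 2

;; Without them, a parenthesis is an ordinary character.
(string-match "(" "f(x)")             ; ⇒ 1
```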
 
    Also, a sentence may be followed by one or more carriage returns
 (that is, newline characters, which you type with <RET>), like this:
 
      [
      ]*
 
 Like tabs and spaces, a carriage return is inserted into a regular
 expression by inserting it literally.  The asterisk indicates that the
 <RET> is repeated zero or more times, so the pattern also matches when
 no newline follows.
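
    As a sketch, the literal form can be compared with the ‘\n’ string
 escape, which denotes the same character; the sample text is made up
 for the illustration.

```elisp
;; A literal newline inside the string and the escape "\n" produce
;; the same one-character set.
(string= "[\n]*" "[
]*")                                         ; ⇒ t

;; After the period, the set absorbs both newlines.
(string-match "[.?!][\n]*" "Done.\n\nNext")  ; ⇒ 4
(match-end 0)                                ; ⇒ 7
```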
 
    But a sentence end does not consist only of a period, a question mark
 or an exclamation mark followed by appropriate space: a closing
 quotation mark or a closing brace of some kind may precede the space.
 Indeed more than one such mark or brace may precede the space.  These
 require an expression that looks like this:
 
      []\"')}]*
 
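    In this expression, the first ‘]’ is the first character in the
 set; because it comes first, it is an ordinary member of the set
 rather than the bracket that closes it.  The second character is ‘"’,
 which is preceded by a ‘\’ to tell Emacs the ‘"’ is _not_ special (it
 does not end the string).  The last three characters are ‘'’, ‘)’, and
 ‘}’.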
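
    To see the set at work, here is a sketch that puts it after the
 sentence-ending characters and searches an invented sample string.

```elisp
;; The sample contains `!' followed by a closing quotation mark and a
;; closing parenthesis; both are absorbed by the starred set.
(string-match "[.?!][]\"')}]*" "He said \"Stop!\")  Then")  ; ⇒ 13
(match-end 0)                                               ; ⇒ 16

;; A `]' in the text is matched too, since it is a member of the set.
(string-match "[.?!][]\"')}]*" "[Really?]  Yes")            ; ⇒ 7
(match-end 0)                                               ; ⇒ 9
```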
 
    All this suggests what the regular expression pattern for matching
 the end of a sentence should be; and, indeed, if we evaluate
 ‘sentence-end’ we find that it returns the following value:
 
      sentence-end
           ⇒ "[.?!][]\"')}]*\\($\\|     \\|  \\)[
      ]*"
 
 (Well, not in GNU Emacs 22; that is because of an effort to make the
 process simpler and to handle more glyphs and languages.  When the value
 of the variable ‘sentence-end’ is ‘nil’, Emacs uses the value returned
 by the function ‘sentence-end’ instead.  (This is a practical use of the
 fact that, in Emacs Lisp, a symbol can name both a variable and a
 function.)  The function returns a value constructed from the variables
 ‘sentence-end-base’, ‘sentence-end-double-space’,
 ‘sentence-end-without-period’, and ‘sentence-end-without-space’.  The
 critical variable is ‘sentence-end-base’; its global value is similar to
 the one described above, but it also contains two additional quotation
 marks, which have differing degrees of curliness.  The
 ‘sentence-end-without-period’ variable, when true, tells Emacs that a
 sentence may end without a period, as it does in Thai text.)
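
    The older pattern described in this section can be put together
 piece by piece and tried on sample text.  This is a sketch, not the
 definition Emacs itself uses: the name ‘my-old-sentence-end’ is
 hypothetical, the tab is written as ‘\t’, and the sample strings are
 invented.  The last form calls the function ‘sentence-end’, whose
 result depends on your Emacs version and settings, so no value is
 shown for it.

```elisp
;; Hypothetical name: the historical pattern, assembled from the
;; pieces discussed above.
(defconst my-old-sentence-end
  (concat "[.?!]"               ; sentence-ending punctuation
          "[]\"')}]*"           ; optional closing quotes and braces
          "\\($\\|\t\\|  \\)"   ; end of line, a tab, or two spaces
          "[\n]*")              ; any newlines that follow
  "A sentence-end pattern like the one described in this section.")

(string-match my-old-sentence-end "Is it so?\")  Yes.")  ; ⇒ 8
(match-end 0)                                            ; ⇒ 13

;; A period followed by a single space is not taken as a sentence end.
(string-match my-old-sentence-end "e.g. this")           ; ⇒ nil

;; The function returns the regular expression currently in effect.
(sentence-end)
```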