Info: (emacs) Regexp Backslash

Info Catalog
emacs: Regexps
emacs: Search
emacs: Regexp Example
emacs: Regexp Backslash

 
 15.7 Backslash in Regular Expressions
 =====================================
 
 For the most part, ‘\’ followed by any character matches only that
 character.  However, there are several exceptions: two-character
 sequences starting with ‘\’ that have special meanings.  The second
 character in the sequence is always an ordinary character when used on
 its own.  Here is a table of ‘\’ constructs.
 
 ‘\|’
      specifies an alternative.  Two regular expressions A and B with
      ‘\|’ in between form an expression that matches some text if either
      A matches it or B matches it.  It works by trying to match A, and
      if that fails, by trying to match B.
 
      Thus, ‘foo\|bar’ matches either ‘foo’ or ‘bar’ but no other string.
 
      ‘\|’ applies to the largest possible surrounding expressions.  Only
      a surrounding ‘\( ... \)’ grouping can limit the grouping power of
      ‘\|’.
 
      Full backtracking capability exists to handle multiple uses of
      ‘\|’.
 
 ‘\( ... \)’
      is a grouping construct that serves three purposes:
 
        1. To enclose a set of ‘\|’ alternatives for other operations.
           Thus, ‘\(foo\|bar\)x’ matches either ‘foox’ or ‘barx’.
 
        2. To enclose a complicated expression for the postfix operators
           ‘*’, ‘+’ and ‘?’ to operate on.  Thus, ‘ba\(na\)*’ matches
           ‘bananana’, etc., with any (zero or more) number of ‘na’
           strings.
 
        3. To record a matched substring for future reference.
 
      This last application is not a consequence of the idea of a
      parenthetical grouping; it is a separate feature that is assigned
      as a second meaning to the same ‘\( ... \)’ construct.  In practice
      there is usually no conflict between the two meanings; when there
      is a conflict, you can use a shy group.
 
 ‘\(?: ... \)’
      specifies a shy group that does not record the matched substring;
      you can’t refer back to it with ‘\D’ (see below).  This is useful
      in mechanically combining regular expressions, so that you can add
      groups for syntactic purposes without interfering with the
      numbering of the groups that are meant to be referred to.
 
 ‘\D’
      matches the same text that matched the Dth occurrence of a ‘\( ...
      \)’ construct.  This is called a “back reference”.
 
      After the end of a ‘\( ... \)’ construct, the matcher remembers the
      beginning and end of the text matched by that construct.  Then,
      later on in the regular expression, you can use ‘\’ followed by the
      digit D to mean “match the same text matched the Dth time by the
      ‘\( ... \)’ construct”.
 
      The strings matching the first nine ‘\( ... \)’ constructs
      appearing in a regular expression are assigned numbers 1 through 9
      in the order that the open-parentheses appear in the regular
      expression.  So you can use ‘\1’ through ‘\9’ to refer to the text
      matched by the corresponding ‘\( ... \)’ constructs.
 
      For example, ‘\(.*\)\1’ matches any newline-free string that is
      composed of two identical halves.  The ‘\(.*\)’ matches the first
      half, which may be anything, but the ‘\1’ that follows must match
      the same exact text.
 
      If a particular ‘\( ... \)’ construct matches more than once (which
      can easily happen if it is followed by ‘*’), only the last match is
      recorded.
 
 ‘\`’
      matches the empty string, but only at the beginning of the string
      or buffer (or its accessible portion) being matched against.
 
 ‘\'’
      matches the empty string, but only at the end of the string or
      buffer (or its accessible portion) being matched against.
 
 ‘\=’
      matches the empty string, but only at point.
 
 ‘\b’
      matches the empty string, but only at the beginning or end of a
      word.  Thus, ‘\bfoo\b’ matches any occurrence of ‘foo’ as a
      separate word.  ‘\bballs?\b’ matches ‘ball’ or ‘balls’ as a
      separate word.
 
      ‘\b’ matches at the beginning or end of the buffer regardless of
      what text appears next to it.
 
 ‘\B’
      matches the empty string, but _not_ at the beginning or end of a
      word.
 
 ‘\<’
      matches the empty string, but only at the beginning of a word.
      ‘\<’ matches at the beginning of the buffer only if a
      word-constituent character follows.
 
 ‘\>’
      matches the empty string, but only at the end of a word.  ‘\>’
      matches at the end of the buffer only if the contents end with a
      word-constituent character.
 
 ‘\w’
      matches any word-constituent character.  The syntax table
      determines which characters these are.  Syntax Tables
      (elisp)Syntax Tables.
 
 ‘\W’
      matches any character that is not a word-constituent.
 
 ‘\_<’
      matches the empty string, but only at the beginning of a symbol.  A
      symbol is a sequence of one or more symbol-constituent characters.
      A symbol-constituent character is a character whose syntax is
      either ‘w’ or ‘_’.  ‘\_<’ matches at the beginning of the buffer
      only if a symbol-constituent character follows.
 
 ‘\_>’
      matches the empty string, but only at the end of a symbol.  ‘\_>’
      matches at the end of the buffer only if the contents end with a
      symbol-constituent character.
 
 ‘\sC’
      matches any character whose syntax is C.  Here C is a character
      that designates a particular syntax class: thus, ‘w’ for word
      constituent, ‘-’ or ‘ ’ for whitespace, ‘.’ for ordinary
      punctuation, etc.  Syntax Tables (elisp)Syntax Tables.
 
 ‘\SC’
      matches any character whose syntax is not C.
 
 ‘\cC’
      matches any character that belongs to the category C.  For example,
      ‘\cc’ matches Chinese characters, ‘\cg’ matches Greek characters,
      etc.  For the description of the known categories, type ‘M-x
      describe-categories <RET>’.
 
 ‘\CC’
      matches any character that does _not_ belong to category C.
 
    The constructs that pertain to words and syntax are controlled by the
 setting of the syntax table.  Syntax Tables (elisp)Syntax Tables.
Info Catalog
emacs: Regexps
emacs: Search
emacs: Regexp Example