33.4 Regular Expression Searching
=================================
In GNU Emacs, you can search for the next match for a regular expression
(see Syntax of Regexps) either incrementally or not. For
incremental search commands, see Regular Expression Search in The GNU
Emacs Manual. Here we describe only the search functions useful
in programs. The principal one is ‘re-search-forward’.
These search functions convert the regular expression to multibyte if
the buffer is multibyte; they convert the regular expression to unibyte
if the buffer is unibyte. See Text Representations.
-- Command: re-search-forward regexp &optional limit noerror count
This function searches forward in the current buffer for a string
of text that is matched by the regular expression REGEXP. The
function skips over any amount of text that is not matched by
REGEXP, and leaves point at the end of the first match found. It
returns the new value of point.
If LIMIT is non-‘nil’, it must be a position in the current buffer.
It specifies the upper bound to the search. No match extending
after that position is accepted. If LIMIT is omitted or ‘nil’, it
defaults to the end of the accessible portion of the buffer.
What ‘re-search-forward’ does when the search fails depends on the
value of NOERROR:
‘nil’
Signal a ‘search-failed’ error.
‘t’
Do nothing and return ‘nil’.
anything else
Move point to LIMIT (or the end of the accessible portion of
the buffer) and return ‘nil’.
The argument NOERROR only affects valid searches which fail to find
a match. Invalid arguments cause errors regardless of NOERROR.
If COUNT is a positive number N, the search is done N times; each
successive search starts at the end of the previous match. If all
these successive searches succeed, the function call succeeds,
moving point and returning its new value. Otherwise the function
call fails, with results depending on the value of NOERROR, as
described above. If COUNT is a negative number -N, the search is
done N times in the opposite (backward) direction.
In the following example, point is initially before the ‘T’.
Evaluating the search call moves point to the end of that line
(between the ‘t’ of ‘hat’ and the newline).
---------- Buffer: foo ----------
I read "★The cat in the hat
comes back" twice.
---------- Buffer: foo ----------
(re-search-forward "[a-z]+" nil t 5)
⇒ 27
---------- Buffer: foo ----------
I read "The cat in the hat★
comes back" twice.
---------- Buffer: foo ----------
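The handling of LIMIT and NOERROR described above can be checked with
a small sketch like the one below. The helper name ‘my-search-demo’
and the buffer text "abc def" are invented for illustration; the
helper returns a cons of the search result and the position of point
afterwards, and catches the ‘search-failed’ error so that the failing
case can be inspected too.

```elisp
;; Hypothetical helper, not part of Emacs: run one search in a
;; scratch buffer containing "abc def", starting at the beginning,
;; and return (RESULT . POINT-AFTERWARDS).
(defun my-search-demo (regexp limit noerror)
  (with-temp-buffer
    (insert "abc def")
    (goto-char (point-min))
    (let ((result (condition-case nil
                      (re-search-forward regexp limit noerror)
                    ;; Only reached when NOERROR is nil and nothing matches.
                    (search-failed 'search-failed))))
      (cons result (point)))))

(my-search-demo "def" nil nil)   ; ⇒ (8 . 8), match found
(my-search-demo "xyz" nil nil)   ; ⇒ (search-failed . 1), error signaled
(my-search-demo "xyz" nil t)     ; ⇒ (nil . 1), point unchanged
(my-search-demo "xyz" nil 'move) ; ⇒ (nil . 8), point at end of buffer
(my-search-demo "def" 6 'move)   ; ⇒ (nil . 6), match would extend past LIMIT
```

In the last call the text ‘def’ occupies positions 5 to 8, so it
extends after the LIMIT of 6 and is rejected, and point moves to LIMIT.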
-- Command: re-search-backward regexp &optional limit noerror count
This function searches backward in the current buffer for a string
of text that is matched by the regular expression REGEXP, leaving
point at the beginning of the first text found.
This function is analogous to ‘re-search-forward’, but they are not
simple mirror images. ‘re-search-forward’ finds the match whose
beginning is as close as possible to the starting point. If
‘re-search-backward’ were a perfect mirror image, it would find the
match whose end is as close as possible. However, in fact it finds
the match whose beginning is as close as possible (and yet ends
before the starting point). The reason for this is that matching a
regular expression at a given spot always works from beginning to
end, and starts at a specified beginning position.
A true mirror-image of ‘re-search-forward’ would require a special
feature for matching regular expressions from end to beginning.
It’s not worth the trouble of implementing that.
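The asymmetry described above shows up with a repetition operator.
The following sketch (the helper name ‘my-a-run-demo’ and the buffer
text are assumptions made for illustration) searches for ‘a+’ in a
buffer containing "aaa", once forward from the start and once backward
from the end, and returns the matched text.

```elisp
;; Hypothetical helper, not part of Emacs: search for "a+" in a
;; buffer holding "aaa" and return the text of the match.
(defun my-a-run-demo (backward)
  (with-temp-buffer
    (insert "aaa")
    (if backward
        (progn (goto-char (point-max))
               (re-search-backward "a+"))
      (goto-char (point-min))
      (re-search-forward "a+"))
    (match-string 0)))

(my-a-run-demo nil) ; ⇒ "aaa", the match begins at the starting point
(my-a-run-demo t)   ; ⇒ "a", the match begins as close to point as it can
```

The backward search stops at the first position, scanning back from
point, where a match can begin, so it never extends over the earlier
‘a’ characters.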
-- Function: string-match regexp string &optional start
This function returns the index of the start of the first match for
the regular expression REGEXP in STRING, or ‘nil’ if there is no
match. If START is non-‘nil’, the search starts at that index in
STRING.
For example,
(string-match
"quick" "The quick brown fox jumped quickly.")
⇒ 4
(string-match
"quick" "The quick brown fox jumped quickly." 8)
⇒ 27
The index of the first character of the string is 0, the index of
the second character is 1, and so on.
If this function finds a match, the index of the first character
beyond the match is available as ‘(match-end 0)’. See Match
Data.
(string-match
"quick" "The quick brown fox jumped quickly." 8)
⇒ 27
(match-end 0)
⇒ 32
-- Function: string-match-p regexp string &optional start
This predicate function does what ‘string-match’ does, but it
avoids modifying the match data.
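As a brief illustration of the difference, the sketch below first
calls ‘string-match’ and then ‘string-match-p’ on the example string
used above, and checks that the match data still describes the first
call. The function name ‘my-string-match-p-demo’ is made up for this
example.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-string-match-p-demo ()
  (let ((s "The quick brown fox jumped quickly."))
    (string-match "quick" s)            ; sets the match data
    (list (string-match-p "fox" s)      ; index of "fox"
          (match-beginning 0)           ; still refers to "quick"
          (match-end 0))))

(my-string-match-p-demo)
;; ⇒ (16 4 9)
```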
-- Function: looking-at regexp
This function determines whether the text in the current buffer
directly following point matches the regular expression REGEXP.
“Directly following” means precisely that: the search is “anchored”
and it can succeed only starting with the first character following
point. The result is ‘t’ if so, ‘nil’ otherwise.
This function does not move point, but it does update the match
data. See Match Data. If you need to test for a match without
modifying the match data, use ‘looking-at-p’, described below.
In this example, point is located directly before the ‘T’. If it
were anywhere else, the result would be ‘nil’.
---------- Buffer: foo ----------
I read "★The cat in the hat
comes back" twice.
---------- Buffer: foo ----------
(looking-at "The cat in the hat$")
⇒ t
-- Function: looking-back regexp limit &optional greedy
This function returns ‘t’ if REGEXP matches the text immediately
before point (i.e., ending at point), and ‘nil’ otherwise.
Because regular expression matching works only going forward, this
is implemented by searching backwards from point for a match that
ends at point. That can be quite slow if it has to search a long
distance. You can bound the time required by specifying a
non-‘nil’ value for LIMIT, which says not to search before LIMIT.
In this case, the match that is found must begin at or after LIMIT.
Here’s an example:
---------- Buffer: foo ----------
I read "★The cat in the hat
comes back" twice.
---------- Buffer: foo ----------
(looking-back "read \"" 3)
⇒ t
(looking-back "read \"" 4)
⇒ nil
If GREEDY is non-‘nil’, this function extends the match backwards
as far as possible, stopping when a single additional previous
character cannot be part of a match for REGEXP. When the match is
extended, its starting position is allowed to occur before LIMIT.
As a general recommendation, try to avoid using ‘looking-back’
wherever possible, since it is slow. For this reason, there are no
plans to add a ‘looking-back-p’ function.
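The effect of GREEDY can be seen in a sketch like the following,
which assumes that ‘looking-back’ records the match it found in the
match data (as ‘looking-at’ does). The helper name
‘my-looking-back-demo’ and the buffer text "xaaa" are invented for
illustration.

```elisp
;; Hypothetical helper, not part of Emacs: with point at the end of
;; "xaaa", test whether "a+" matches just before point and return
;; the position where the recorded match begins.
(defun my-looking-back-demo (greedy)
  (with-temp-buffer
    (insert "xaaa")
    (goto-char (point-max))
    (and (looking-back "a+" (point-min) greedy)
         (match-beginning 0))))

(my-looking-back-demo nil) ; ⇒ 4, only the last "a" is matched
(my-looking-back-demo t)   ; ⇒ 2, the match is extended over all three
```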
-- Function: looking-at-p regexp
This predicate function works like ‘looking-at’, but without
updating the match data.
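A minimal sketch of the difference, using an invented helper name
‘my-looking-at-p-demo’ and a scratch buffer containing "foo bar":

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-looking-at-p-demo ()
  (with-temp-buffer
    (insert "foo bar")
    (goto-char (point-min))
    (looking-at "foo")                  ; sets match data to 1..4
    (list (looking-at-p "f")            ; matches, match data untouched
          (match-end 0)                 ; still from the "foo" match
          (point))))                    ; neither call moves point

(my-looking-at-p-demo)
;; ⇒ (t 4 1)
```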
-- Variable: search-spaces-regexp
If this variable is non-‘nil’, it should be a regular expression
that says how to search for whitespace. In that case, any group of
spaces in a regular expression being searched for stands for use of
this regular expression. However, spaces inside of constructs such
as ‘[...]’ and ‘*’, ‘+’, ‘?’ are not affected by
‘search-spaces-regexp’.
Since this variable affects all regular expression search and match
constructs, you should bind it temporarily for as small as possible
a part of the code.
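One way to follow this advice is to ‘let’-bind the variable around
the single call that needs it, as in the sketch below. The helper
name ‘my-spaces-demo’ and the sample strings are assumptions made for
illustration.

```elisp
;; Hypothetical helper, not part of Emacs: match "foo bar" against a
;; string whose words are separated by mixed whitespace, with
;; `search-spaces-regexp' bound only for the duration of the call.
(defun my-spaces-demo (spaces-regexp)
  (let ((search-spaces-regexp spaces-regexp))
    (string-match "foo bar" "foo \t\n bar")))

(my-spaces-demo nil)        ; ⇒ nil, the space matches one literal space
(my-spaces-demo "[ \t\n]+") ; ⇒ 0, the space stands for "[ \t\n]+"
```

Outside the ‘let’ form the variable has its previous value again, so
other searches are unaffected.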