5 Regular expressions
*********************
Regular expressions are patterns used in selecting text. For example,
the 'ed' command
g/STRING/
prints all lines containing STRING. Regular expressions are also used
by the 's' command for selecting old text to be replaced with new text.
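The two uses just described can be tried from a shell. This is a minimal sketch, assuming an 'ed' binary and 'mktemp' are available; 'ed_run' is a hypothetical helper defined here for illustration, not part of 'ed', and the sample text is made up.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

text='first STRING here
no match on this line
another STRING'

# Print every line containing STRING.
matched=$(ed_run "$text" 'g/STRING/p')
printf '%s\n' "$matched"
# first STRING here
# another STRING

# On each such line, replace STRING with TEXT and print the result.
replaced=$(ed_run "$text" 'g/STRING/s/STRING/TEXT/p')
printf '%s\n' "$replaced"
# first TEXT here
# another TEXT
```

The explicit 'p' after 'g/STRING/' is written out here only to make the printed result obvious.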
In addition to specifying string literals, regular expressions can
represent classes of strings. Strings thus represented are said to be
matched by the corresponding regular expression. If it is possible for a
regular expression to match several strings in a line, then the
left-most longest match is the one selected.
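The left-most longest rule can be observed with a substitution that brackets whatever was matched. This is a sketch under the same assumptions as any shell-driven 'ed' session ('ed' and 'mktemp' on the PATH); 'ed_run' is a hypothetical helper, and '&' in the replacement stands for the matched text.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

# 'ab*' could match 'a', 'ab', 'abb' or 'abbb' at the first 'a', and a
# longer run exists later in the line. The match starts at the left-most
# candidate position and is the longest one available there.
longest=$(ed_run 'xabbbc abbbbb' 's/ab*/[&]/p')
printf '%s\n' "$longest"
# x[abbb]c abbbbb
```

The longer 'abbbbb' further right is not chosen, because position takes priority over length.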
The following symbols are used in constructing regular expressions:
'C'
Any character C not listed below, including '{', '}', '(', ')',
'<' and '>', matches itself.
'\C'
Any backslash-escaped character C, other than '{', '}', '(', ')',
'<', '>', 'b', 'B', 'w', 'W', '+' and '?', matches itself.
'.'
Matches any single character.
'[CHAR-CLASS]'
Matches any single character in CHAR-CLASS. To include a ']' in
CHAR-CLASS, it must be the first character. A range of characters
may be specified by separating the end characters of the range
with a '-', e.g., 'a-z' specifies the lower case characters. The
following literal expressions can also be used in CHAR-CLASS to
specify sets of characters:
[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]
If '-' appears as the first or last character of CHAR-CLASS, then
it matches itself. All other characters in CHAR-CLASS match
themselves.
Patterns in CHAR-CLASS of the form:
[.COL-ELM.]
[=COL-ELM=]
where COL-ELM is a "collating element" are interpreted according
to 'locale (5)'. See 'regex (3)' for an explanation of these
constructs.
'[^CHAR-CLASS]'
Matches any single character, other than newline, not in
CHAR-CLASS. CHAR-CLASS is defined as above.
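The character class rules above can be checked one at a time. This is an illustrative sketch, assuming 'ed' and 'mktemp' are installed; 'ed_run' is a hypothetical helper and the sample strings are invented.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

# A range: every digit becomes N.
range=$(ed_run 'room 42b' 's/[0-9]/N/gp')        # room NNb

# A named class inside the brackets: every upper case letter becomes _.
named=$(ed_run 'Hello World' 's/[[:upper:]]/_/gp')   # _ello _orld

# A negated class: every character that is not a lower case letter.
negated=$(ed_run 'a1b2' 's/[^a-z]/-/gp')         # a-b-

# ']' placed first in the class matches itself.
bracket=$(ed_run 'a]b' 's/[]]/)/p')              # a)b

# '-' placed last in the class matches itself.
dash=$(ed_run '3-4+5' 's/[+-]/ /gp')             # 3 4 5

printf '%s\n' "$range" "$named" "$negated" "$bracket" "$dash"
```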
'^'
If '^' is the first character of a regular expression, then it
anchors the regular expression to the beginning of a line.
Otherwise, it matches itself.
'$'
If '$' is the last character of a regular expression, it anchors
the regular expression to the end of a line. Otherwise, it matches
itself.
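The positional behavior of '^' and '$' can be seen by using each both as an anchor and as a literal. This is a sketch, assuming 'ed' and 'mktemp' are available; 'ed_run' is a hypothetical helper, and the patterns are single-quoted so the shell leaves '$' alone.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

# '^' first in the expression: only the 'cat' at the start of the line.
at_start=$(ed_run 'cat concat' 's/^cat/dog/p')   # dog concat

# '$' last in the expression: only the 'cat' at the end of the line.
at_end=$(ed_run 'cat concat' 's/cat$/dog/p')     # cat condog

# '^' elsewhere in the expression matches itself.
caret=$(ed_run 'x^2' 's/x^2/y/p')                # y

# '$' elsewhere in the expression matches itself.
dollar=$(ed_run '$5 fee' 's/$5/five/p')          # five fee

printf '%s\n' "$at_start" "$at_end" "$caret" "$dollar"
```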
'\(RE\)'
Defines a (possibly null) subexpression RE. Subexpressions may be
nested. A subsequent backreference of the form '\N', where N is a
number in the range [1,9], expands to the text matched by the Nth
subexpression. For example, the regular expression '\(a.c\)\1'
matches the string 'abcabc', but not 'abcadc'. Subexpressions are
ordered relative to their left delimiter.
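The backreference example above, and the ordering of nested subexpressions, can be reproduced as follows. This is a sketch, assuming 'ed' and 'mktemp' are available; 'ed_run' is a hypothetical helper and the second test string is invented for illustration.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

# '\1' must repeat the exact text matched by '\(a.c\)', so 'abcabc'
# qualifies and 'abcadc' does not.
repeat=$(ed_run 'abcabc
abcadc' 'g/\(a.c\)\1/p')
printf '%s\n' "$repeat"
# abcabc

# Nested groups are numbered by their left delimiter: '\1' is the outer
# group ('ab') and '\2' the inner one ('a'), so the expression below
# describes 'ab' followed by 'ab' followed by 'a'.
nested=$(ed_run 'ababa
abaab' 'g/\(\(a\)b\)\1\2/p')
printf '%s\n' "$nested"
# ababa
```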
'*'
Matches the single character regular expression or subexpression
immediately preceding it zero or more times. If '*' is the first
character of a regular expression or subexpression, then it matches
itself. The '*' operator sometimes yields unexpected results. For
example, the regular expression 'b*' matches the beginning of the
string 'abbb', as opposed to the substring 'bbb', since a null
match is the only left-most match.
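The null match described above is easy to reproduce, along with one common way around it. This is a sketch, assuming 'ed' and 'mktemp' are available; 'ed_run' is a hypothetical helper.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

# 'b*' matches zero b's at the very start of 'abbb', so the replacement
# is inserted before the 'a' and the b's are left untouched.
null_match=$(ed_run 'abbb' 's/b*/X/p')
printf '%s\n' "$null_match"
# Xabbb

# Requiring one 'b' before the starred 'b' rules out the null match.
one_or_more=$(ed_run 'abbb' 's/bb*/X/p')
printf '%s\n' "$one_or_more"
# aX

# A '*' that begins the expression matches itself.
literal_star=$(ed_run '2*3' 's/*/x/p')
printf '%s\n' "$literal_star"
# 2x3
```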
'\{N,M\}'
'\{N,\}'
'\{N\}'
Matches the single character regular expression or subexpression
immediately preceding it at least N and at most M times. If M is
omitted, then it matches at least N times. If the comma is also
omitted, then it matches exactly N times. If any of these forms
occurs first in a regular expression or subexpression, then it is
interpreted literally (i.e., the regular expression '\{2\}'
matches the string '{2}', and so on).
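The three interval forms can be compared on a single run of five a's. This is a sketch, assuming 'ed' and 'mktemp' are available; 'ed_run' is a hypothetical helper, and '&' in the replacement stands for the matched text.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

# At least 2 and at most 3: the longest permitted match is three a's.
bounded=$(ed_run 'xaaaaay' 's/a\{2,3\}/[&]/p')   # x[aaa]aay

# At least 4, no upper bound: all five a's are taken.
open=$(ed_run 'xaaaaay' 's/a\{4,\}/[&]/p')       # x[aaaaa]y

# Exactly 2.
exact=$(ed_run 'xaaaaay' 's/a\{2\}/[&]/p')       # x[aa]aaay

printf '%s\n' "$bounded" "$open" "$exact"
```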
'\<'
'\>'
Anchors the single character regular expression or subexpression
immediately following it to the beginning (in the case of '\<') or
ending (in the case of '\>') of a "word", i.e., in ASCII, a
maximal string of alphanumeric characters, including the
underscore (_).
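The word anchors can be tried on a line where 'cat' appears as a whole word, as a suffix and as a prefix. This is a sketch, assuming an 'ed' whose regex library supports '\<' and '\>' (GNU ed does) and that 'mktemp' is available; 'ed_run' is a hypothetical helper.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

line='cat concat cats'

# Anchored on both sides: only the free-standing word.
whole=$(ed_run "$line" 's/\<cat\>/dog/gp')    # dog concat cats

# Anchored at the beginning of a word only.
prefix=$(ed_run "$line" 's/\<cat/dog/gp')     # dog concat dogs

# Anchored at the end of a word only.
suffix=$(ed_run "$line" 's/cat\>/dog/gp')     # dog condog cats

# The underscore counts as a word character, so 'my_cat' is one word.
under=$(ed_run 'my_cat cat' 's/\<cat\>/dog/p')   # my_cat dog

printf '%s\n' "$whole" "$prefix" "$suffix" "$under"
```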
The following extended operators are preceded by a backslash '\' to
distinguish them from traditional 'ed' syntax.
'\`'
'\''
'\`' unconditionally matches the beginning of a line, and '\''
unconditionally matches the end of a line.
'\?'
Optionally matches the single character regular expression or
subexpression immediately preceding it. For example, the regular
expression 'a[bd]\?c' matches the strings 'abc', 'adc' and 'ac'.
If '\?' occurs at the beginning of a regular expression or
subexpression, then it matches a literal '?'.
'\+'
Matches the single character regular expression or subexpression
immediately preceding it one or more times. So the regular
expression 'a\+' is shorthand for 'aa*'. If '\+' occurs at the
beginning of a regular expression or subexpression, then it
matches a literal '+'.
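The '\?' and '\+' operators can be exercised as follows. This is a sketch, assuming an 'ed' that supports these extended operators (GNU ed does) and that 'mktemp' is available; 'ed_run' is a hypothetical helper, and the '^' and '$' anchors are added only so that whole lines are tested.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

# '[bd]\?' allows zero or one of 'b' or 'd' between 'a' and 'c',
# so 'abdc' is the only line rejected.
optional=$(ed_run 'abc
adc
ac
abdc' 'g/^a[bd]\?c$/p')
printf '%s\n' "$optional"
# abc
# adc
# ac

# 'a\+' and 'aa*' select the same text.
plus=$(ed_run 'caaandy' 's/a\+/[&]/p')     # c[aaa]ndy
star=$(ed_run 'caaandy' 's/aa*/[&]/p')     # c[aaa]ndy
printf '%s\n' "$plus" "$star"
```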
'\b'
Matches the null string at the beginning or end of a word. Thus the
regular expression '\bhello\b' is equivalent to '\<hello\>'.
However, '\b\b' is a valid regular expression whereas '\<\>' is
not.
'\B'
Matches (a null string) inside a word.
'\w'
Matches any character in a word.
'\W'
Matches any character not in a word.
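The last four operators can be compared side by side. This is a sketch, assuming an 'ed' that supports '\b', '\B', '\w' and '\W' (GNU ed does) and that 'mktemp' is available; 'ed_run' is a hypothetical helper and the sample strings are invented.

```shell
# ed_run is a hypothetical helper, not part of ed: it writes its first
# argument to a scratch file, then feeds the remaining arguments to
# 'ed -s' as commands, followed by Q so the scratch file is never saved.
ed_run() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    shift
    printf '%s\n' "$@" Q | ed -s "$tmp"
    rm -f "$tmp"
}

line='hello, othello'

# '\b' on both sides: only the free-standing 'hello'.
boundary=$(ed_run "$line" 's/\bhello\b/HI/p')   # HI, othello

# '\B' in front: only a 'hello' that starts inside a word.
inside=$(ed_run "$line" 's/\Bhello/HI/p')       # hello, otHI

# '\w' matches letters, digits and the underscore.
word=$(ed_run 'a-b_c' 's/\w/x/gp')              # x-xxx

# '\W' matches everything else.
nonword=$(ed_run 'a-b_c' 's/\W/x/gp')           # axb_c

printf '%s\n' "$boundary" "$inside" "$word" "$nonword"
```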