gawk: Leftmost Longest
3.5 How Much Text Matches?
==========================
Consider the following:
     echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
This example uses the 'sub()' function to make a change to the input
record. ('sub()' replaces the first instance of any text matched by the
first argument with the string provided as the second argument; see
String Functions.) Here, the regexp '/a+/' indicates "one or more 'a'
characters," and the replacement text is '<A>'.
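To see that 'sub()' changes only the first match, run it on input that
contains two separate runs of 'a'. This is a minimal sketch assuming a
POSIX-conforming 'awk' such as 'gawk'; the input 'aaXaa' and the
variable 'n' are illustrative choices. 'n' holds the return value of
'sub()', which is the number of substitutions made (zero or one).

```shell
# sub() replaces only the first match of /a+/; the second run of a's
# is left alone. sub() returns the number of substitutions (0 or 1).
echo aaXaa | awk '{ n = sub(/a+/, "<A>"); print n, $0 }'
# prints: 1 <A>Xaa
```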
The input contains four 'a' characters. 'awk' (and POSIX) regular
expressions always match the leftmost, _longest_ sequence of input
characters that can match. Thus, all four 'a' characters are replaced
with '<A>' in this example:
     $ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
     -| <A>bcd
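The rule has two parts, applied in order: first the match that starts
furthest to the left is chosen, and only then the longest match starting
at that position. The following sketch, assuming a POSIX-conforming
'awk' such as 'gawk', shows each part; the inputs 'baaa' and 'xabcx' are
arbitrary illustrations.

```shell
# Leftmost comes first: /a*/ can match the empty string, and the empty
# match at the very start of "baaa" lies further left than the run of
# a's, so the replacement text is inserted before the "b".
echo baaa | awk '{ sub(/a*/, "<A>"); print }'
# prints: <A>baaa

# Among matches starting at the same position, the longest wins, even
# though a shorter alternative is listed first in the regexp.
echo xabcx | awk '{ sub(/a|ab|abc/, "<X>"); print }'
# prints: x<X>x
```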
For simple match/no-match tests, this is not so important, because only
the existence of a match matters, not its extent. But when doing text
matching and substitutions with the 'match()', 'sub()', 'gsub()', and
'gensub()' functions, it is very important, because the extent of the
match determines which text is reported or replaced. See String
Functions, for more information on these functions. Understanding this
principle is also important for regexp-based record and field splitting
(see Records, and also see Field Separators).
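The same principle shows up in 'match()', 'gsub()', and regexp field
splitting. This is a minimal sketch assuming a POSIX-conforming 'awk'
such as 'gawk'; the inputs and the variable 'n' are illustrative only.

```shell
# match() sets RSTART to where the leftmost-longest match begins and
# RLENGTH to its length: all four a's, starting at position 1.
echo aaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }'
# prints: 1 4

# gsub() applies the rule to each successive non-overlapping match, so
# every run of a's is replaced as a whole; it returns the match count.
echo aaXaaa | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }'
# prints: 2 <A>X<A>

# A multi-character -F value is a regexp; ":+" consumes the whole run
# of colons, so there are no empty fields between "a" and "b".
echo 'a:::b' | awk -F':+' '{ print NF, $2 }'
# prints: 2 b
```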