gawk: Leftmost Longest

 
 3.5 How Much Text Matches?
 ==========================
 
 Consider the following:
 
      echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
 
    This example uses the 'sub()' function to make a change to the input
 record.  ('sub()' replaces the first instance of any text matched by the
 first argument with the string provided as the second argument; see
 String Functions.)  Here, the regexp '/a+/' indicates "one or more 'a'
 characters," and the replacement text is '<A>'.
 
    The input contains four 'a' characters.  'awk' (and POSIX) regular
 expressions always match the leftmost, _longest_ sequence of input
 characters that can match.  Thus, all four 'a' characters are replaced
 with '<A>' in this example:
 
      $ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
      -| <A>bcd
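
    The two halves of the rule can be seen separately.  The following is
 a minimal sketch, not part of the original example set; the input
 strings and the shell variable names 'leftmost' and 'longest' are made
 up for illustration.  The first command shows that the leftmost match
 wins even when a longer run of 'a' characters appears later in the
 record.  The second shows that, at a given starting position, the
 longest alternative is chosen, regardless of the order in which the
 alternatives are written:

```shell
# Hypothetical input: a run of three a's, then a longer run of five a's.
# The leftmost run is the one replaced, even though it is the shorter one.
leftmost=$(echo xaaayaaaaa | awk '{ sub(/a+/, "<A>"); print }')
echo "$leftmost"    # prints x<A>yaaaaa

# Both alternatives can match starting at the first character.
# The longest one (abcd) is used, although ab is listed first.
longest=$(echo abcd | awk '{ sub(/ab|abcd/, "<X>"); print }')
echo "$longest"     # prints <X>
```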
 
    For simple match/no-match tests, this is not so important.  But when
 doing text matching and substitutions with the 'match()', 'sub()',
 'gsub()', and 'gensub()' functions, it is very important, because the
 rule determines exactly which text is matched or replaced.  See String
 Functions, for more information on these functions.  Understanding
 this principle is also important for regexp-based record and field
 splitting (see Records, and also see Field Separators).
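
    As a rough illustration of these other contexts, the commands below
 apply the same rule to 'match()', 'gsub()', and a regexp field
 separator.  This is only a sketch: the input 'a,,,b', the separator
 ',+', and the shell variable names 'pos', 'count', and 'nf' are
 assumptions chosen for the example.

```shell
# match() sets RSTART and RLENGTH to the position and length of the
# leftmost, longest match: all four a's, starting at character 1.
pos=$(echo aaaabcd | awk '{ match($0, /a+/); print RSTART, RLENGTH }')
echo "$pos"      # prints 1 4

# gsub() returns the number of substitutions made.  The whole run of
# a's is consumed as a single match, so only one replacement occurs.
count=$(echo aaaabcd | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }')
echo "$count"    # prints 1 <A>bcd

# With a regexp field separator, each separator is the longest run of
# commas, so three adjacent commas split the record into two fields.
nf=$(echo 'a,,,b' | awk -F ',+' '{ print NF }')
echo "$nf"       # prints 2
```

    Had the separator regexp matched only one comma at a time, the same
 record would have produced empty fields between the commas.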