Info: (gawk) Simple Sed

Info Catalog
gawk: Extract Program
gawk: Miscellaneous Programs
gawk: Igawk Program
gawk: Simple Sed

 
 11.3.8 A Simple Stream Editor
 -----------------------------
 
 The 'sed' utility is a "stream editor", a program that reads a stream of
 data, makes changes to it, and passes it on.  It is often used to make
 global changes to a large file or to a stream of data generated by a
 pipeline of commands.  Although 'sed' is a complicated program in its
 own right, its most common use is to perform global substitutions in the
 middle of a pipeline:
 
      COMMAND1 < orig.data | sed 's/old/new/g' | COMMAND2 > result
 
    Here, 's/old/new/g' tells 'sed' to look for the regexp 'old' on each
 input line and replace every occurrence on that line with the text
 'new'.  This is similar to 'awk''s 'gsub()' function (see String
 Functions).
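 
    For comparison, the same per-line global substitution can be
 sketched with 'gsub()' directly.  The function name 'replace_old' and
 the sample input are illustrative assumptions, not part of the original
 program:

```shell
# Hypothetical helper: replace every match of the regexp "old" on each
# input line with the text "new", then print the (possibly changed) line.
replace_old() {
    awk '{ gsub(/old/, "new"); print }'
}

# Sample use in a pipeline; the input text is made up for illustration.
printf 'old hat, old shoes\nnothing here\n' | replace_old
```

    Unlike 'awksed.awk', this sketch hard-codes the pattern and the
 replacement, and it works line by line, as 'sed' does.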
 
    The following program, 'awksed.awk', accepts at least two
 command-line arguments: the pattern to look for and the text to replace
 it with.  Any additional arguments are treated as data file names to
 process.  If none are provided, the standard input is used:
 
      # awksed.awk --- do s/foo/bar/g using just print
      #    Thanks to Michael Brennan for the idea
 
      function usage()
      {
          print "usage: awksed pat repl [files...]" > "/dev/stderr"
          exit 1
      }
 
      BEGIN {
          # validate arguments
          if (ARGC < 3)
              usage()
 
          RS = ARGV[1]
          ORS = ARGV[2]
 
          # don't use arguments as files
          ARGV[1] = ARGV[2] = ""
      }
 
      # look ma, no hands!
      {
          if (RT == "")
              printf "%s", $0
          else
              print
      }
 
    The program relies on 'gawk''s ability to have 'RS' be a regexp, as
 well as on the setting of 'RT' to the actual text that terminates the
 record (Records).
 
    The idea is to have 'RS' be the pattern to look for.  'gawk'
 automatically sets '$0' to the text between matches of the pattern.
 This is text that we want to keep, unmodified.  Then, by setting 'ORS'
 to the replacement text, a simple 'print' statement outputs the text we
 want to keep, followed by the replacement text.
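 
    The 'RS'/'ORS' idea can be tried in isolation with a short sketch.
 The function name 'swap' and the sample input are assumptions made for
 illustration, and the input is chosen so that it ends with a match of
 the pattern.  The values are passed with '-v' for brevity; unlike the
 'ARGV' approach used by 'awksed.awk', '-v' processes escape sequences
 in the value:

```shell
# Hypothetical helper: replace each match of the regexp $1 in the
# standard input with the literal text $2.
swap() {
    gawk -v pat="$1" -v repl="$2" '
        BEGIN { RS = pat; ORS = repl }  # split on the pattern, rejoin with the replacement
        { print }                       # prints $0 followed by ORS
    '
}

# Each ", " ends a record, so every record is followed by " | " on output.
printf 'one, two, three, ' | swap ', ' ' | '
```

    Because every record here is terminated by a match, an
 unconditional 'print' gives the desired result.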
 
    There is one wrinkle to this scheme: what to do if the last record
 doesn't end with text that matches 'RS'.  Using a 'print' statement
 unconditionally would append the replacement text after that record,
 which is not correct.  However, if the input does not end in text that
 matches 'RS', 'RT' is set to the null string for the last record.  In
 this case, we can print '$0' without a trailing 'ORS' by using 'printf'
 (see Printf).
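 
    The behavior of 'RT' can be observed with a small sketch that
 prints each record next to the text that terminated it.  The function
 name 'show_rt' and the sample input are illustrative assumptions:

```shell
# Hypothetical helper: show every record together with its terminator.
show_rt() {
    gawk -v pat="$1" '
        BEGIN { RS = pat }
        { printf "record=[%s] RT=[%s]\n", $0, RT }
    '
}

# The input does not end with "X", so the last record has an empty RT.
printf 'aXbXc' | show_rt 'X'
```

    The final record, 'c', is reported with an empty 'RT', which is the
 condition the program tests before choosing 'printf' over 'print'.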
 
    The 'BEGIN' rule handles the setup, checking for the right number of
 arguments and calling 'usage()' if there is a problem.  Then it sets
 'RS' and 'ORS' from the command-line arguments and sets 'ARGV[1]' and
 'ARGV[2]' to the null string, so that they are not treated as file names
 (ARGC and ARGV).
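 
    The effect of clearing an 'ARGV' element can be sketched as
 follows.  The function name 'count_args' and the argument
 'not-a-file' are hypothetical; the point is only that an element set
 to the null string is skipped when 'gawk' looks for input files:

```shell
# Hypothetical helper: report the first argument, blank it out, and then
# echo whatever input is read afterwards.
count_args() {
    gawk '
        BEGIN {
            printf "ARGC=%d first=%s\n", ARGC, ARGV[1]
            ARGV[1] = ""        # no longer treated as a file name
        }
        { print "read: " $0 }
    ' "$@"
}

# With the only argument cleared, input comes from the standard input.
printf 'from stdin\n' | count_args not-a-file
```

    No attempt is made to open 'not-a-file'; with no file names left,
 the standard input is read instead.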
 
    The 'usage()' function prints an error message and exits.  Finally,
 the single rule handles the printing scheme outlined earlier, using
 'print' or 'printf' as appropriate, depending upon the value of 'RT'.