gawk: I/O Functions

 
 9.1.4 Input/Output Functions
 ----------------------------
 
 The following functions relate to input/output (I/O). Optional
 parameters are enclosed in square brackets ([ ]):
 
 'close('FILENAME [',' HOW]')'
      Close the file FILENAME for input or output.  Alternatively, the
      argument may be a shell command that was used for creating a
      coprocess, or for redirecting to or from a pipe; then the coprocess
      or pipe is closed.  See Close Files And Pipes for more
      information.
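
      As a minimal sketch of why closing matters, the following shell
      wrapper (the function name 'sort_then_done' is hypothetical) sends
      two lines to a 'sort' pipe and closes the pipe before printing a
      final line.  It assumes a POSIX shell and an 'awk' on the search
      path:

```shell
# Hypothetical wrapper, for illustration only.
sort_then_done() {
    awk 'BEGIN {
        print "pear"  | "sort"
        print "apple" | "sort"
        close("sort")      # sort sees end of input, writes its output, exits
        print "done"       # emitted only after sort has finished
    }'
}
```

      Calling 'sort_then_done' should print 'apple', 'pear', and then
      'done'.  Without the call to 'close()', 'sort' would not see the
      end of its input until 'awk' exits, so 'done' may appear before
      the sorted lines.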
 
      When closing a coprocess, it is occasionally useful to first close
      one end of the two-way pipe and then to close the other.  This is
      done by providing a second argument to 'close()'.  This second
      argument (HOW) should be one of the two string values '"to"' or
      '"from"', indicating which end of the pipe to close.  Case in the
      string does not matter.  See Two-way I/O, which discusses this
      feature in more detail and gives an example.
 
      Note that the second argument to 'close()' is a 'gawk' extension;
      it is not available in compatibility mode (see Options).
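
      The two-argument form might be used as in the following sketch.
      It assumes that 'gawk' is installed under that name, and the
      function name 'coprocess_sort_demo' is hypothetical.  The write
      end of the coprocess is closed first so that 'sort' sees the end
      of its input; only then is its output read back:

```shell
# Hypothetical wrapper; requires gawk, since the second argument to
# close() is a gawk extension.
coprocess_sort_demo() {
    gawk 'BEGIN {
        cmd = "sort"
        print "pear"  |& cmd
        print "apple" |& cmd
        close(cmd, "to")                 # close the write end: sort sees EOF
        while ((cmd |& getline line) > 0)
            print "sorted: " line        # read back the sorted lines
        close(cmd)                       # now close the read end too
    }'
}
```

      If the write end were left open, 'sort' would keep waiting for
      more input and the 'getline' loop would never receive anything.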
 
 'fflush('[FILENAME]')'
      Flush any buffered output associated with FILENAME, which is either
      a file opened for writing or a shell command for redirecting output
      to a pipe or coprocess.
 
      Many utility programs "buffer" their output (i.e., they save
      information to write to a disk file or the screen in memory until
      there is enough for it to be worthwhile to send the data to the
      output device).  This is often more efficient than writing every
      little bit of information as soon as it is ready.  However,
      sometimes it is necessary to force a program to "flush" its buffers
      (i.e., write the information to its destination, even if a buffer
      is not full).  This is the purpose of the 'fflush()'
      function--'gawk' also buffers its output, and the 'fflush()'
      function forces 'gawk' to flush its buffers.
 
      Brian Kernighan added 'fflush()' to his 'awk' in April 1992.  For
      two decades, it was a common extension.  In December 2012, it was
      accepted for inclusion into the POSIX standard.  See the Austin
      Group website (http://austingroupbugs.net/view.php?id=634).
 
      POSIX standardizes 'fflush()' as follows: if there is no argument,
      or if the argument is the null string ('""'), then 'awk' flushes
      the buffers for _all_ open output files and pipes.
 
           NOTE: Prior to version 4.0.2, 'gawk' would flush only the
           standard output if there was no argument, and flush all output
           files and pipes if the argument was the null string.  This was
           changed in order to be compatible with Brian Kernighan's
           'awk', in the hope that standardizing this feature in POSIX
           would then be easier (which indeed proved to be the case).
 
           With 'gawk', you can use 'fflush("/dev/stdout")' if you wish
           to flush only the standard output.
 
      'fflush()' returns zero if the buffer is successfully flushed;
      otherwise, it returns a nonzero value.  ('gawk' returns -1.)  In
      the case where all buffers are flushed, the return value is zero
      only if all buffers were flushed successfully.  Otherwise, it is
      -1, and 'gawk' warns about the problem FILENAME.
 
      'gawk' also issues a warning message if you attempt to flush a file
      or pipe that was opened for reading (such as with 'getline'), or if
      FILENAME is not an open file, pipe, or coprocess.  In such a case,
      'fflush()' returns -1, as well.
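
      The return value can be checked like that of any other function.
      The following is a minimal sketch, not a definitive pattern; the
      function name 'write_flushed' and the variables 'out' and 'failed'
      are hypothetical, and it assumes that the 'awk' in use accepts
      '"/dev/stderr"' as an output file:

```shell
# Hypothetical wrapper: copy standard input to the file named by $1,
# flushing after every line and checking the return value of fflush().
write_flushed() {
    awk -v out="$1" '
    {
        print > out
        if (fflush(out) != 0) {          # nonzero return means failure
            print "flush failed for " out > "/dev/stderr"
            failed = 1
            exit                         # skip remaining input, run END
        }
    }
    END { exit failed + 0 }'
}
```

      For example, 'some-command | write_flushed /tmp/log' (both names
      are placeholders) keeps the file current while the producing
      command is still running.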
 
               Interactive Versus Noninteractive Buffering
 
    As a side point, buffering issues can be even more confusing if your
 program is "interactive" (i.e., communicating with a user sitting at a
 keyboard).(1)
 
    Interactive programs generally "line buffer" their output (i.e., they
 write out each line as soon as it is complete).  Noninteractive programs
 wait until they have a full buffer, which may be many lines of output.
 Here is an example of the difference:
 
      $ awk '{ print $1 + $2 }'
      1 1
      -| 2
      2 3
      -| 5
      Ctrl-d
 
 Each line of output is printed immediately.  Compare that behavior with
 this example:
 
      $ awk '{ print $1 + $2 }' | cat
      1 1
      2 3
      Ctrl-d
      -| 2
      -| 5
 
 Here, no output is printed until after the 'Ctrl-d' is typed, because it
 is all buffered and sent down the pipe to 'cat' in one shot.
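
    Adding a call to 'fflush()' after each 'print' is one way to make
 the piped version behave like the interactive one.  A minimal sketch,
 in which the function name 'sum_flushed' is hypothetical:

```shell
# Hypothetical wrapper: print each sum and flush at once, so a reader on
# the other end of a pipe sees every result without waiting for Ctrl-d.
sum_flushed() {
    awk '{ print $1 + $2; fflush() }'
}
```

    Typing the same input into 'sum_flushed | cat' should show each sum
 as soon as its line is entered, at the cost of one write per line.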
 
 'system(COMMAND)'
      Execute the operating system command COMMAND and then return to the
      'awk' program.  Return COMMAND's exit status (see further on).
 
      For example, if the following fragment of code is put in your 'awk'
      program:
 
           END {
                system("date | mail -s 'awk run done' root")
           }
 
      the system administrator is sent mail when the 'awk' program
      finishes processing input and begins its end-of-input processing.
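
      The value returned by 'system()' can be saved and tested like any
      other expression.  The following sketch (the function name
      'run_and_report' and the variables 'cmd' and 'rc' are
      hypothetical) runs the command passed as its first argument:

```shell
# Hypothetical wrapper: run the command given in $1 through system()
# and report the value that system() returned.
run_and_report() {
    awk -v cmd="$1" 'BEGIN {
        rc = system(cmd)
        print "status " rc
    }'
}
```

      With 'gawk' in its default mode, 'run_and_report "exit 3"' should
      print 'status 3'.  Other modes and other 'awk' implementations may
      report a different value for the same command.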
 
      Note that redirecting 'print' or 'printf' into a pipe is often
      enough to accomplish your task.  If you need to run many commands,
      it is more efficient to simply print them down a pipeline to the
      shell:
 
           while (MORE STUFF TO DO)
               print COMMAND | "/bin/sh"
           close("/bin/sh")
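
      A concrete version of this outline might look like the following
      sketch.  The function name 'batch_echo' and the generated 'echo'
      commands are hypothetical, and the input is assumed to contain no
      shell metacharacters:

```shell
# Hypothetical wrapper: turn each input line into an echo command and
# feed all of the commands to a single shell.  The input is assumed to
# contain no shell metacharacters; real programs must quote it.
batch_echo() {
    awk '
    { print "echo processed " $1 | "/bin/sh" }
    END { close("/bin/sh") }'
}
```

      Only one shell is started, no matter how many input lines there
      are, whereas calling 'system()' for each line would start one
      shell per line.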
 
      However, if your 'awk' program is interactive, 'system()' is useful
      for running large self-contained programs, such as a shell or an
      editor.  Some operating systems cannot implement the 'system()'
      function.  'system()' causes a fatal error if it is not supported.
 
           NOTE: When '--sandbox' is specified, the 'system()' function
           is disabled (see Options).
 
      On POSIX systems, a command's exit status is a 16-bit number.  The
      exit value passed to the C 'exit()' function is held in the
      high-order eight bits.  The low-order bits indicate whether the
      process was killed by a signal and, if so, the guilty signal number
      (bits 0-6) and whether it dumped core (bit 7).
 
      Traditionally, 'awk''s 'system()' function has simply returned the
      exit status value divided by 256.  In the normal case this gives
      the exit status but in the case of death-by-signal it yields a
      fractional floating-point value.(2)  POSIX states that 'awk''s
      'system()' should return the full 16-bit value.
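
      The arithmetic can be worked through directly.  In the following
      sketch, the function name 'traditional_status' and the variable
      'raw' are hypothetical; 'raw' stands for the 16-bit status
      described earlier:

```shell
# Hypothetical helper: show what a traditional awk would return for the
# raw 16-bit status given in $1.
traditional_status() {
    awk -v raw="$1" 'BEGIN { printf "%.8g\n", raw / 256 }'
}
```

      A raw status of 768 (exit value 3 in the high-order bits) gives
      '3', whereas a raw status of 9 (death by signal 9) gives the
      fraction '0.03515625'.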
 
      'gawk' steers a middle ground.  The return values are summarized in
      Table 9.5.
 
      Situation                     Return value from 'system()'
      --------------------------------------------------------------------------
      '--traditional'               C 'system()''s value divided by 256
      '--posix'                     C 'system()''s value
      Normal exit of command        Command's exit status
      Death by signal of command    256 + number of murderous signal
      Death by signal of command    512 + number of murderous signal
      with core dump
      Some kind of error            -1
 
      Table 9.5: Return values from 'system()'
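
      One way to interpret these values in a program is sketched below.
      The function name 'describe_system_rc' and the wording of its
      messages are hypothetical.  It assumes 'gawk' in its default mode,
      so the '--traditional' and '--posix' rows do not apply:

```shell
# Hypothetical helper: describe a value returned by system() in gawk
# (default mode), following the rows of Table 9.5.
describe_system_rc() {
    awk -v rc="$1" 'BEGIN {
        if (rc < 0)
            print "error"
        else if (rc < 256)
            print "exit status " rc
        else if (rc < 512)
            print "killed by signal " (rc - 256)
        else
            print "killed by signal " (rc - 512) ", core dumped"
    }'
}
```

      For instance, a return value of 265 is reported as 'killed by
      signal 9', because 265 is 256 + 9.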
 
              Controlling Output Buffering with 'system()'
 
    The 'fflush()' function provides explicit control over output
 buffering for individual files and pipes.  However, its use is not
 portable to many older 'awk' implementations.  An alternative method to
 flush output buffers is to call 'system()' with a null string as its
 argument:
 
      system("")   # flush output
 
 'gawk' treats this use of the 'system()' function as a special case and
 is smart enough not to run a shell (or other command interpreter) with
 the empty command.  Therefore, with 'gawk', this idiom is not only
 useful, it is also efficient.  Although this method should work with
 other 'awk' implementations, it does not necessarily avoid starting an
 unnecessary shell.  (Other implementations may only flush the buffer
 associated with the standard output and not necessarily all buffered
 output.)
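
    As a minimal sketch of the idiom in a complete program (the
 function name 'sum_system_flush' is hypothetical), each result is
 printed and then pushed out with 'system("")':

```shell
# Hypothetical wrapper: print each sum, then call system("") so that
# pending output is flushed before the next line is read.
sum_system_flush() {
    awk '{ print $1 + $2; system("") }'
}
```

    With 'gawk' no shell is started for the empty command.  With other
 implementations each call may start a shell, so the idiom can be slow
 when there are many input lines.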
 
    If you think about what a programmer expects, it makes sense that
 'system()' should flush any pending output.  The following program:
 
      BEGIN {
           print "first print"
           system("echo system echo")
           print "second print"
      }
 
 must print:
 
      first print
      system echo
      second print
 
 and not:
 
      system echo
      first print
      second print
 
    If 'awk' did not flush its buffers before calling 'system()', you
 would see the latter (undesirable) output.
 
    ---------- Footnotes ----------
 
    (1) A program is interactive if the standard output is connected to a
 terminal device.  On modern systems, this means your keyboard and
 screen.
 
    (2) In private correspondence, Dr. Kernighan has indicated to me that
 the way this was done was probably a mistake.