 5.9 Closing Input and Output Redirections
 =========================================
 
 If the same file name or the same shell command is used with 'getline'
 more than once during the execution of an 'awk' program (see
 Getline), the file is opened (or the command is executed) the first
 time only.  At that time, the first record of input is read from that
 file or command.  The next time the same file or command is used with
 'getline', another record is read from it, and so on.
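    The sketch below shows this behavior. It assumes a POSIX shell and
 any POSIX 'awk', and the scratch file name 'demo.txt' is made up for
 illustration. Two 'getline' calls on the same file name return its
 first and second records, because the file is opened only once:

```shell
#!/bin/sh
# demo.txt is a hypothetical scratch file with three records.
printf 'one\ntwo\nthree\n' > demo.txt

result=$(awk 'BEGIN {
    getline a < "demo.txt"   # first use: opens demo.txt, reads "one"
    getline b < "demo.txt"   # still open: reads the next record, "two"
    print a, b
}')
printf '%s\n' "$result"      # one two
```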
 
    Similarly, when a file or pipe is opened for output, 'awk' remembers
 the file name or command associated with it, and subsequent writes to
 the same file or command are appended to the previous writes.  The file
 or pipe stays open until 'awk' exits.
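    The same rule explains why '>' truncates a file only once per run.
 In this sketch, 'out.txt' is a hypothetical scratch file. Both lines
 end up in the file, even though each 'print' uses '>':

```shell
#!/bin/sh
# out.txt is a hypothetical scratch file.
awk 'BEGIN {
    print "first"  > "out.txt"   # first use: opens and truncates out.txt
    print "second" > "out.txt"   # still open: written after "first"
}'
cat out.txt   # "first", then "second", on separate lines
```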
 
    This implies that special steps are necessary in order to read the
 same file again from the beginning, or to rerun a shell command (rather
 than reading more output from the same command).  The 'close()' function
 makes these things possible:
 
      close(FILENAME)
 
 or:
 
      close(COMMAND)
 
    The argument FILENAME or COMMAND can be any expression.  Its value
 must _exactly_ match the string that was used to open the file or start
 the command (spaces and other "irrelevant" characters included).  For
 example, if you open a pipe with this:
 
      "sort -r names" | getline foo
 
 then you must close it with this:
 
      close("sort -r names")
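    The next sketch illustrates the exact-match rule, with 'echo hi'
 standing in for a real command. It assumes any POSIX 'awk'. A
 'close()' whose argument differs by a single space closes nothing, so
 the next 'getline' still finds the pipe at end of file. Using the
 exact string closes the pipe and lets the command run again:

```shell
#!/bin/sh
result=$(awk 'BEGIN {
    cmd = "echo hi"
    cmd | getline a        # starts the command and reads "hi"
    close("echo  hi")      # two spaces: matches nothing, closes nothing
    n = (cmd | getline b)  # same pipe, already at end of file: 0
    close(cmd)             # exact match: the pipe is closed
    m = (cmd | getline c)  # the command runs again: 1, and c is "hi"
    print a, n, m, c
}')
printf '%s\n' "$result"    # hi 0 1 hi
```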
 
    Once this function call is executed, the next 'getline' from that
 file or command, or the next 'print' or 'printf' to that file or
 command, reopens the file or reruns the command.  Because the expression
 that you use to close a file or pipeline must exactly match the
 expression used to open the file or run the command, it is good practice
 to use a variable to store the file name or command.  The previous
 example becomes the following:
 
      sortcom = "sort -r names"
      sortcom | getline foo
      ...
      close(sortcom)
 
 This helps avoid hard-to-find typographical errors in your 'awk'
 programs.  Here are some of the reasons for closing an output file:
 
    * To write a file and read it back later on in the same 'awk'
      program.  Close the file after writing it, then begin reading it
      with 'getline'.
 
    * To write numerous files, successively, in the same 'awk' program.
      If the files aren't closed, eventually 'awk' may exceed a system
      limit on the number of open files in one process.  It is best to
      close each one when the program has finished writing it.
 
    * To make a command finish.  When output is redirected through a
      pipe, the command reading the pipe normally continues to try to
      read input as long as the pipe is open.  Often this means the
      command cannot really do its work until the pipe is closed.  For
      example, if output is redirected to the 'mail' program, the message
      is not actually sent until the pipe is closed.
 
    * To run the same program a second time, with the same arguments.
      This is not the same thing as giving more input to the first run!
 
      For example, suppose a program pipes output to the 'mail' program.
      If it outputs several lines redirected to this pipe without closing
      it, they make a single message of several lines.  By contrast, if
      the program closes the pipe after each line of output, then each
      line makes a separate message.
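    The first three reasons can be sketched in one 'BEGIN' rule. This
 assumes a POSIX shell, any POSIX 'awk', and the 'sort' utility. All
 the file names are hypothetical scratch files in the current
 directory:

```shell
#!/bin/sh
# tmp.txt, part1.txt..part3.txt and sorted.txt are hypothetical
# scratch files created in the current directory.
result=$(awk 'BEGIN {
    # Write a file, close it, then read it back with getline.
    file = "tmp.txt"
    print "hello" > file
    close(file)                # flush and close; the read starts at the top
    getline line < file
    close(file)
    print "read back: " line

    # Write several files in turn, closing each one when finished,
    # so that only one of them is open at any moment.
    for (i = 1; i <= 3; i++) {
        out = "part" i ".txt"
        print "chunk", i > out
        close(out)
    }

    # Make a command finish: sort can produce nothing until it sees
    # end of file, which close() supplies.  close() also waits for
    # sort to exit, so sorted.txt is complete when it returns.
    cmd = "sort > sorted.txt"
    print "b" | cmd
    print "a" | cmd
    close(cmd)
    getline first < "sorted.txt"
    close("sorted.txt")
    print "first sorted line: " first
}')
printf '%s\n' "$result"
```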
 
    If you use more files than the system allows you to have open, 'gawk'
 attempts to multiplex the available open files among your data files.
 'gawk''s ability to do this depends upon the facilities of your
 operating system, so it may not always work.  It is therefore both good
 practice and good portability advice to always use 'close()' on your
 files when you are done with them.  In fact, if you are using a lot of
 pipes, it is essential that you close commands when done.  For example,
 consider something like this:
 
      {
          ...
          command = ("grep " $1 " /some/file | my_prog -q " $3)
          while ((command | getline) > 0) {
              PROCESS OUTPUT OF command
          }
          # need close(command) here
      }
 
    This example creates a new pipeline based on data in _each_ record.
 Without the call to 'close()' indicated in the comment, 'awk' creates
 child processes to run the commands, until it eventually runs out of
 file descriptors for more pipelines.
 
    Even though each command has finished (as indicated by the
 end-of-file return status from 'getline'), the child process is not
 terminated;(1) more importantly, the file descriptor for the pipe is not
 closed and released until 'close()' is called or 'awk' exits.
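    Here is a runnable variant of the earlier per-record example,
 offered as a sketch. The pipeline 'echo ... | tr a-z A-Z' is an
 assumed stand-in for the hypothetical 'grep ... | my_prog' pipeline,
 and 'close()' is called where the comment asked for it:

```shell
#!/bin/sh
# "echo ... | tr a-z A-Z" stands in for the hypothetical
# "grep ... | my_prog" pipeline shown above.
result=$(printf 'abc\nabc\n' | awk '{
    command = ("echo " $1 " | tr a-z A-Z")
    while ((command | getline line) > 0)
        print line
    close(command)   # reap the child process and release its pipe
}')
printf '%s\n' "$result"   # ABC, printed once per record
```

 Both input records build the same command string. Leaving out
 'close()' here therefore changes the output as well as the resource
 usage. The second record finds the pipe already at end of file,
 'getline' returns zero, and only one 'ABC' is printed.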
 
    'close()' silently does nothing if given an argument that does not
 represent a file, pipe, or coprocess that was opened with a redirection.
 In such a case, it returns a negative value, indicating an error.  In
 addition, 'gawk' sets 'ERRNO' to a string indicating the error.
 
    Note also that 'close(FILENAME)' has no "magic" effects on the
 implicit loop that reads through the files named on the command line.
 Unless the program has also opened that same file with a redirection,
 such a call is a close of a file that was never opened with a
 redirection, so 'awk' silently does nothing except return a negative
 value.
 
    When using the '|&' operator to communicate with a coprocess, it is
 occasionally useful to be able to close one end of the two-way pipe
 without closing the other.  This is done by supplying a second argument
 to 'close()'.  As in any other call to 'close()', the first argument is
 the name of the command or special file used to start the coprocess.
 The second argument should be a string, with either of the values '"to"'
 or '"from"'.  Case does not matter.  As this is an advanced feature,
 discussion is delayed until the section on two-way I/O (see Two-way
 I/O), which describes it in more detail and gives an example.
 
                     Using 'close()''s Return Value
 
    In many older versions of Unix 'awk', the 'close()' function is
 actually a statement.  (d.c.)  It is a syntax error to try to use the
 return value from 'close()':
 
      command = "..."
      command | getline info
      retval = close(command)  # syntax error in many Unix awks
 
    'gawk' treats 'close()' as a function.  The return value is -1 if the
 argument names something that was never opened with a redirection, or if
 there is a system problem closing the file or process.  In these cases,
 'gawk' sets the predefined variable 'ERRNO' to a string describing the
 problem.
 
    In 'gawk', starting with version 4.2, when closing a pipe or
 coprocess (input or output), the return value is the exit status of the
 command, as described in Table 5.1.(2)  Otherwise, it is the return value
 from the system's 'close()' or 'fclose()' C functions when closing input
 or output files, respectively.  This value is zero if the close
 succeeds, or -1 if it fails.
 
 Situation                            Return value from 'close()'
 --------------------------------------------------------------------------
 Normal exit of command               Command's exit status
 Death by signal of command           256 + number of murderous signal
 Death by signal of command with      512 + number of murderous signal
 core dump
 Some kind of error                   -1
 
 Table 5.1: Return values from 'close()' of a pipe
 
    The POSIX standard is very vague; it says that 'close()' returns zero
 on success and a nonzero value otherwise.  In general, different
 implementations vary in what they report when closing pipes; thus, the
 return value cannot be used portably.  (d.c.)  In POSIX mode (see
 Options), 'gawk' just returns zero when closing a pipe.
 
    ---------- Footnotes ----------
 
    (1) The technical terminology is rather morbid.  The finished child
 is called a "zombie," and cleaning up after it is referred to as
 "reaping."
 
    (2) Prior to version 4.2, the return value from closing a pipe or
 coprocess was the full 16-bit exit value as defined by the 'wait()'
 system call.