gawk: Auto-set

 
 7.5.2 Built-in Variables That Convey Information
 ------------------------------------------------
 
 The following is an alphabetical list of variables that 'awk' sets
 automatically on certain occasions in order to provide information to
 your program.
 
    The variables that are specific to 'gawk' are marked with a pound
 sign ('#').  These variables are 'gawk' extensions.  In other 'awk'
 implementations or if 'gawk' is in compatibility mode (see Options),
 they are not special:
 
 'ARGC', 'ARGV'
      The command-line arguments available to 'awk' programs are stored
      in an array called 'ARGV'.  'ARGC' is the number of command-line
      arguments present.  See Other Arguments.  Unlike most 'awk'
      arrays, 'ARGV' is indexed from 0 to 'ARGC' - 1.  In the following
      example:
 
           $ awk 'BEGIN {
           >         for (i = 0; i < ARGC; i++)
           >             print ARGV[i]
           >      }' inventory-shipped mail-list
           -| awk
           -| inventory-shipped
           -| mail-list
 
      'ARGV[0]' contains 'awk', 'ARGV[1]' contains 'inventory-shipped',
      and 'ARGV[2]' contains 'mail-list'.  The value of 'ARGC' is three,
      one more than the index of the last element in 'ARGV', because the
      elements are numbered from zero.
 
      The names 'ARGC' and 'ARGV', as well as the convention of indexing
      the array from 0 to 'ARGC' - 1, are derived from the C language's
      method of accessing command-line arguments.
 
      The value of 'ARGV[0]' can vary from system to system.  Also, you
      should note that the program text is _not_ included in 'ARGV', nor
      are any of 'awk''s command-line options.  See ARGC and ARGV for
      information about how 'awk' uses these variables.  (d.c.)
 
 'ARGIND #'
      The index in 'ARGV' of the current file being processed.  Every
      time 'gawk' opens a new data file for processing, it sets 'ARGIND'
      to the index in 'ARGV' of the file name.  When 'gawk' is processing
      the input files, 'FILENAME == ARGV[ARGIND]' is always true.
 
      This variable is useful in file processing; it allows you to tell
      how far along you are in the list of data files as well as to
      distinguish between successive instances of the same file name on
      the command line.
 
      While you can change the value of 'ARGIND' within your 'awk'
      program, 'gawk' automatically sets it to a new value when it opens
      the next file.
 
 'ENVIRON'
      An associative array containing the values of the environment.  The
      array indices are the environment variable names; the elements are
      the values of the particular environment variables.  For example,
      'ENVIRON["HOME"]' might be '/home/arnold'.
 
      For POSIX 'awk', changing this array does not affect the
      environment passed on to any programs that 'awk' may spawn via
      redirection or the 'system()' function.
 
      However, beginning with version 4.2, if not in POSIX compatibility
      mode, 'gawk' does update its own environment when 'ENVIRON' is
      changed, thus changing the environment seen by programs that it
      creates.  You should therefore be especially careful if you modify
      'ENVIRON["PATH"]', which is the search path for finding executable
      programs.
 
      This can also affect the running 'gawk' program, since some of the
      built-in functions may pay attention to certain environment
      variables.  The most notable instance of this is 'mktime()' (see
      Time Functions), which pays attention to the value of the 'TZ'
      environment variable on many systems.
 
      Some operating systems may not have environment variables.  On such
      systems, the 'ENVIRON' array is empty (except for
      'ENVIRON["AWKPATH"]' and 'ENVIRON["AWKLIBPATH"]'; see AWKPATH
      Variable and AWKLIBPATH Variable).
 
 'ERRNO #'
      If a system error occurs during a redirection for 'getline', during
      a read for 'getline', or during a 'close()' operation, then 'ERRNO'
      contains a string describing the error.
 
      In addition, 'gawk' clears 'ERRNO' before opening each command-line
      input file.  This enables checking if the file is readable inside a
      'BEGINFILE' pattern (see BEGINFILE/ENDFILE).
 
      Otherwise, 'ERRNO' works similarly to the C variable 'errno'.
      Except for the case just mentioned, 'gawk' _never_ clears it (sets
      it to zero or '""').  Thus, you should only expect its value to be
      meaningful when an I/O operation returns a failure value, such as
      'getline' returning -1.  You are, of course, free to clear it
      yourself before doing an I/O operation.
 
      If the value of 'ERRNO' corresponds to a system error in the C
      'errno' variable, then 'PROCINFO["errno"]' will be set to the value
      of 'errno'.  For non-system errors, 'PROCINFO["errno"]' will be
      zero.
 
 'FILENAME'
      The name of the current input file.  When no data files are listed
      on the command line, 'awk' reads from the standard input and
      'FILENAME' is set to '"-"'.  'FILENAME' changes each time a new
      file is read (see Reading Files).  Inside a 'BEGIN' rule, the
      value of 'FILENAME' is '""', because there are no input files being
      processed yet.(1)  (d.c.)  Note, though, that using 'getline'
      (see Getline) inside a 'BEGIN' rule can give 'FILENAME' a
      value.
 
 'FNR'
      The current record number in the current file.  'awk' increments
      'FNR' each time it reads a new record (see Records).  'awk'
      resets 'FNR' to zero each time it starts a new input file.
 
 'NF'
      The number of fields in the current input record.  'NF' is set each
      time a new record is read, when a new field is created, or when
      '$0' changes (see Fields).
 
      Unlike most of the variables described in this node, assigning a
      value to 'NF' has the potential to affect 'awk''s internal
      workings.  In particular, assignments to 'NF' can be used to create
      fields in or remove fields from the current record.  See Changing
      Fields.
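
      As a small illustration, the sketch below first shrinks a
      four-field record by assigning to 'NF', then creates a field past
      the end and prints the resulting 'NF'.  The input line and the
      shell variable 'nf_out' are invented for the example:

```shell
nf_out=$(echo 'a b c d' | awk '{
    NF = 2          # drop fields 3 and 4; $0 is rebuilt from the rest
    print
    $5 = "e"        # creating field 5 raises NF to 5
    print NF
}')
printf '%s\n' "$nf_out"
```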
 
 'FUNCTAB #'
      An array whose indices and corresponding values are the names of
      all the built-in, user-defined, and extension functions in the
      program.
 
           NOTE: Attempting to use the 'delete' statement with the
           'FUNCTAB' array causes a fatal error.  Any attempt to assign
           to an element of 'FUNCTAB' also causes a fatal error.
 
 'NR'
      The number of input records 'awk' has processed since the beginning
      of the program's execution (see Records).  'awk' increments
      'NR' each time it reads a new record.
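
      The difference between 'NR' and 'FNR' shows up as soon as there is
      more than one input file.  The sketch below uses two throwaway
      files, 'one' and 'two', in a temporary directory; those names and
      the shell variable 'nr_out' are assumptions made for the example:

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"

# NR keeps counting across files; FNR starts over for each file.
nr_out=$(awk '{ print NR, FNR, $0 }' "$tmp/one" "$tmp/two")
printf '%s\n' "$nr_out"

rm -rf "$tmp"
```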
 
 'PROCINFO #'
      The elements of this array provide access to information about the
      running 'awk' program.  The following elements (listed
      alphabetically) are guaranteed to be available:
 
      'PROCINFO["argv"]'
           The 'PROCINFO["argv"]' array contains all of the command-line
           arguments (after glob expansion and redirection processing on
           platforms where that must be done manually by the program)
           with subscripts ranging from 0 through 'argc' - 1.  For
           example, 'PROCINFO["argv"][0]' will contain the name by which
           'gawk' was invoked.  Here is an example of how this feature
           may be used:
 
                gawk '
                BEGIN {
                        for (i = 0; i < length(PROCINFO["argv"]); i++)
                                print i, PROCINFO["argv"][i]
                }'
 
           Please note that this differs from the standard 'ARGV' array
           which does not include command-line arguments that have
           already been processed by 'gawk' (see ARGC and ARGV).
 
      'PROCINFO["egid"]'
           The value of the 'getegid()' system call.
 
      'PROCINFO["errno"]'
           The value of the C 'errno' variable when 'ERRNO' is set to the
           associated error message.
 
      'PROCINFO["euid"]'
           The value of the 'geteuid()' system call.
 
      'PROCINFO["FS"]'
           This is '"FS"' if field splitting with 'FS' is in effect,
           '"FIELDWIDTHS"' if field splitting with 'FIELDWIDTHS' is in
           effect, '"FPAT"' if field matching with 'FPAT' is in effect,
           or '"API"' if field splitting is controlled by an API input
           parser.
 
      'PROCINFO["gid"]'
           The value of the 'getgid()' system call.
 
      'PROCINFO["identifiers"]'
           A subarray, indexed by the names of all identifiers used in
           the text of the 'awk' program.  An "identifier" is simply the
           name of a variable (be it scalar or array), built-in function,
           user-defined function, or extension function.  For each
           identifier, the value of the element is one of the following:
 
           '"array"'
                The identifier is an array.
 
           '"builtin"'
                The identifier is a built-in function.
 
           '"extension"'
                The identifier is an extension function loaded via
                '@load' or '-l'.
 
           '"scalar"'
                The identifier is a scalar.
 
           '"untyped"'
                The identifier is untyped (could be used as a scalar or
                an array; 'gawk' doesn't know yet).
 
           '"user"'
                The identifier is a user-defined function.
 
           The values indicate what 'gawk' knows about the identifiers
           after it has finished parsing the program; they are _not_
           updated while the program runs.
 
      'PROCINFO["pgrpid"]'
           The process group ID of the current process.
 
      'PROCINFO["pid"]'
           The process ID of the current process.
 
      'PROCINFO["ppid"]'
           The parent process ID of the current process.
 
      'PROCINFO["strftime"]'
           The default time format string for 'strftime()'.  Assigning a
           new value to this element changes the default.  See Time
           Functions.
 
      'PROCINFO["uid"]'
           The value of the 'getuid()' system call.
 
      'PROCINFO["version"]'
           The version of 'gawk'.
 
      The following additional elements in the array are available to
      provide information about the MPFR and GMP libraries if your
      version of 'gawk' supports arbitrary-precision arithmetic (see
      Arbitrary Precision Arithmetic):
 
      'PROCINFO["gmp_version"]'
           The version of the GNU MP library.
 
      'PROCINFO["mpfr_version"]'
           The version of the GNU MPFR library.
 
      'PROCINFO["prec_max"]'
           The maximum precision supported by MPFR.
 
      'PROCINFO["prec_min"]'
           The minimum precision required by MPFR.
 
      The following additional elements in the array are available to
      provide information about the version of the extension API, if your
      version of 'gawk' supports dynamic loading of extension functions
      (see Dynamic Extensions):
 
      'PROCINFO["api_major"]'
           The major version of the extension API.
 
      'PROCINFO["api_minor"]'
           The minor version of the extension API.
 
      On some systems, there may be elements in the array, '"group1"'
      through '"groupN"' for some N.  N is the number of supplementary
      groups that the process has.  Use the 'in' operator to test for
      these elements (see Reference to Elements).
 
      The following elements allow you to change 'gawk''s behavior:
 
      'PROCINFO["NONFATAL"]'
           If this element exists, then I/O errors for all redirections
           become nonfatal.  See Nonfatal.
 
      'PROCINFO["NAME", "NONFATAL"]'
           Make I/O errors for NAME be nonfatal.  See Nonfatal.
 
      'PROCINFO["COMMAND", "pty"]'
           For two-way communication to COMMAND, use a pseudo-tty instead
           of setting up a two-way pipe.  See Two-way I/O for more
           information.
 
      'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]'
           Set a timeout for reading from input redirection INPUT_NAME.
           See Read Timeout for more information.
 
      'PROCINFO["INPUT_NAME", "RETRY"]'
           If an I/O error that may be retried occurs when reading data
           from INPUT_NAME, and this array entry exists, then 'getline'
           returns -2 instead of following the default behavior of
           returning -1 and configuring INPUT_NAME to return no further
           data.  An I/O error that may be retried is one where 'errno'
           has the value 'EAGAIN', 'EWOULDBLOCK', 'EINTR', or
           'ETIMEDOUT'.  This may be useful in conjunction with
           'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]' or situations where a
           file descriptor has been configured to behave in a
           non-blocking fashion.  See Retrying Input for more
           information.
 
      'PROCINFO["sorted_in"]'
           If this element exists in 'PROCINFO', its value controls the
           order in which array indices will be processed by 'for (INDX
           in ARRAY)' loops.  This is an advanced feature, so we defer
           the full description until later; see Scanning an
           Array.
 
 'RLENGTH'
      The length of the substring matched by the 'match()' function
      (see String Functions).  'RLENGTH' is set by invoking the
      'match()' function.  Its value is the length of the matched string,
      or -1 if no match is found.
 
 'RSTART'
      The start index in characters of the substring that is matched by
      the 'match()' function (see String Functions).  'RSTART' is set
      by invoking the 'match()' function.  Its value is the position of
      the string where the matched substring starts, or zero if no match
      was found.
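
      'RSTART' and 'RLENGTH' are typically used together with 'substr()'
      to extract the matched text.  In this sketch the sample string, the
      regexps, and the shell variable 'match_out' are assumptions chosen
      for the example; the second 'match()' call deliberately fails to
      show the no-match values:

```shell
match_out=$(awk 'BEGIN {
    s = "foobarbaz"
    if (match(s, /ba+r/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    match(s, /xyz/)             # no match: RSTART is 0, RLENGTH is -1
    print RSTART, RLENGTH
}')
printf '%s\n' "$match_out"
```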
 
 'RT #'
      The input text that matched the text denoted by 'RS', the record
      separator.  It is set every time a record is read.
 
 'SYMTAB #'
      An array whose indices are the names of all defined global
      variables and arrays in the program.  'SYMTAB' makes 'gawk''s
      symbol table visible to the 'awk' programmer.  It is built as
      'gawk' parses the program and is complete before the program starts
      to run.
 
      The array may be used for indirect access to read or write the
      value of a variable:
 
           foo = 5
           SYMTAB["foo"] = 4
           print foo    # prints 4
 
      The 'isarray()' function (see Type Functions) may be used to
      test if an element in 'SYMTAB' is an array.  Also, you may not use
      the 'delete' statement with the 'SYMTAB' array.
 
      You may use an index for 'SYMTAB' that is not a predefined
      identifier:
 
           SYMTAB["xxx"] = 5
           print SYMTAB["xxx"]
 
      This works as expected: in this case 'SYMTAB' acts just like a
      regular array.  The only difference is that you can't then delete
      'SYMTAB["xxx"]'.
 
      The 'SYMTAB' array is more interesting than it looks.  Andrew
      Schorr points out that it effectively gives 'awk' data pointers.
      Consider his example:
 
           # Indirect multiply of any variable by amount, return result
 
           function multiply(variable, amount)
           {
               return SYMTAB[variable] *= amount
           }
 
      You would use it like this:
 
           BEGIN {
               answer = 10.5
               multiply("answer", 4)
               print "The answer is", answer
           }
 
      When run, this produces:
 
           $ gawk -f answer.awk
           -| The answer is 42
 
           NOTE: In order to avoid severe time-travel paradoxes,(2)
           neither 'FUNCTAB' nor 'SYMTAB' is available as an element
           within the 'SYMTAB' array.
 
                         Changing 'NR' and 'FNR'
 
    'awk' increments 'NR' and 'FNR' each time it reads a record, instead
 of setting them to the absolute value of the number of records read.
 This means that a program can change these variables and their new
 values are incremented for each record.  (d.c.)  The following example
 shows this:
 
      $ echo '1
      > 2
      > 3
      > 4' | awk 'NR == 2 { NR = 17 }
      > { print NR }'
      -| 1
      -| 17
      -| 18
      -| 19
 
 Before 'FNR' was added to the 'awk' language (see V7/SVR3.1), many
 'awk' programs used this feature to track the number of records in a
 file by resetting 'NR' to zero when 'FILENAME' changed.
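
 A rough sketch of that older idiom follows.  It is an illustration, not
 a reconstruction of any particular historical program: the file names
 'one' and 'two', the variable 'prev', and the shell variable
 'nr_reset_out' are all assumptions.  Because the rule runs after the
 first record of the new file has already been read, this version sets
 'NR' to 1 at that point so that numbering within each file starts at 1:

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"

# Emulate FNR by restarting NR whenever FILENAME changes.
nr_reset_out=$(awk 'FILENAME != prev { NR = 1; prev = FILENAME }
{ print NR, $0 }' "$tmp/one" "$tmp/two")
printf '%s\n' "$nr_reset_out"

rm -rf "$tmp"
```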
 
    ---------- Footnotes ----------
 
    (1) Some early implementations of Unix 'awk' initialized 'FILENAME'
 to '"-"', even if there were data files to be processed.  This behavior
 was incorrect and should not be relied upon in your programs.
 
    (2) Not to mention difficult implementation issues.