7.5.2 Built-in Variables That Convey Information
------------------------------------------------
The following is an alphabetical list of variables that 'awk' sets
automatically on certain occasions in order to provide information to
your program.
The variables that are specific to 'gawk' are marked with a pound
sign ('#'). These variables are 'gawk' extensions. In other 'awk'
implementations or if 'gawk' is in compatibility mode (see Options),
they are not special:
'ARGC', 'ARGV'
The command-line arguments available to 'awk' programs are stored
in an array called 'ARGV'. 'ARGC' is the number of command-line
arguments present. See Other Arguments. Unlike most 'awk'
arrays, 'ARGV' is indexed from 0 to 'ARGC' - 1. In the following
example:
$ awk 'BEGIN {
> for (i = 0; i < ARGC; i++)
> print ARGV[i]
> }' inventory-shipped mail-list
-| awk
-| inventory-shipped
-| mail-list
'ARGV[0]' contains 'awk', 'ARGV[1]' contains 'inventory-shipped',
and 'ARGV[2]' contains 'mail-list'. The value of 'ARGC' is three,
one more than the index of the last element in 'ARGV', because the
elements are numbered from zero.
The names 'ARGC' and 'ARGV', as well as the convention of indexing
the array from 0 to 'ARGC' - 1, are derived from the C language's
method of accessing command-line arguments.
The value of 'ARGV[0]' can vary from system to system. Also, you
should note that the program text is _not_ included in 'ARGV', nor
are any of 'awk''s command-line options. See ARGC and ARGV for
information about how 'awk' uses these variables. (d.c.)
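As a small sketch of the last point, the following shell snippet passes a
'-v' option and program text to 'awk' and prints 'ARGV' from index 1
onward. The helper name 'show_args' and the operands 'a' and 'b' are made
up for illustration; 'ARGV[0]' is skipped because its value varies:

```shell
# show_args is a hypothetical helper. The operands need not exist as files,
# because a program with only a BEGIN rule reads no input.
show_args() {
    awk -v x=1 'BEGIN {
        for (i = 1; i < ARGC; i++)
            print i, ARGV[i]
        print "ARGC is", ARGC
    }' "$@"
}

show_args a b
# prints:
#   1 a
#   2 b
#   ARGC is 3
```

Neither '-v x=1' nor the program text shows up in the output; only the
two operands do.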
'ARGIND #'
The index in 'ARGV' of the current file being processed. Every
time 'gawk' opens a new data file for processing, it sets 'ARGIND'
to the index in 'ARGV' of the file name. When 'gawk' is processing
the input files, 'FILENAME == ARGV[ARGIND]' is always true.
This variable is useful in file processing; it allows you to tell
how far along you are in the list of data files as well as to
distinguish between successive instances of the same file name on
the command line.
While you can change the value of 'ARGIND' within your 'awk'
program, 'gawk' automatically sets it to a new value when it opens
the next file.
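The following sketch shows one way 'ARGIND' might be used to tell apart
two passes over the same file. It assumes 'gawk' is installed; the helper
name 'label_passes' and the temporary file are illustrative only:

```shell
# label_passes is a hypothetical helper: it tags every record with the
# ARGV index of the file it came from.
label_passes() {
    gawk '{ print "pass " ARGIND ", record " FNR ": " $0 }' "$@"
}

tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"

# The same file is named twice; FILENAME is identical for both passes,
# but ARGIND is 1 for the first and 2 for the second.
label_passes "$tmp" "$tmp"

rm -f "$tmp"
```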
'ENVIRON'
An associative array containing the values of the environment. The
array indices are the environment variable names; the elements are
the values of the particular environment variables. For example,
'ENVIRON["HOME"]' might be '/home/arnold'.
For POSIX 'awk', changing this array does not affect the
environment passed on to any programs that 'awk' may spawn via
redirection or the 'system()' function.
However, beginning with version 4.2, if not in POSIX compatibility
mode, 'gawk' does update its own environment when 'ENVIRON' is
changed, thus changing the environment seen by programs that it
creates. You should therefore be especially careful if you modify
'ENVIRON["PATH"]', which is the search path for finding executable
programs.
This can also affect the running 'gawk' program, since some of the
built-in functions may pay attention to certain environment
variables. The most notable instance of this is 'mktime()' (see
Time Functions), which pays attention to the value of the 'TZ'
environment variable on many systems.
Some operating systems may not have environment variables. On such
systems, the 'ENVIRON' array is empty (except for
'ENVIRON["AWKPATH"]' and 'ENVIRON["AWKLIBPATH"]'; see AWKPATH
Variable and AWKLIBPATH Variable).
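Here is a minimal sketch of both behaviors described above. 'GREETING' is
a made-up variable name, and 'read_env' and 'child_env' are hypothetical
helpers. The second helper assumes 'gawk' 4.2 or later, not running in
POSIX mode:

```shell
# Reading the environment works in any awk.
read_env() {
    GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }'
}
read_env          # prints: hello

# Assumption: gawk 4.2 or later, not in POSIX mode. The shell started by
# system() should then see the value assigned through ENVIRON.
child_env() {
    gawk 'BEGIN {
        ENVIRON["GREETING"] = "changed"
        system("echo \"$GREETING\"")
    }'
}
child_env
```

With a POSIX 'awk', or with 'gawk --posix', the second helper would be
expected to print whatever value 'GREETING' had when 'gawk' started.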
'ERRNO #'
If a system error occurs during a redirection for 'getline', during
a read for 'getline', or during a 'close()' operation, then 'ERRNO'
contains a string describing the error.
In addition, 'gawk' clears 'ERRNO' before opening each command-line
input file. This enables checking if the file is readable inside a
'BEGINFILE' pattern (see BEGINFILE/ENDFILE).
Otherwise, 'ERRNO' works similarly to the C variable 'errno'.
Except for the case just mentioned, 'gawk' _never_ clears it (sets
it to zero or '""'). Thus, you should only expect its value to be
meaningful when an I/O operation returns a failure value, such as
'getline' returning -1. You are, of course, free to clear it
yourself before doing an I/O operation.
If the value of 'ERRNO' corresponds to a system error in the C
'errno' variable, then 'PROCINFO["errno"]' will be set to the value
of 'errno'. For non-system errors, 'PROCINFO["errno"]' will be
zero.
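The advice above can be sketched as follows, assuming 'gawk' is
installed. The helper name 'try_read' and the path used are hypothetical;
the exact error text comes from the system and may differ by locale:

```shell
# try_read is a hypothetical helper; its argument is the file to read.
try_read() {
    gawk -v file="$1" 'BEGIN {
        ERRNO = ""                        # clear it ourselves before the I/O
        if ((getline line < file) < 0)    # -1 signals a failure
            print "cannot read: " ERRNO
        else
            print "first line: " line
    }'
}

try_read /nonexistent/path/for/demo
```

'ERRNO' is consulted only after 'getline' has reported a failure, which
is the one situation in which its value can be relied upon.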
'FILENAME'
The name of the current input file. When no data files are listed
on the command line, 'awk' reads from the standard input and
'FILENAME' is set to '"-"'. 'FILENAME' changes each time a new
file is read (see Reading Files). Inside a 'BEGIN' rule, the
value of 'FILENAME' is '""', because there are no input files being
processed yet.(1) (d.c.) Note, though, that using 'getline'
(see Getline) inside a 'BEGIN' rule can give 'FILENAME' a
value.
'FNR'
The current record number in the current file. 'awk' increments
'FNR' each time it reads a new record (see Records). 'awk'
resets 'FNR' to zero each time it starts a new input file.
'NF'
The number of fields in the current input record. 'NF' is set each
time a new record is read, when a new field is created, or when
'$0' changes (see Fields).
Unlike most of the variables described in this node, assigning a
value to 'NF' has the potential to affect 'awk''s internal
workings. In particular, assignments to 'NF' can be used to create
fields in or remove fields from the current record. See Changing
Fields.
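A brief sketch of both effects follows. The helper names 'trim_fields'
and 'grow_fields' are made up, and 'OFS' is set to '-' only so that the
empty fields are visible in the output:

```shell
# Shrinking NF drops the trailing fields; $0 is rebuilt using OFS.
trim_fields() {
    awk 'BEGIN { OFS = "-" } { NF = 2; print }'
}
echo 'a b c d' | trim_fields      # prints: a-b

# Assigning past the last field raises NF and creates the empty
# field in between.
grow_fields() {
    awk 'BEGIN { OFS = "-" } { $4 = "d"; print NF; print }'
}
echo 'a b' | grow_fields          # prints: 4, then a-b--d
```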
'FUNCTAB #'
An array whose indices and corresponding values are the names of
all the built-in, user-defined, and extension functions in the
program.
NOTE: Attempting to use the 'delete' statement with the
'FUNCTAB' array causes a fatal error. Any attempt to assign
to an element of 'FUNCTAB' also causes a fatal error.
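One possible use of 'FUNCTAB' is to check whether a function name is
defined before relying on it. In this sketch, which assumes 'gawk' is
installed, 'check_functions', 'greet', and 'missing' are all names
invented for the example:

```shell
check_functions() {
    gawk '
    function greet() { return "hello" }
    BEGIN {
        n = split("greet missing", names, " ")
        for (i = 1; i <= n; i++)
            print names[i], ((names[i] in FUNCTAB) ? "is defined" : "is not defined")
    }'
}

check_functions
# prints:
#   greet is defined
#   missing is not defined
```

The 'in' operator only reads the array, so it does not trigger the fatal
errors described in the note above.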
'NR'
The number of input records 'awk' has processed since the beginning
of the program's execution (see Records). 'awk' increments
'NR' each time it reads a new record.
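The difference between 'NR' and 'FNR' is easiest to see with more than
one input file. The following sketch uses two temporary files and a
hypothetical helper named 'count_records':

```shell
count_records() {
    awk '{ print "NR=" NR, "FNR=" FNR }' "$@"
}

one=$(mktemp)
two=$(mktemp)
printf 'a\nb\n' > "$one"
printf 'c\n' > "$two"

count_records "$one" "$two"
# prints:
#   NR=1 FNR=1
#   NR=2 FNR=2
#   NR=3 FNR=1

rm -f "$one" "$two"
```

'NR' keeps counting across the file boundary, while 'FNR' starts over
for the second file.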
'PROCINFO #'
The elements of this array provide access to information about the
running 'awk' program. The following elements (listed
alphabetically) are guaranteed to be available:
'PROCINFO["argv"]'
The 'PROCINFO["argv"]' array contains all of the command-line
arguments (after glob expansion and redirection processing on
platforms where that must be done manually by the program)
with subscripts ranging from 0 through 'argc' - 1. For
example, 'PROCINFO["argv"][0]' will contain the name by which
'gawk' was invoked. Here is an example of how this feature
may be used:
gawk '
BEGIN {
for (i = 0; i < length(PROCINFO["argv"]); i++)
print i, PROCINFO["argv"][i]
}'
Please note that this differs from the standard 'ARGV' array
which does not include command-line arguments that have
already been processed by 'gawk' (see ARGC and ARGV).
'PROCINFO["egid"]'
The value of the 'getegid()' system call.
'PROCINFO["errno"]'
The value of the C 'errno' variable when 'ERRNO' is set to the
associated error message.
'PROCINFO["euid"]'
The value of the 'geteuid()' system call.
'PROCINFO["FS"]'
This is '"FS"' if field splitting with 'FS' is in effect,
'"FIELDWIDTHS"' if field splitting with 'FIELDWIDTHS' is in
effect, '"FPAT"' if field matching with 'FPAT' is in effect,
or '"API"' if field splitting is controlled by an API input
parser.
'PROCINFO["gid"]'
The value of the 'getgid()' system call.
'PROCINFO["identifiers"]'
A subarray, indexed by the names of all identifiers used in
the text of the 'awk' program. An "identifier" is simply the
name of a variable (be it scalar or array), built-in function,
user-defined function, or extension function. For each
identifier, the value of the element is one of the following:
'"array"'
The identifier is an array.
'"builtin"'
The identifier is a built-in function.
'"extension"'
The identifier is an extension function loaded via
'@load' or '-l'.
'"scalar"'
The identifier is a scalar.
'"untyped"'
The identifier is untyped (could be used as a scalar or
an array; 'gawk' doesn't know yet).
'"user"'
The identifier is a user-defined function.
The values indicate what 'gawk' knows about the identifiers
after it has finished parsing the program; they are _not_
updated while the program runs.
'PROCINFO["pgrpid"]'
The process group ID of the current process.
'PROCINFO["pid"]'
The process ID of the current process.
'PROCINFO["ppid"]'
The parent process ID of the current process.
'PROCINFO["strftime"]'
The default time format string for 'strftime()'. Assigning a
new value to this element changes the default. See Time
Functions.
'PROCINFO["uid"]'
The value of the 'getuid()' system call.
'PROCINFO["version"]'
The version of 'gawk'.
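As an illustration of one of the elements above, this sketch prints
'PROCINFO["FS"]' after each change to the field-splitting method. It
assumes 'gawk' is installed; the helper name 'show_splitting' and the
particular 'FIELDWIDTHS' and 'FPAT' values are arbitrary:

```shell
show_splitting() {
    gawk 'BEGIN {
        print PROCINFO["FS"]      # default: splitting with FS
        FIELDWIDTHS = "2 3"
        print PROCINFO["FS"]      # fixed-width splitting
        FPAT = "[^,]+"
        print PROCINFO["FS"]      # field matching with FPAT
        FS = ","
        print PROCINFO["FS"]      # assigning FS switches back
    }'
}

show_splitting
# prints:
#   FS
#   FIELDWIDTHS
#   FPAT
#   FS
```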
The following additional elements in the array are available to
provide information about the MPFR and GMP libraries if your
version of 'gawk' supports arbitrary-precision arithmetic (see
Arbitrary Precision Arithmetic):
'PROCINFO["gmp_version"]'
The version of the GNU MP library.
'PROCINFO["mpfr_version"]'
The version of the GNU MPFR library.
'PROCINFO["prec_max"]'
The maximum precision supported by MPFR.
'PROCINFO["prec_min"]'
The minimum precision required by MPFR.
The following additional elements in the array are available to
provide information about the version of the extension API, if your
version of 'gawk' supports dynamic loading of extension functions
(see Dynamic Extensions):
'PROCINFO["api_major"]'
The major version of the extension API.
'PROCINFO["api_minor"]'
The minor version of the extension API.
On some systems, there may be elements in the array, '"group1"'
through '"groupN"' for some N. N is the number of supplementary
groups that the process has. Use the 'in' operator to test for
these elements (see Reference to Elements).
The following elements allow you to change 'gawk''s behavior:
'PROCINFO["NONFATAL"]'
If this element exists, then I/O errors for all redirections
become nonfatal. See Nonfatal.
'PROCINFO["NAME", "NONFATAL"]'
Make I/O errors for NAME be nonfatal. See Nonfatal.
'PROCINFO["COMMAND", "pty"]'
For two-way communication to COMMAND, use a pseudo-tty instead
of setting up a two-way pipe. See Two-way I/O for more
information.
'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]'
Set a timeout for reading from input redirection INPUT_NAME.
See Read Timeout for more information.
'PROCINFO["INPUT_NAME", "RETRY"]'
If an I/O error that may be retried occurs when reading data
from INPUT_NAME, and this array entry exists, then 'getline'
returns -2 instead of following the default behavior of
returning -1 and configuring INPUT_NAME to return no further
data. An I/O error that may be retried is one where 'errno'
has the value 'EAGAIN', 'EWOULDBLOCK', 'EINTR', or
'ETIMEDOUT'. This may be useful in conjunction with
'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]' or situations where a
file descriptor has been configured to behave in a
non-blocking fashion. See Retrying Input for more
information.
'PROCINFO["sorted_in"]'
If this element exists in 'PROCINFO', its value controls the
order in which array indices will be processed by 'for (INDX
in ARRAY)' loops. This is an advanced feature, so we defer
the full description until later; see Scanning an
Array.
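As a brief preview, the sketch below sets 'PROCINFO["sorted_in"]' to
'"@ind_str_asc"', one of the predefined orderings, so that the loop
visits indices in ascending string order. It assumes 'gawk' is
installed; the helper name 'sorted_keys' and the array contents are
invented for the example:

```shell
sorted_keys() {
    gawk 'BEGIN {
        a["pear"]; a["apple"]; a["fig"]    # referencing creates the elements
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (k in a)
            print k
    }'
}

sorted_keys
# prints:
#   apple
#   fig
#   pear
```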
'RLENGTH'
The length of the substring matched by the 'match()' function
(see String Functions). 'RLENGTH' is set by invoking the
'match()' function. Its value is the length of the matched string,
or -1 if no match is found.
'RSTART'
The start index in characters of the substring that is matched by
the 'match()' function (see String Functions). 'RSTART' is set
by invoking the 'match()' function. Its value is the position of
the string where the matched substring starts, or zero if no match
was found.
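The following sketch shows 'RSTART' and 'RLENGTH' for a successful and
an unsuccessful match. The helper name 'find_match', the regexp, and the
test strings are illustrative only:

```shell
find_match() {
    awk -v s="$1" 'BEGIN {
        if (match(s, /ba+r/))
            # On success, the two variables locate the match for substr().
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
        else
            print RSTART, RLENGTH
    }'
}

find_match foobarbaz    # prints: 4 3 bar
find_match nothing      # prints: 0 -1
```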
'RT #'
The input text that matched the text denoted by 'RS', the record
separator. It is set every time a record is read.
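'RT' is most informative when 'RS' is a regular expression, because the
matched text can then differ from record to record. This sketch assumes
'gawk' is installed; the helper name 'show_rt' and the sample input are
made up:

```shell
show_rt() {
    gawk 'BEGIN { RS = "[0-9]+" } { print "[" $0 "][" RT "]" }'
}

# No trailing newline, so the last record ends at end of input.
printf 'a1b22c' | show_rt
# prints:
#   [a][1]
#   [b][22]
#   [c][]
```

The final record is ended by the end of the input rather than by text
matching 'RS', so 'RT' is empty for it.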
'SYMTAB #'
An array whose indices are the names of all defined global
variables and arrays in the program. 'SYMTAB' makes 'gawk''s
symbol table visible to the 'awk' programmer. It is built as
'gawk' parses the program and is complete before the program starts
to run.
The array may be used for indirect access to read or write the
value of a variable:
foo = 5
SYMTAB["foo"] = 4
print foo # prints 4
The 'isarray()' function (see Type Functions) may be used to
test if an element in 'SYMTAB' is an array. Also, you may not use
the 'delete' statement with the 'SYMTAB' array.
You may use an index for 'SYMTAB' that is not a predefined
identifier:
SYMTAB["xxx"] = 5
print SYMTAB["xxx"]
This works as expected: in this case 'SYMTAB' acts just like a
regular array. The only difference is that you can't then delete
'SYMTAB["xxx"]'.
The 'SYMTAB' array is more interesting than it looks. Andrew
Schorr points out that it effectively gives 'awk' data pointers.
Consider his example:
# Indirect multiply of any variable by amount, return result
function multiply(variable, amount)
{
return SYMTAB[variable] *= amount
}
You would use it like this:
BEGIN {
answer = 10.5
multiply("answer", 4)
print "The answer is", answer
}
When run, this produces:
$ gawk -f answer.awk
-| The answer is 42
NOTE: In order to avoid severe time-travel paradoxes,(2)
neither 'FUNCTAB' nor 'SYMTAB' is available as an element
within the 'SYMTAB' array.
Changing 'NR' and 'FNR'
'awk' increments 'NR' and 'FNR' each time it reads a record, instead
of setting them to the absolute value of the number of records read.
This means that a program can change these variables and their new
values are incremented for each record. (d.c.) The following example
shows this:
$ echo '1
> 2
> 3
> 4' | awk 'NR == 2 { NR = 17 }
> { print NR }'
-| 1
-| 17
-| 18
-| 19
Before 'FNR' was added to the 'awk' language (see V7/SVR3.1), many
'awk' programs used this feature to track the number of records in a
file by resetting 'NR' to zero when 'FILENAME' changed.
---------- Footnotes ----------
(1) Some early implementations of Unix 'awk' initialized 'FILENAME'
to '"-"', even if there were data files to be processed. This behavior
was incorrect and should not be relied upon in your programs.
(2) Not to mention difficult implementation issues.