gawk: Feature History
A.6 History of 'gawk' Features
==============================
This minor node describes the features in 'gawk' over and above those in
POSIX 'awk', in the order they were added to 'gawk'.
Version 2.10 of 'gawk' introduced the following features:
* The 'AWKPATH' environment variable for specifying a path search for
the '-f' command-line option (Options).
* The 'IGNORECASE' variable and its effects (Case-sensitivity).
* The '/dev/stdin', '/dev/stdout', '/dev/stderr' and '/dev/fd/N'
special file names (Special Files).
Version 2.13 of 'gawk' introduced the following features:
* The 'FIELDWIDTHS' variable and its effects (Constant Size).
* The 'systime()' and 'strftime()' built-in functions for obtaining
and printing timestamps (Time Functions).
* Additional command-line options (Options):
- The '-W lint' option to provide error and portability checking,
both of the source code and at runtime.
- The '-W compat' option to turn off the GNU extensions.
- The '-W posix' option for full POSIX compliance.
Version 2.14 of 'gawk' introduced the following feature:
* The 'next file' statement for skipping to the next data file
(Nextfile Statement).
Version 2.15 of 'gawk' introduced the following features:
* New variables (Built-in Variables):
- 'ARGIND', which tracks the movement of 'FILENAME' through
'ARGV'.
- 'ERRNO', which contains the system error message when
'getline' returns -1 or 'close()' fails.
* The '/dev/pid', '/dev/ppid', '/dev/pgrpid', and '/dev/user' special
file names. These have since been removed.
* The ability to delete all of an array at once with 'delete ARRAY'
(Delete).
* Command-line option changes (Options):
- The ability to use GNU-style long-named options that start
with '--'.
- The '--source' option for mixing command-line and library-file
source code.
Version 3.0 of 'gawk' introduced the following features:
* New or changed variables:
- 'IGNORECASE' changed, now applying to string comparison as
well as regexp operations (Case-sensitivity).
- 'RT', which contains the input text that matched 'RS'
(Records).
* Full support for both POSIX and GNU regexps (Regexp).
* The 'gensub()' function for more powerful text manipulation
(String Functions).
* The 'strftime()' function acquired a default time format, allowing
it to be called with no arguments (Time Functions).
* The ability for 'FS' and for the third argument to 'split()' to be
null strings (Single Character Fields).
* The ability for 'RS' to be a regexp (Records).
* The 'next file' statement became 'nextfile' (Nextfile
Statement).
* The 'fflush()' function from BWK 'awk' (then at Bell Laboratories;
I/O Functions).
* New command-line options:
- The '--lint-old' option to warn about constructs that are not
available in the original Version 7 Unix version of 'awk'
(V7/SVR3.1).
- The '-m' option from BWK 'awk'. (Brian was still at Bell
Laboratories at the time.) This was later removed from both
his 'awk' and from 'gawk'.
- The '--re-interval' option to provide interval expressions in
regexps (Regexp Operators).
- The '--traditional' option was added as a better name for
'--compat' (Options).
* The use of GNU Autoconf to control the configuration process
(Quick Installation).
* Amiga support. This has since been removed.
Version 3.1 of 'gawk' introduced the following features:
* New variables (Built-in Variables):
- 'BINMODE', for non-POSIX systems, which allows binary I/O for
input and/or output files (PC Using).
- 'LINT', which dynamically controls lint warnings.
- 'PROCINFO', an array for providing process-related
information.
- 'TEXTDOMAIN', for setting an application's
internationalization text domain (Internationalization).
* The ability to use octal and hexadecimal constants in 'awk' program
source code (Nondecimal-numbers).
* The '|&' operator for two-way I/O to a coprocess (Two-way
I/O).
* The '/inet' special files for TCP/IP networking using '|&'
(TCP/IP Networking).
* The optional second argument to 'close()' that allows closing one
end of a two-way pipe to a coprocess (Two-way I/O).
* The optional third argument to the 'match()' function for capturing
text-matching subexpressions within a regexp (String
Functions).
* Positional specifiers in 'printf' formats for making translations
easier (Printf Ordering).
* A number of new built-in functions:
- The 'asort()' and 'asorti()' functions for sorting arrays
(Array Sorting).
- The 'bindtextdomain()', 'dcgettext()' and 'dcngettext()'
functions for internationalization (Programmer i18n).
- The 'extension()' function and the ability to add new built-in
functions dynamically (Dynamic Extensions).
- The 'mktime()' function for creating timestamps (Time
Functions).
- The 'and()', 'or()', 'xor()', 'compl()', 'lshift()',
'rshift()', and 'strtonum()' functions (Bitwise
Functions).
* The support for 'next file' as two words was removed completely
(Nextfile Statement).
* Additional command-line options (Options):
- The '--dump-variables' option to print a list of all global
variables.
- The '--exec' option, for use in CGI scripts.
- The '--gen-po' command-line option and the use of a leading
underscore to mark strings that should be translated
(String Extraction).
- The '--non-decimal-data' option to allow non-decimal input
data (Nondecimal Data).
- The '--profile' option and 'pgawk', the profiling version of
'gawk', for producing execution profiles of 'awk' programs
(Profiling).
- The '--use-lc-numeric' option to force 'gawk' to use the
locale's decimal point for parsing input data (Conversion).
* The use of GNU Automake to help in standardizing the configuration
process (Quick Installation).
* The use of GNU 'gettext' for 'gawk''s own message output
(Gawk I18N).
* BeOS support. This was later removed.
* Tandem support. This was later removed.
* The Atari port became officially unsupported and was later removed
entirely.
* The source code changed to use ISO C standard-style function
definitions.
* POSIX compliance for 'sub()' and 'gsub()' (Gory Details).
* The 'length()' function was extended to accept an array argument
and return the number of elements in the array (String
Functions).
* The 'strftime()' function acquired a third argument to enable
printing times as UTC (Time Functions).
Version 4.0 of 'gawk' introduced the following features:
* Variable additions:
- 'FPAT', which allows you to specify a regexp that matches the
fields, instead of matching the field separator
(Splitting By Content).
- If 'PROCINFO["sorted_in"]' exists, 'for(iggy in foo)' loops
sort the indices before looping over them. The value of this
element provides control over how the indices are sorted
before the loop traversal starts (Controlling
Scanning).
- 'PROCINFO["strftime"]', which holds the default format for
'strftime()' (Time Functions).
* The special files '/dev/pid', '/dev/ppid', '/dev/pgrpid' and
'/dev/user' were removed.
* Support for IPv6 was added via the '/inet6' special file. '/inet4'
forces IPv4 and '/inet' chooses the system default, which is
probably IPv4 (TCP/IP Networking).
* The use of '\s' and '\S' escape sequences in regular expressions
(GNU Regexp Operators).
* Interval expressions became part of default regular expressions
(Regexp Operators).
* POSIX character classes work even with '--traditional'
(Regexp Operators).
* 'break' and 'continue' became invalid outside a loop, even with
'--traditional' (Break Statement, and also see
Continue Statement).
* 'fflush()', 'nextfile', and 'delete ARRAY' are allowed even with
'--posix' or '--traditional', since they are all now part of POSIX.
* An optional third argument to 'asort()' and 'asorti()', specifying
how to sort (String Functions).
* The behavior of 'fflush()' changed to match BWK 'awk' and for
POSIX; now both 'fflush()' and 'fflush("")' flush all open output
redirections (I/O Functions).
* The 'isarray()' function, which determines whether an item is an
array, to make it possible to traverse arrays of arrays
(Type Functions).
* The 'patsplit()' function which gives the same capability as
'FPAT', for splitting (String Functions).
* An optional fourth argument to the 'split()' function, which is an
array to hold the values of the separators (String
Functions).
* Arrays of arrays (Arrays of Arrays).
* The 'BEGINFILE' and 'ENDFILE' special patterns
(BEGINFILE/ENDFILE).
* Indirect function calls (Indirect Calls).
* 'switch' / 'case' are enabled by default (Switch
Statement).
* Command-line option changes (Options):
- The '-b' and '--characters-as-bytes' options which prevent
'gawk' from treating input as a multibyte string.
- The redundant '--compat', '--copyleft', and '--usage' long
options were removed.
- The '--gen-po' option was finally renamed to the correct
'--gen-pot'.
- The '--sandbox' option which disables certain features.
- All long options acquired corresponding short options, for use
in '#!' scripts.
* Directories named on the command line now produce a warning, not a
fatal error, unless '--posix' or '--traditional' is used
(Command-line directories).
* The 'gawk' internals were rewritten, bringing the 'dgawk' debugger
and possibly improved performance (Debugger).
* Per the GNU Coding Standards, dynamic extensions must now define a
global symbol indicating that they are GPL-compatible (Plugin
License).
* In POSIX mode, string comparisons use 'strcoll()' / 'wcscoll()'
(POSIX String Comparison).
* The option for raw sockets was removed, since it was never
implemented (TCP/IP Networking).
* Ranges of the form '[d-h]' are treated as if they were in the C
locale, no matter what kind of regexp is being used, and even with
'--posix' (Ranges and Locales).
* Support was removed for the following systems:
- Atari
- Amiga
- BeOS
- Cray
- MIPS RiscOS
- MS-DOS with the Microsoft Compiler
- MS-Windows with the Microsoft Compiler
- NeXT
- SunOS 3.x, Sun 386 (Road Runner)
- Tandem (non-POSIX)
- Prestandard VAX C compiler for VAX/VMS
Version 4.1 of 'gawk' introduced the following features:
* Three new arrays: 'SYMTAB', 'FUNCTAB', and
'PROCINFO["identifiers"]' (Auto-set).
* The three executables 'gawk', 'pgawk', and 'dgawk' were merged
into one, named just 'gawk'. As a result, the command-line options
changed.
* Command-line option changes (Options):
- The '-D' option invokes the debugger.
- The '-i' and '--include' options load 'awk' library files.
- The '-l' and '--load' options load compiled dynamic
extensions.
- The '-M' and '--bignum' options enable MPFR.
- The '-o' option only does pretty-printing.
- The '-p' option is used for profiling.
- The '-R' option was removed.
* Support for high precision arithmetic with MPFR (Arbitrary
Precision Arithmetic).
* The 'and()', 'or()' and 'xor()' functions changed to allow any
number of arguments, with a minimum of two (Bitwise
Functions).
* The dynamic extension interface was completely redone
(Dynamic Extensions).
* Redirected 'getline' became allowed inside 'BEGINFILE' and
'ENDFILE' (BEGINFILE/ENDFILE).
* The 'where' command was added to the debugger (Execution
Stack).
* Support for Ultrix was removed.
Version 4.2 of 'gawk' introduced the following changes:
* Changes to 'ENVIRON' are reflected into 'gawk''s environment and
that of programs that it runs (Auto-set).
* 'FIELDWIDTHS' was enhanced to allow skipping characters before
assigning a value to a field (Splitting By Content).
* The 'PROCINFO["argv"]' array (Auto-set).
* The maximum number of hexadecimal digits in '\x' escapes is now
two (Escape Sequences).
* Strongly typed regexp constants of the form '@/.../' (Strong
Regexp Constants).
* The bitwise functions changed, making negative arguments into a
fatal error (Bitwise Functions).
* The 'mktime()' function now accepts an optional second argument
(Time Functions).
* The 'typeof()' function (Type Functions).
* Optimizations are enabled by default. Use '-s' / '--no-optimize'
to disable optimizations.
* For many years, POSIX specified that default field splitting only
allowed spaces and tabs to separate fields, and this was how 'gawk'
behaved with '--posix'. As of 2013, the standard restored
historical behavior, and now default field splitting with '--posix'
also allows newlines to separate fields.
* Nonfatal output with 'print' and 'printf' (Nonfatal).
* Retryable I/O via 'PROCINFO[INPUT-FILE, "RETRY"]' (Retrying
Input).
* Changes to the pretty-printer (Profiling):
- The '--pretty-print' option no longer also runs the 'awk'
program.
- Comments in the source program are preserved and placed into
the output file.
- Explicit parentheses for expressions in the input are
preserved in the generated output.
* Improvements to the extension API (Dynamic Extensions):
- The 'get_file()' function to access open redirections.
- The 'nonfatal()' function for generating nonfatal error
messages.
- Support for GMP and MPFR values.
- Input parsers can now override the default field parsing
mechanism by specifying explicit locations.
* Shell startup files are supplied with the distribution and
installed by 'make install' (Shell Startup Files).
* The 'igawk' program and its manual page are no longer installed
when 'gawk' is built (Igawk Program).
* Support for MirBSD was removed.
* Support for GNU/Linux on Alpha was removed.