gawk: Feature History

 
 A.6 History of 'gawk' Features
 ==============================
 
 This minor node describes the features in 'gawk' over and above those in
 POSIX 'awk', in the order they were added to 'gawk'.
 
    Version 2.10 of 'gawk' introduced the following features:
 
    * The 'AWKPATH' environment variable for specifying a path search for
      the '-f' command-line option (see Options).

    * The 'IGNORECASE' variable and its effects (see Case-sensitivity).

    * The '/dev/stdin', '/dev/stdout', '/dev/stderr' and '/dev/fd/N'
      special file names (see Special Files).
 
    Version 2.13 of 'gawk' introduced the following features:
 
    * The 'FIELDWIDTHS' variable and its effects (see Constant Size).

    * The 'systime()' and 'strftime()' built-in functions for obtaining
      and printing timestamps (see Time Functions).

    * Additional command-line options (see Options):

         - The '-W lint' option to provide error and portability checking
           both for the source code and at runtime.
 
         - The '-W compat' option to turn off the GNU extensions.
 
         - The '-W posix' option for full POSIX compliance.
 
    Version 2.14 of 'gawk' introduced the following feature:
 
    * The 'next file' statement for skipping to the next data file (see
      Nextfile Statement).
 
    Version 2.15 of 'gawk' introduced the following features:
 
    * New variables (see Built-in Variables):
 
         - 'ARGIND', which tracks the movement of 'FILENAME' through
           'ARGV'.
 
         - 'ERRNO', which contains the system error message when
           'getline' returns -1 or 'close()' fails.
 
    * The '/dev/pid', '/dev/ppid', '/dev/pgrpid', and '/dev/user' special
      file names.  These have since been removed.
 
    * The ability to delete all of an array at once with 'delete ARRAY'
      (see Delete).

    * Command-line option changes (see Options):
 
         - The ability to use GNU-style long-named options that start
           with '--'.
 
         - The '--source' option for mixing command-line and library-file
           source code.
 
    Version 3.0 of 'gawk' introduced the following features:
 
    * New or changed variables:
 
         - 'IGNORECASE' changed, now applying to string comparison as
           well as regexp operations (see Case-sensitivity).

         - 'RT', which contains the input text that matched 'RS' (see
           Records).
 
    * Full support for both POSIX and GNU regexps (see Regexp).

    * The 'gensub()' function for more powerful text manipulation (see
      String Functions).

    * The 'strftime()' function acquired a default time format, allowing
      it to be called with no arguments (see Time Functions).

    * The ability for 'FS' and for the third argument to 'split()' to be
      null strings (see Single Character Fields).

    * The ability for 'RS' to be a regexp (see Records).

    * The 'next file' statement became 'nextfile' (see Nextfile
      Statement).

    * The 'fflush()' function from BWK 'awk' (then at Bell Laboratories;
      see I/O Functions).
 
    * New command-line options:
 
         - The '--lint-old' option to warn about constructs that are not
           available in the original Version 7 Unix version of 'awk'
           (see V7/SVR3.1).

         - The '-m' option from BWK 'awk'.  (Brian was still at Bell
           Laboratories at the time.)  This was later removed from both
           his 'awk' and 'gawk'.

         - The '--re-interval' option to provide interval expressions in
           regexps (see Regexp Operators).

         - The '--traditional' option was added as a better name for
           '--compat' (see Options).

    * The use of GNU Autoconf to control the configuration process (see
      Quick Installation).
 
    * Amiga support.  This has since been removed.
 
    Version 3.1 of 'gawk' introduced the following features:
 
    * New variables (see Built-in Variables):

         - 'BINMODE', for non-POSIX systems, which allows binary I/O for
           input and/or output files (see PC Using).
 
         - 'LINT', which dynamically controls lint warnings.
 
         - 'PROCINFO', an array for providing process-related
           information.
 
         - 'TEXTDOMAIN', for setting an application's
           internationalization text domain (see
           Internationalization).

    * The ability to use octal and hexadecimal constants in 'awk' program
      source code (see Nondecimal-numbers).

    * The '|&' operator for two-way I/O to a coprocess (see Two-way
      I/O).

    * The '/inet' special files for TCP/IP networking using '|&' (see
      TCP/IP Networking).

    * The optional second argument to 'close()' that allows closing one
      end of a two-way pipe to a coprocess (see Two-way I/O).

    * The optional third argument to the 'match()' function for capturing
      text-matching subexpressions within a regexp (see String
      Functions).

    * Positional specifiers in 'printf' formats for making translations
      easier (see Printf Ordering).
 
    * A number of new built-in functions:
 
         - The 'asort()' and 'asorti()' functions for sorting arrays
           (see Array Sorting).

         - The 'bindtextdomain()', 'dcgettext()' and 'dcngettext()'
           functions for internationalization (see Programmer i18n).

         - The 'extension()' function and the ability to add new built-in
           functions dynamically (see Dynamic Extensions).

         - The 'mktime()' function for creating timestamps (see Time
           Functions).

         - The 'and()', 'or()', 'xor()', 'compl()', 'lshift()',
           'rshift()', and 'strtonum()' functions (see Bitwise
           Functions).

    * The support for 'next file' as two words was removed completely
      (see Nextfile Statement).

    * Additional command-line options (see Options):
 
         - The '--dump-variables' option to print a list of all global
           variables.
 
         - The '--exec' option, for use in CGI scripts.
 
         - The '--gen-po' command-line option and the use of a leading
           underscore to mark strings that should be translated (see
           String Extraction).

         - The '--non-decimal-data' option to allow non-decimal input
           data (see Nondecimal Data).

         - The '--profile' option and 'pgawk', the profiling version of
           'gawk', for producing execution profiles of 'awk' programs
           (see Profiling).

         - The '--use-lc-numeric' option to force 'gawk' to use the
           locale's decimal point for parsing input data (see
           Conversion).

    * The use of GNU Automake to help in standardizing the configuration
      process (see Quick Installation).

    * The use of GNU 'gettext' for 'gawk''s own message output (see
      Gawk I18N).
 
    * BeOS support.  This was later removed.
 
    * Tandem support.  This was later removed.
 
    * The Atari port became officially unsupported and was later removed
      entirely.
 
    * The source code changed to use ISO C standard-style function
      definitions.
 
    * POSIX compliance for 'sub()' and 'gsub()' (see Gory Details).

    * The 'length()' function was extended to accept an array argument
      and return the number of elements in the array (see String
      Functions).

    * The 'strftime()' function acquired a third argument to enable
      printing times as UTC (see Time Functions).
 
    Version 4.0 of 'gawk' introduced the following features:
 
    * Variable additions:
 
         - 'FPAT', which allows you to specify a regexp that matches the
           fields, instead of matching the field separator (see
           Splitting By Content).

         - If 'PROCINFO["sorted_in"]' exists, 'for(iggy in foo)' loops
           sort the indices before looping over them.  The value of this
           element provides control over how the indices are sorted
           before the loop traversal starts (see Controlling
           Scanning).

         - 'PROCINFO["strftime"]', which holds the default format for
           'strftime()' (see Time Functions).
 
    * The special files '/dev/pid', '/dev/ppid', '/dev/pgrpid' and
      '/dev/user' were removed.
 
    * Support for IPv6 was added via the '/inet6' special file.  '/inet4'
      forces IPv4 and '/inet' chooses the system default, which is
      probably IPv4 (see TCP/IP Networking).

    * The use of '\s' and '\S' escape sequences in regular expressions
      (see GNU Regexp Operators).

    * Interval expressions became part of default regular expressions
      (see Regexp Operators).

    * POSIX character classes work even with '--traditional' (see
      Regexp Operators).
 
    * 'break' and 'continue' became invalid outside a loop, even with
      '--traditional' (see Break Statement, and also see Continue
      Statement).
 
    * 'fflush()', 'nextfile', and 'delete ARRAY' are allowed even with
      '--posix' or '--traditional', since they are all now part of POSIX.
 
    * An optional third argument to 'asort()' and 'asorti()', specifying
      how to sort (see String Functions).

    * The behavior of 'fflush()' changed to match BWK 'awk' and POSIX;
      now both 'fflush()' and 'fflush("")' flush all open output
      redirections (see I/O Functions).

    * The 'isarray()' function, which determines whether an item is an
      array, to make it possible to traverse arrays of arrays (see
      Type Functions).

    * The 'patsplit()' function, which gives the same capability as
      'FPAT', for splitting (see String Functions).

    * An optional fourth argument to the 'split()' function, which is an
      array to hold the values of the separators (see String
      Functions).

    * Arrays of arrays (see Arrays of Arrays).

    * The 'BEGINFILE' and 'ENDFILE' special patterns (see
      BEGINFILE/ENDFILE).

    * Indirect function calls (see Indirect Calls).

    * 'switch' / 'case' are enabled by default (see Switch
      Statement).

    * Command-line option changes (see Options):
 
         - The '-b' and '--characters-as-bytes' options which prevent
           'gawk' from treating input as a multibyte string.
 
         - The redundant '--compat', '--copyleft', and '--usage' long
           options were removed.
 
         - The '--gen-po' option was finally renamed to the correct
           '--gen-pot'.
 
         - The '--sandbox' option which disables certain features.
 
         - All long options acquired corresponding short options, for use
           in '#!' scripts.
 
    * Directories named on the command line now produce a warning, not a
      fatal error, unless '--posix' or '--traditional' is used (see
      Command-line directories).

    * The 'gawk' internals were rewritten, bringing the 'dgawk' debugger
      and possibly improved performance (see Debugger).

    * Per the GNU Coding Standards, dynamic extensions must now define a
      global symbol indicating that they are GPL-compatible (see Plugin
      License).

    * In POSIX mode, string comparisons use 'strcoll()' / 'wcscoll()'
      (see POSIX String Comparison).

    * The option for raw sockets was removed, since it was never
      implemented (see TCP/IP Networking).

    * Ranges of the form '[d-h]' are treated as if they were in the C
      locale, no matter what kind of regexp is being used, and even with
      '--posix' (see Ranges and Locales).
 
    * Support was removed for the following systems:
 
         - Atari
 
         - Amiga
 
         - BeOS
 
         - Cray
 
         - MIPS RiscOS
 
         - MS-DOS with the Microsoft Compiler
 
         - MS-Windows with the Microsoft Compiler
 
         - NeXT
 
         - SunOS 3.x, Sun 386 (Road Runner)
 
         - Tandem (non-POSIX)
 
         - Prestandard VAX C compiler for VAX/VMS
 
    Version 4.1 of 'gawk' introduced the following features:
 
    * Three new arrays: 'SYMTAB', 'FUNCTAB', and
      'PROCINFO["identifiers"]' (see Auto-set).

    * The three executables 'gawk', 'pgawk', and 'dgawk' were merged
      into one, named just 'gawk'.  As a result, the command-line options
      changed.

    * Command-line option changes (see Options):
 
         - The '-D' option invokes the debugger.
 
         - The '-i' and '--include' options load 'awk' library files.
 
         - The '-l' and '--load' options load compiled dynamic
           extensions.
 
         - The '-M' and '--bignum' options enable MPFR.
 
         - The '-o' option only does pretty-printing.
 
         - The '-p' option is used for profiling.
 
         - The '-R' option was removed.
 
    * Support for high precision arithmetic with MPFR (see Arbitrary
      Precision Arithmetic).

    * The 'and()', 'or()' and 'xor()' functions changed to allow any
      number of arguments, with a minimum of two (see Bitwise
      Functions).

    * The dynamic extension interface was completely redone (see
      Dynamic Extensions).

    * Redirected 'getline' became allowed inside 'BEGINFILE' and
      'ENDFILE' (see BEGINFILE/ENDFILE).

    * The 'where' command was added to the debugger (see Execution
      Stack).
 
    * Support for Ultrix was removed.
 
    Version 4.2 of 'gawk' introduced the following changes:
 
    * Changes to 'ENVIRON' are reflected into 'gawk''s environment and
      that of programs that it runs.  See Auto-set.

    * 'FIELDWIDTHS' was enhanced to allow skipping characters before
      assigning a value to a field (see Splitting By Content).

    * The 'PROCINFO["argv"]' array.  See Auto-set.

    * The maximum number of hexadecimal digits in '\x' escapes is now
      two.  See Escape Sequences.

    * Strongly typed regexp constants of the form '@/.../' (see Strong
      Regexp Constants).

    * The bitwise functions changed, making negative arguments into a
      fatal error (see Bitwise Functions).

    * The 'mktime()' function now accepts an optional second argument
      (see Time Functions).

    * The 'typeof()' function (see Type Functions).
 
    * Optimizations are enabled by default.  Use '-s' / '--no-optimize'
      to disable optimizations.
 
    * For many years, POSIX specified that default field splitting only
      allowed spaces and tabs to separate fields, and this was how 'gawk'
      behaved with '--posix'.  As of 2013, the standard restored
      historical behavior, and now default field splitting with '--posix'
      also allows newlines to separate fields.
 
    * Nonfatal output with 'print' and 'printf'.  See Nonfatal.

    * Retryable I/O via 'PROCINFO[INPUT-FILE, "RETRY"]' (see Retrying
      Input).

    * Changes to the pretty-printer (see Profiling):

         - The '--pretty-print' option no longer also runs the 'awk'
           program.
 
         - Comments in the source program are preserved and placed into
           the output file.
 
         - Explicit parentheses for expressions in the input are
           preserved in the generated output.
 
    * Improvements to the extension API (see Dynamic Extensions):
 
         - The 'get_file()' function to access open redirections.
 
         - The 'nonfatal()' function for generating nonfatal error
           messages.
 
         - Support for GMP and MPFR values.
 
         - Input parsers can now override the default field parsing
           mechanism by specifying explicit locations.
 
    * Shell startup files are supplied with the distribution and
      installed by 'make install' (see Shell Startup Files).

    * The 'igawk' program and its manual page are no longer installed
      when 'gawk' is built.  See Igawk Program.
 
    * Support for MirBSD was removed.
 
    * Support for GNU/Linux on Alpha was removed.