gawk: Extension Sample File Functions

 
 16.7.1 File-Related Functions
 -----------------------------
 
 The 'filefuncs' extension provides three functions: 'chdir()',
 'stat()', and 'fts()'.  The usage is:
 
 '@load "filefuncs"'
      This is how you load the extension.
 
 'result = chdir("/some/directory")'
      The 'chdir()' function is a direct hook to the 'chdir()' system
      call to change the current directory.  It returns zero upon success
      or a value less than zero upon error.  In the latter case, it
      updates 'ERRNO'.
 
 'result = stat("/some/path", statdata' [', follow']')'
      The 'stat()' function provides a hook into the 'stat()' system
      call.  It returns zero upon success or a value less than zero upon
      error.  In the latter case, it updates 'ERRNO'.
 
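      By default, it uses the 'lstat()' system call, so a symbolic link
      is described as the link itself.  However, if passed a third
      argument, it uses 'stat()' instead, which follows the link and
      describes the file it points to.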
 
      In all cases, it clears the 'statdata' array.  When the call is
      successful, 'stat()' fills the 'statdata' array with information
      retrieved from the filesystem, as follows:
 
      Subscript   Field in 'struct stat'               File type
      ----------------------------------------------------------------
      '"name"'    The file name                        All
      '"dev"'     'st_dev'                             All
      '"ino"'     'st_ino'                             All
      '"mode"'    'st_mode'                            All
      '"nlink"'   'st_nlink'                           All
      '"uid"'     'st_uid'                             All
      '"gid"'     'st_gid'                             All
      '"size"'    'st_size'                            All
      '"atime"'   'st_atime'                           All
      '"mtime"'   'st_mtime'                           All
      '"ctime"'   'st_ctime'                           All
      '"rdev"'    'st_rdev'                            Device files
      '"major"'   Major number from 'st_rdev'          Device files
      '"minor"'   Minor number from 'st_rdev'          Device files
      '"blksize"' 'st_blksize'                         All
      '"pmode"'   A human-readable version of the      All
                  mode value, like that printed by
                  'ls' (for example, '"-rwxr-xr-x"')
      '"linkval"' The value of the symbolic link       Symbolic
                                                       links
      '"type"'    The type of the file as a            All
                  string--one of '"file"',
                  '"blockdev"', '"chardev"',
                  '"directory"', '"socket"',
                  '"fifo"', '"symlink"', '"door"',
                  or '"unknown"' (not all systems
                  support all file types)
 
 'flags = or(FTS_PHYSICAL, ...)'
 'result = fts(pathlist, flags, filedata)'
      Walk the file trees provided in 'pathlist' and fill in the
      'filedata' array.  'flags' is the bitwise OR of several predefined
      values.  The arguments are described shortly.  Return zero if
      there were no errors; otherwise, return -1.
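
    The 'chdir()' and 'stat()' calls described above can be sketched as
 follows.  This is only an illustration, not a definitive usage pattern:
 the variable names 'ret', 'bad', and 'st' are arbitrary, and the second
 path is simply assumed not to exist on the system running the program.

```awk
@load "filefuncs"

BEGIN {
    # chdir() returns a negative value and sets ERRNO on failure.
    if (chdir("/") < 0) {
        printf("chdir: %s\n", ERRNO) > "/dev/stderr"
        exit 1
    }

    # Two arguments select lstat(): a symbolic link would be
    # described as the link itself, not as its target.
    ret = stat(".", st)
    if (ret < 0) {
        printf("stat: %s\n", ERRNO) > "/dev/stderr"
        exit 1
    }
    printf("%s: type %s, mode %s, size %d\n",
           st["name"], st["type"], st["pmode"], st["size"])
    if ("linkval" in st)
        printf("  link value: %s\n", st["linkval"])

    # On failure the array is still cleared, so no stale data remains.
    # The path below is an assumption: it is expected to be missing.
    bad = stat("/no/such/path/assumed/missing", st)
    printf("missing path: result %d, %d elements, ERRNO %s\n",
           bad, length(st), ERRNO)
}
```

    Testing for '"linkval"' with 'in' avoids creating the element by
 accident, since that subscript is present only for symbolic links.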
 
    The 'fts()' function provides a hook to the C library 'fts()'
 routines for traversing file hierarchies.  Instead of returning data
 about one file at a time in a stream, it fills in a multidimensional
 array with data about each file and directory encountered in the
 requested hierarchies.
 
    The arguments are as follows:
 
 'pathlist'
      An array of file names.  The element values are used; the index
      values are ignored.
 
 'flags'
      This should be the bitwise OR of one or more of the following
      predefined constant flag values.  At least one of 'FTS_LOGICAL' or
      'FTS_PHYSICAL' must be provided; otherwise 'fts()' returns an error
      value and sets 'ERRNO'.  The flags are:
 
      'FTS_LOGICAL'
           Do a "logical" file traversal, where the information returned
           for a symbolic link refers to the linked-to file, and not to
           the symbolic link itself.  This flag is mutually exclusive
           with 'FTS_PHYSICAL'.
 
      'FTS_PHYSICAL'
           Do a "physical" file traversal, where the information returned
           for a symbolic link refers to the symbolic link itself.  This
           flag is mutually exclusive with 'FTS_LOGICAL'.
 
      'FTS_NOCHDIR'
           As a performance optimization, the C library 'fts()' routines
           change directory as they traverse a file hierarchy.  This flag
           disables that optimization.
 
      'FTS_COMFOLLOW'
           Immediately follow a symbolic link named in 'pathlist',
           whether or not 'FTS_LOGICAL' is set.
 
      'FTS_SEEDOT'
           By default, the C library 'fts()' routines do not return
           entries for '.' (dot) and '..' (dot-dot).  This option causes
           entries for dot-dot to also be included.  (The extension
           always includes an entry for dot; more on this in a moment.)
 
      'FTS_XDEV'
           During a traversal, do not cross onto a different mounted
           filesystem.
 
 'filedata'
      The 'filedata' array holds the results.  'fts()' first clears it.
      Then it creates an element in 'filedata' for every element in
      'pathlist'.  The index is the name of the directory or file given
      in 'pathlist'.  The element for this index is itself an array.
      There are two cases:
 
      _The path is a file_
           In this case, the array contains two or three elements:
 
           '"path"'
                The full path to this file, starting from the "root" that
                was given in the 'pathlist' array.
 
           '"stat"'
                This element is itself an array, containing the same
                information as provided by the 'stat()' function
                described earlier for its 'statdata' argument.  The
                element may not be present if the 'stat()' system call
                for the file failed.
 
           '"error"'
                If some kind of error was encountered, the array will
                also contain an element named '"error"', which is a
                string describing the error.
 
      _The path is a directory_
           In this case, the array contains one element for each entry in
           the directory.  If an entry is a file, that element is the
           same as for files, just described.  If the entry is a
           directory, that element is (recursively) an array describing
           the subdirectory.  If 'FTS_SEEDOT' was provided in the flags,
           then there will also be an element named '".."'.  This element
           will be an array containing the data as provided by 'stat()'.
 
           In addition, there will be an element whose index is '"."'.
           This element is an array containing the same two or three
           elements as for a file: '"path"', '"stat"', and '"error"'.
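
    The layout of 'filedata' described above suggests a recursive walk.
 The following is a minimal sketch under stated assumptions, not the
 extension's own code: 'walk()', 'report()', and the counter 'nentries'
 are hypothetical names, and a directory is recognized here by the
 presence of its '"."' element.

```awk
@load "filefuncs"

# walk(node, depth): print one line for NODE and everything below it.
# A node holding a "." element is treated as a directory; any other
# node is treated as a file holding "path", "stat", and maybe "error".
function walk(node, depth,    name)
{
    if (! ("." in node)) {
        report(node, depth)
        return
    }
    report(node["."], depth)
    for (name in node) {
        # ".." (present only with FTS_SEEDOT) holds plain stat data.
        if (name == "." || name == "..")
            continue
        if (isarray(node[name]))
            walk(node[name], depth + 1)
    }
}

# report(entry, depth): print the path, indented two spaces per level.
function report(entry, depth)
{
    nentries++
    if ("error" in entry)
        printf("%*s%s: %s\n", depth * 2, "", entry["path"], entry["error"])
    else if ("stat" in entry)
        printf("%*s%s (%s)\n", depth * 2, "", entry["path"],
               entry["stat"]["type"])
}

BEGIN {
    pathlist[1] = "."            # only the element values matter
    flags = or(FTS_PHYSICAL, FTS_XDEV)
    if (fts(pathlist, flags, filedata) < 0)
        print "fts() reported an error" > "/dev/stderr"

    for (root in filedata)
        walk(filedata[root], 0)
}
```

    Checking for '"."' rather than for '"path"' is a deliberate choice in
 this sketch: a directory may contain a file that is itself named 'path'
 or 'stat', but no directory entry other than the directory itself is
 named '.'.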
 
    The 'fts()' function returns zero if there were no errors.
 Otherwise, it returns -1.
 
      NOTE: The 'fts()' extension does not exactly mimic the interface of
      the C library 'fts()' routines, choosing instead to provide an
      interface that is based on associative arrays, which is more
      comfortable to use from an 'awk' program.  This includes the lack
      of a comparison function, because 'gawk' already provides powerful
      array sorting facilities.  Although an 'fts_read()'-like interface
      could have been provided, this felt less natural than simply
      creating a multidimensional array to represent the file hierarchy
      and its information.
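
    To illustrate the point about sorting, the following sketch lists the
 entries of the current directory in ascending string order by setting
 'PROCINFO["sorted_in"]' before the loop.  The names 'sorted' and 'n' are
 illustrative only, and the choice of '"@ind_str_asc"' is just one of the
 orderings that 'gawk' offers.

```awk
@load "filefuncs"

BEGIN {
    pathlist[1] = "."
    if (fts(pathlist, FTS_PHYSICAL, filedata) < 0)
        print "fts() reported an error" > "/dev/stderr"

    # Ask gawk to visit array indices in ascending string order.
    PROCINFO["sorted_in"] = "@ind_str_asc"

    for (root in filedata) {
        for (name in filedata[root]) {
            if (name == "." || name == "..")
                continue
            sorted[++n] = name      # remember the order for later use
            print name
        }
    }
}
```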
 
    See 'test/fts.awk' in the 'gawk' distribution for an example use of
 the 'fts()' extension function.