gawk: Extension Sample File Functions
16.7.1 File-Related Functions
-----------------------------
The 'filefuncs' extension provides three functions: 'chdir()',
'stat()', and 'fts()'. The usage is:
'@load "filefuncs"'
This is how you load the extension.
'result = chdir("/some/directory")'
The 'chdir()' function is a direct hook to the 'chdir()' system
call to change the current directory. It returns zero upon success
or a value less than zero upon error. In the latter case, it
updates 'ERRNO'.
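As a minimal sketch of the error convention just described, the following program wraps 'chdir()' in a small helper. The helper name 'try_chdir()' and the path '/no/such/dir' are hypothetical, chosen only for illustration; the sketch assumes a POSIX-like system where '/' exists.

```awk
@load "filefuncs"

# try_chdir() is a hypothetical helper, not part of the extension.
# It returns 1 on success and 0 on failure, reporting ERRNO on failure.
function try_chdir(dir)
{
    if (chdir(dir) < 0) {
        printf "chdir(%s) failed: %s\n", dir, ERRNO > "/dev/stderr"
        return 0
    }
    return 1
}

BEGIN {
    ok_root = try_chdir("/")                 # expected to succeed
    ok_missing = try_chdir("/no/such/dir")   # hypothetical path, expected to fail
}
```

Testing for a negative result, rather than for a specific value, follows the wording above: the function is only documented to return "a value less than zero" on error.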
'result = stat("/some/path", statdata [, follow])'
The 'stat()' function provides a hook into the 'stat()' system
call. It returns zero upon success or a value less than zero upon
error. In the latter case, it updates 'ERRNO'.
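By default, it uses the 'lstat()' system call, which reports on a
symbolic link itself rather than on the file it points to. However,
if passed a third argument, it uses 'stat()' instead, which follows
the link.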
In all cases, it clears the 'statdata' array. When the call is
successful, 'stat()' fills the 'statdata' array with information
retrieved from the filesystem, as follows:
Subscript    Field in 'struct stat'              File type
----------------------------------------------------------------
'"name"'     The file name                       All
'"dev"'      'st_dev'                            All
'"ino"'      'st_ino'                            All
'"mode"'     'st_mode'                           All
'"nlink"'    'st_nlink'                          All
'"uid"'      'st_uid'                            All
'"gid"'      'st_gid'                            All
'"size"'     'st_size'                           All
'"atime"'    'st_atime'                          All
'"mtime"'    'st_mtime'                          All
'"ctime"'    'st_ctime'                          All
'"rdev"'     'st_rdev'                           Device files
'"major"'    'st_major'                          Device files
'"minor"'    'st_minor'                          Device files
'"blksize"'  'st_blksize'                        All
'"pmode"'    A human-readable version of the     All
             mode value, like that printed by
             'ls' (for example, '"-rwxr-xr-x"')
'"linkval"'  The value of the symbolic link      Symbolic links
'"type"'     The type of the file as a           All
             string--one of '"file"',
             '"blockdev"', '"chardev"',
             '"directory"', '"socket"',
             '"fifo"', '"symlink"', '"door"',
             or '"unknown"' (not all systems
             support all file types)
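The following sketch shows one way 'stat()' might be called and a few of the subscripts from the table read back. The variable names 'st' and 'st_followed' are illustrative, and the path '/' is used only because it exists on POSIX-like systems; any other path would do.

```awk
@load "filefuncs"

BEGIN {
    path = "/"    # illustrative path, assumed to exist

    # Two arguments: lstat() semantics, so a symbolic link describes itself.
    if (stat(path, st) < 0) {
        printf "stat(%s) failed: %s\n", path, ERRNO > "/dev/stderr"
        exit 1
    }
    printf "%s: type=%s pmode=%s size=%d\n",
        st["name"], st["type"], st["pmode"], st["size"]

    # Three arguments: stat() semantics, so a symbolic link is followed.
    if (stat(path, st_followed, 1) == 0)
        printf "followed: type=%s\n", st_followed["type"]

    # "linkval" is present only when the entry is a symbolic link.
    if ("linkval" in st)
        printf "%s -> %s\n", path, st["linkval"]
}
```

Because some subscripts exist only for certain file types, testing with 'in' before reading them (as done for '"linkval"' here) avoids creating empty elements by accident.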
'flags = or(FTS_PHYSICAL, ...)'
'result = fts(pathlist, flags, filedata)'
Walk the file trees provided in 'pathlist' and fill in the
'filedata' array, as described next. 'flags' is the bitwise OR of
several predefined values, also described in a moment. Return zero
if there were no errors, otherwise return -1.
The 'fts()' function provides a hook to the C library 'fts()'
routines for traversing file hierarchies. Instead of returning data
about one file at a time in a stream, it fills in a multidimensional
array with data about each file and directory encountered in the
requested hierarchies.
The arguments are as follows:
'pathlist'
An array of file names. The element values are used; the index
values are ignored.
'flags'
This should be the bitwise OR of one or more of the following
predefined constant flag values. At least one of 'FTS_LOGICAL' or
'FTS_PHYSICAL' must be provided; otherwise 'fts()' returns an error
value and sets 'ERRNO'. The flags are:
'FTS_LOGICAL'
Do a "logical" file traversal, where the information returned
for a symbolic link refers to the linked-to file, and not to
the symbolic link itself. This flag is mutually exclusive
with 'FTS_PHYSICAL'.
'FTS_PHYSICAL'
Do a "physical" file traversal, where the information returned
for a symbolic link refers to the symbolic link itself. This
flag is mutually exclusive with 'FTS_LOGICAL'.
'FTS_NOCHDIR'
As a performance optimization, the C library 'fts()' routines
change directory as they traverse a file hierarchy. This flag
disables that optimization.
'FTS_COMFOLLOW'
Immediately follow a symbolic link named in 'pathlist',
whether or not 'FTS_LOGICAL' is set.
'FTS_SEEDOT'
By default, the C library 'fts()' routines do not return
entries for '.' (dot) and '..' (dot-dot). This option causes
entries for dot-dot to also be included. (The extension
always includes an entry for dot; more on this in a moment.)
'FTS_XDEV'
During a traversal, do not cross onto a different mounted
filesystem.
'filedata'
The 'filedata' array holds the results. 'fts()' first clears it.
Then it creates an element in 'filedata' for every element in
'pathlist'. The index is the name of the directory or file given
in 'pathlist'. The element for this index is itself an array.
There are two cases:
_The path is a file_
In this case, the array contains two or three elements:
'"path"'
The full path to this file, starting from the "root" that
was given in the 'pathlist' array.
'"stat"'
This element is itself an array, containing the same
information as provided by the 'stat()' function
described earlier for its 'statdata' argument. The
element may not be present if the 'stat()' system call
for the file failed.
'"error"'
If some kind of error was encountered, the array will
also contain an element named '"error"', which is a
string describing the error.
_The path is a directory_
In this case, the array contains one element for each entry in
the directory. If an entry is a file, that element is the
same as for files, just described. If the entry is a
directory, that element is (recursively) an array describing
the subdirectory. If 'FTS_SEEDOT' was provided in the flags,
then there is also an element named '".."'. This element
is an array containing the data as provided by 'stat()'.
Regardless of the flags, there is always an element whose index
is '"."'. This element is an array containing the same two or
three elements as for a file: '"path"', '"stat"', and '"error"'.
The 'fts()' function returns zero if there were no errors.
Otherwise, it returns -1.
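The layout of 'filedata' described above can be walked with a short recursive function. The sketch below is one possible approach, not the method used by the 'gawk' test suite: the helper name 'show()' and the variables 'roots' and 'tree' are hypothetical, and the sketch tells directories from files by the presence of the '"."' element. It assumes the array may still hold partial results when 'fts()' reports an error.

```awk
@load "filefuncs"

# show() is a hypothetical helper, not part of the extension. It prints the
# entry called name, recursing into directories, and returns the number of
# file-style entries it printed.
function show(name, node, depth,    child, count)
{
    if (! ("." in node)) {
        # File-style entry: "path", plus "stat" and/or "error".
        if ("error" in node)
            printf "%*s%s: %s\n", depth * 2, "", name, node["error"]
        else if ("stat" in node)
            printf "%*s%s (%d bytes)\n", depth * 2, "", name, node["stat"]["size"]
        else
            printf "%*s%s\n", depth * 2, "", name
        return 1
    }

    # Directory entry: the "." element describes the directory itself.
    printf "%*s%s/\n", depth * 2, "", name
    count = 0
    for (child in node)
        if (child != "." && child != "..")
            count += show(child, node[child], depth + 1)
    return count
}

BEGIN {
    roots[1] = "."                        # element values are used, indices ignored
    flags = or(FTS_PHYSICAL, FTS_XDEV)    # exactly one of FTS_PHYSICAL / FTS_LOGICAL
    if (fts(roots, flags, tree) < 0)
        printf "fts() reported an error: %s\n", ERRNO > "/dev/stderr"

    total = 0
    for (root in tree)                    # may be empty or partial after an error
        total += show(root, tree[root], 0)
    printf "%d file entries\n", total
}
```

Checking for '"."' rather than for '"path"' is a deliberate choice in this sketch: a directory may contain a file that is itself named 'path' or 'stat', whereas a file-style entry never has a '"."' element.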
NOTE: The 'fts()' extension does not exactly mimic the interface of
the C library 'fts()' routines, choosing instead to provide an
interface that is based on associative arrays, which is more
comfortable to use from an 'awk' program. This includes the lack
of a comparison function, because 'gawk' already provides powerful
array sorting facilities. Although an 'fts_read()'-like interface
could have been provided, this felt less natural than simply
creating a multidimensional array to represent the file hierarchy
and its information.
See 'test/fts.awk' in the 'gawk' distribution for an example use of
the 'fts()' extension function.
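As a smaller illustration of the note's point about sorting, the sketch below relies on 'gawk''s 'PROCINFO["sorted_in"]' facility in place of a comparison function, so that sibling entries are visited in name order. The helper name 'collect()' and the variables 'roots', 'tree', and 'paths' are hypothetical.

```awk
@load "filefuncs"

# collect() is a hypothetical helper, not part of the extension. It appends
# the "path" of every file-style entry below node to out, starting after
# index n, and returns the new number of elements in out.
function collect(node, out, n,    child)
{
    if (! ("." in node)) {
        out[++n] = node["path"]
        return n
    }
    for (child in node)
        if (child != "." && child != "..")
            n = collect(node[child], out, n)
    return n
}

BEGIN {
    # Visit array indices in ascending string order in every for-in loop.
    PROCINFO["sorted_in"] = "@ind_str_asc"

    roots[1] = "."
    if (fts(roots, FTS_PHYSICAL, tree) < 0)
        printf "fts() reported an error: %s\n", ERRNO > "/dev/stderr"

    split("", paths)    # make sure paths is an (empty) array
    n = 0
    for (root in tree)
        n = collect(tree[root], paths, n)
    for (i = 1; i <= n; i++)
        print paths[i]
}
```

Note that 'PROCINFO["sorted_in"]' affects every 'for (index in array)' loop in the program, so a larger program might save and restore its previous value around the traversal.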