C.5.2 Goals For A New Mechanism
-------------------------------
Some goals for the new API were:
* The API should be independent of 'gawk' internals. Changes in
'gawk' internals should not be visible to the writer of an
extension function.
* The API should provide _binary_ compatibility across 'gawk'
releases as long as the API itself does not change.
* The API should enable extensions written in C or C++ to have
roughly the same "appearance" to 'awk'-level code as 'awk'
functions do. This means that extensions should have:
- The ability to access function parameters.
- The ability to turn an undefined parameter into an array (call
by reference).
- The ability to create, access and update global variables.
- Easy access to all the elements of an array at once ("array
flattening") in order to loop over all the elements in an easy
fashion for C code.
- The ability to create arrays (including 'gawk''s true arrays
of arrays).
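The binary-compatibility goal above can be pictured as a version check
made when an extension is loaded. What follows is only a minimal
sketch under stated assumptions, not 'gawk''s actual code: the names
'host_api', 'EXT_API_MAJOR', 'EXT_API_MINOR' and 'ext_api_compatible'
are hypothetical, and the rule shown (major versions must match, and
the host's minor version must be at least the one the extension was
built against) is an assumption made for illustration.

```c
/* Hypothetical sketch: the names and the compatibility rule are
   assumptions made for illustration, not taken from gawk's headers. */

/* API version the extension was compiled against. */
#define EXT_API_MAJOR 1
#define EXT_API_MINOR 2

/* Version information the host hands to the extension at load time. */
struct host_api {
    int major_version;
    int minor_version;
};

/* Return 1 if the extension may run against this host, 0 otherwise. */
static int
ext_api_compatible(const struct host_api *api)
{
    if (api->major_version != EXT_API_MAJOR)
        return 0;   /* incompatible change in the API */
    if (api->minor_version < EXT_API_MINOR)
        return 0;   /* host is older than the extension expects */
    return 1;
}
```

With a check of this kind, an extension built once keeps loading in
later releases of the host for as long as the API version it was built
against remains acceptable.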
Some additional important goals were:
* The API should use only features in ISO C 90, so that extensions
can be written using the widest range of C and C++ compilers. The
header should include the appropriate '#ifdef __cplusplus' and
'extern "C"' magic so that a C++ compiler could be used. (If using
C++, the runtime system has to be smart enough to call any
constructors and destructors, as 'gawk' is a C program. As of this
writing, this has not been tested.)
* The API mechanism should not require access to 'gawk''s symbols(1)
by the compile-time or dynamic linker, in order to enable creation
of extensions that also work on MS-Windows.
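The '#ifdef __cplusplus' and 'extern "C"' arrangement mentioned above
usually looks like the following sketch. The header name
'example_ext.h' and the function 'example_ext_add' are hypothetical
and stand in for whatever a real header would declare. The header and
a matching definition are shown together only to keep the example
self-contained.

```c
/* example_ext.h: hypothetical header, shown only to illustrate the guard. */
#ifndef EXAMPLE_EXT_H
#define EXAMPLE_EXT_H

#ifdef __cplusplus
extern "C" {            /* seen only by a C++ compiler */
#endif

/* Declared with C linkage, so C and C++ callers agree on the symbol name. */
int example_ext_add(int a, int b);

#ifdef __cplusplus
}                       /* closes the extern "C" block */
#endif

#endif /* EXAMPLE_EXT_H */

/* example_ext.c: a matching definition, using only ISO C 90 features. */
int
example_ext_add(int a, int b)
{
    return a + b;
}
```

A C compiler never sees the 'extern "C"' lines, since '__cplusplus' is
defined only by C++ compilers. A C++ compiler sees them and suppresses
name mangling for the enclosed declarations.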
During development, it became clear that there were other features
that should be available to extensions, which were also subsequently
provided:
* Extensions should have the ability to hook into 'gawk''s I/O
redirection mechanism. In particular, the 'xgawk' developers
provided a so-called "open hook" to take over reading records.
During development, this was generalized to allow extensions to
hook into input processing, output processing, and two-way I/O.
* An extension should be able to provide a "call back" function to
perform cleanup actions when 'gawk' exits.
* An extension should be able to provide a version string so that
'gawk''s '--version' option can provide information about
extensions as well.
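The cleanup "call back" goal can be sketched from the host's point of
view as a small registry of functions to run at exit. This is an
illustration only, under stated assumptions: the names
'register_exit_callback', 'run_exit_callbacks' and 'note_exit_status',
the fixed-size table, and the choice to run the most recently
registered function first are all hypothetical and are not taken from
'gawk'.

```c
#include <stddef.h>

/* Hypothetical host-side registry; none of these names come from gawk. */
#define MAX_EXIT_CALLBACKS 16

typedef void (*exit_callback_t)(void *data, int exit_status);

struct exit_entry {
    exit_callback_t func;   /* function supplied by an extension */
    void *data;             /* opaque pointer passed back to it */
};

static struct exit_entry exit_table[MAX_EXIT_CALLBACKS];
static size_t exit_count = 0;

/* Called by an extension; returns 1 on success, 0 on failure. */
static int
register_exit_callback(exit_callback_t func, void *data)
{
    if (func == NULL || exit_count >= MAX_EXIT_CALLBACKS)
        return 0;
    exit_table[exit_count].func = func;
    exit_table[exit_count].data = data;
    exit_count++;
    return 1;
}

/* Called once by the host just before it exits; most recent first. */
static void
run_exit_callbacks(int exit_status)
{
    while (exit_count > 0) {
        exit_count--;
        exit_table[exit_count].func(exit_table[exit_count].data,
                                    exit_status);
    }
}

/* Example extension callback: records the exit status it was given. */
static void
note_exit_status(void *data, int exit_status)
{
    *(int *) data = exit_status;
}
```

The opaque 'data' pointer lets an extension carry its own state into
the callback without the host knowing anything about that state.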
The requirement to avoid access to 'gawk''s symbols is, at first
glance, a difficult one to meet.
One design, apparently used by Perl and Ruby and maybe others, would
be to make the mainline 'gawk' code into a library, with the 'gawk'
utility a small C 'main()' function linked against the library.
This seemed like the tail wagging the dog, complicating build and
installation and making a simple copy of the 'gawk' executable from one
system to another (or one place to another on the same system!) into a
chancy operation.
Pat Rankin suggested the solution that was adopted. See Extension
Mechanism Outline, for the details.
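As background, one general way to let dynamically loaded code call
into a host program without the linker resolving any of the host's
symbols is for the host to hand the loaded code a table of function
pointers. The sketch below illustrates that general technique only.
Every name in it ('host_services', 'ext_load', 'ext_bump_counter', and
so on) is hypothetical, and it is not presented as 'gawk''s actual
interface. Host and extension are shown in one file to keep the
example self-contained; in practice they would be separate binaries.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not gawk's real API. */

/* Services the host offers, exposed as data rather than linker symbols. */
struct host_services {
    int  (*get_counter)(void);
    void (*set_counter)(int value);
};

/* --- Host side: these functions stay private to the host. --- */
static int host_counter = 0;

static int
host_get_counter(void)
{
    return host_counter;
}

static void
host_set_counter(int value)
{
    host_counter = value;
}

static const struct host_services host_table = {
    host_get_counter,
    host_set_counter
};

/* --- Extension side: it sees only the table it is handed. --- */
static const struct host_services *services = NULL;

/* Entry point the host calls after loading; returns 1 on success. */
static int
ext_load(const struct host_services *table)
{
    if (table == NULL)
        return 0;
    services = table;
    return 1;
}

/* Reaches host state purely through the function pointers. */
static void
ext_bump_counter(void)
{
    services->set_counter(services->get_counter() + 1);
}
```

The only symbol that has to be found by name here is the extension's
own entry point, which the host looks up after loading. Nothing in the
extension refers to a host symbol directly.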
---------- Footnotes ----------
(1) The "symbols" are the variables and functions defined inside
'gawk'. Access to these symbols by code external to 'gawk' loaded
dynamically at runtime is problematic on MS-Windows.