 
 9.2.1 Function Definition Syntax
 --------------------------------
 
      It's entirely fair to say that the awk syntax for local variable
      definitions is appallingly awful.
                          -- _Brian Kernighan_
 
    Definitions of functions can appear anywhere between the rules of an
 'awk' program.  Thus, the general form of an 'awk' program is extended
 to include sequences of rules _and_ user-defined function definitions.
 There is no need to put the definition of a function before all uses of
 the function.  This is because 'awk' reads the entire program before
 starting to execute any of it.
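Here is a minimal illustration. It assumes a POSIX shell with a POSIX 'awk' on the PATH, and 'double()' is a hypothetical function name. The 'BEGIN' rule calls 'double()' even though the definition appears later in the program text, and the command should print '42'.

```shell
# The BEGIN rule uses double() before its definition appears;
# awk reads the whole program first, so this is fine.
out=$(awk '
BEGIN { print double(21) }

function double(n)
{
    return n * 2
}
')
printf '%s\n' "$out"
```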
 
    The definition of a function named NAME looks like this:
 
      'function' NAME'('[PARAMETER-LIST]')'
      '{'
           BODY-OF-FUNCTION
      '}'
 
 Here, NAME is the name of the function to define.  A valid function name
 is like a valid variable name: a sequence of letters, digits, and
 underscores that doesn't start with a digit.  Here too, only the 52
 upper- and lowercase English letters may be used in a function name.
 Within a single 'awk' program, a particular name can be used as a
 variable, an array, or a function, but not as more than one of these.
 
    PARAMETER-LIST is an optional list of the function's arguments and
 local variable names, separated by commas.  When the function is called,
 the argument names are used to hold the argument values given in the
 call.
 
    A function cannot have two parameters with the same name, nor may it
 have a parameter with the same name as the function itself.
 
      CAUTION: According to the POSIX standard, function parameters
      cannot have the same name as one of the special predefined
      variables (See Built-in Variables), nor may a function
      parameter have the same name as another function.
 
      Not all versions of 'awk' enforce these restrictions.  'gawk'
      always enforces the first restriction.  With '--posix' (See
      Options), it also enforces the second restriction.
 
    Local variables act like the empty string if referenced where a
 string value is required, and like zero if referenced where a numeric
 value is required.  This is the same as the behavior of regular
 variables that have never been assigned a value.  (There is more to
 understand about local variables; see Dynamic Typing.)
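The following sketch shows that behavior. It assumes a POSIX shell and a POSIX 'awk', and 'show()' is a hypothetical name. The function's only parameter, 'loc', is never passed in, so it is a local variable that is never assigned. Used as a string it contributes nothing, and used as a number it is zero, so the command should print '[] 0'.

```shell
out=$(awk '
# loc is never passed in, so it is a local variable that starts
# out uninitialized: "" in string context, 0 in numeric context
function show(    loc)
{
    return "[" loc "] " (loc + 0)
}
BEGIN { print show() }
')
printf '%s\n' "$out"
```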
 
    The BODY-OF-FUNCTION consists of 'awk' statements.  It is the most
 important part of the definition, because it says what the function
 should actually _do_.  The argument names exist to give the body a way
 to talk about the arguments; local variables exist to give the body
 places to keep temporary values.
 
    Argument names are not distinguished syntactically from local
 variable names.  Instead, the number of arguments supplied when the
 function is called determines how many argument variables there are.
 Thus, if three argument values are given, the first three names in
 PARAMETER-LIST are arguments and the rest are local variables.
 
    It follows that if the number of arguments is not the same in all
 calls to the function, some of the names in PARAMETER-LIST may be
 arguments on some occasions and local variables on others.  Another way
 to think of this is that omitted arguments default to the null string.
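A common use of this rule is an optional argument. The sketch below assumes a POSIX shell and 'awk', and 'greet()' is a hypothetical name. In the first call 'greeting' is an argument. In the second call it is a local variable holding the null string, so the function substitutes a default. The output should be 'Hi, world' followed by 'Hello, world'.

```shell
out=$(awk '
function greet(name, greeting)
{
    # When called with one argument, greeting is a local variable
    # whose value is the null string, so fall back to a default.
    if (greeting == "")
        greeting = "Hello"
    return greeting ", " name
}
BEGIN {
    print greet("world", "Hi")   # two arguments: greeting is an argument
    print greet("world")         # one argument: greeting is a local
}
')
printf '%s\n' "$out"
```

This check cannot tell an omitted argument from an explicitly passed empty string.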
 
    Usually when you write a function, you know how many names you intend
 to use for arguments and how many you intend to use as local variables.
 It is conventional to place some extra space between the arguments and
 the local variables, in order to document how your function is supposed
 to be used.
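Consider the hypothetical 'sum_array()' below. The extra space after 'n' signals that callers are expected to pass two arguments, and that 'i' and 'total' are local variables. This sketch assumes a POSIX shell and 'awk', and it should print '12'.

```shell
out=$(awk '
# Two arguments (arr, n), then extra space, then two locals (i, total).
function sum_array(arr, n,    i, total)
{
    for (i = 1; i <= n; i++)
        total += arr[i]    # total starts at 0 as an uninitialized local
    return total
}
BEGIN {
    n = split("3 4 5", nums, " ")
    print sum_array(nums, n)
}
')
printf '%s\n' "$out"
```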
 
    During execution of the function body, the arguments and local
 variable values hide, or "shadow", any variables of the same names used
 in the rest of the program.  The shadowed variables are not accessible
 in the function body, because there is no way to name them while
 their names have been taken away for the arguments and local variables.
 All other variables used in the 'awk' program can be referenced or set
 normally in the function's body.
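The following sketch shows both halves of this rule. It assumes a POSIX shell and 'awk', and 'bump()' and 'calls' are hypothetical names. Inside 'bump()', the name 'x' refers to the parameter, so the global 'x' is left untouched. 'calls' is not a parameter, so it refers to the global variable. The command should print '105 1 1'.

```shell
out=$(awk '
function bump(x)
{
    calls++          # not a parameter: this is the global "calls"
    x = x + 100      # the parameter x; the global x is shadowed
    return x
}
BEGIN {
    x = 1
    r = bump(5)
    print r, x, calls
}
')
printf '%s\n' "$out"
```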
 
    The arguments and local variables last only as long as the function
 body is executing.  Once the body finishes, you can once again access
 the variables that were shadowed while the function was running.
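One consequence is that a local variable does not keep its value from one call to the next. This sketch assumes a POSIX shell and 'awk', and 'counter()' is a hypothetical name. The local 'n' starts out empty on every call. The global 'n' is shadowed while 'counter()' runs and is visible again afterward. The command should print '1 1 7'.

```shell
out=$(awk '
function counter(    n)
{
    n++              # local n is uninitialized on each call, so it becomes 1
    return n
}
BEGIN {
    n = 7            # global n, shadowed only while counter() runs
    a = counter()
    b = counter()
    print a, b, n
}
')
printf '%s\n' "$out"
```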
 
    The function body can contain expressions that call functions.  They
 can even call this function, either directly or by way of another
 function.  When this happens, we say the function is "recursive".  The
 act of a function calling itself is called "recursion".
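This sketch assumes a POSIX shell and 'awk', and all function names are hypothetical. 'factorial()' is directly recursive. 'is_even()' and 'is_odd()' are recursive by way of each other. The command should print '120 1 1'.

```shell
out=$(awk '
# Direct recursion: factorial() calls itself.
function factorial(n)
{
    if (n <= 1)
        return 1
    return n * factorial(n - 1)
}

# Indirect recursion: each function calls the other.
# Assumes n is a nonnegative integer.
function is_even(n) { return (n == 0) ? 1 : is_odd(n - 1) }
function is_odd(n)  { return (n == 0) ? 0 : is_even(n - 1) }

BEGIN { print factorial(5), is_even(10), is_odd(7) }
')
printf '%s\n' "$out"
```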
 
    All the built-in functions return a value to their caller.
 User-defined functions can do so also, using the 'return' statement
 (See Return Statement).  Many of the subsequent examples in this minor
 node use the 'return' statement.
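Here is a minimal sketch. It assumes a POSIX shell and 'awk', and 'max()' is a hypothetical name. A user-defined function that returns a value can be used in an expression just like a built-in one. Here it is combined with the built-in 'length()', and the command should print '4'.

```shell
out=$(awk '
function max(a, b)
{
    return (a > b) ? a : b   # hand the larger value back to the caller
}
BEGIN { print max(length("awk"), length("gawk")) }
')
printf '%s\n' "$out"
```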
 
    In many 'awk' implementations, including 'gawk', the keyword
 'function' may be abbreviated 'func'.  (c.e.)  However, POSIX only
 specifies the use of the keyword 'function'.  This actually has some
 practical implications.  If 'gawk' is in POSIX-compatibility mode (See
 Options), then the following statement does _not_ define a function:
 
      func foo() { a = sqrt($1) ; print a }
 
 Instead, it defines a rule that, for each record, concatenates the value
 of the variable 'func' with the return value of the function 'foo'.  If
 the resulting string is non-null, the action is executed.  This is
 probably not what is desired.  ('awk' accepts this input as
 syntactically valid, because functions may be used before they are
 defined in 'awk' programs.(1))
 
    To ensure that your 'awk' programs are portable, always use the
 keyword 'function' when defining a function.
 
    ---------- Footnotes ----------
 
    (1) This program won't actually run, because 'foo()' is undefined.