gawk: Variable Typing

 
 6.3.2.1 String Type versus Numeric Type
 .......................................
 
 Scalar objects in 'awk' (variables, array elements, and fields) are
 _dynamically_ typed.  This means their type can change as the program
 runs, from "untyped" before any use,(1) to string or number, and then
 from string to number or number to string, as the program progresses.
 ('gawk' also provides regexp-typed scalars, but let's ignore that for
 now; see Strong Regexp Constants.)
 
    You can't do much with untyped variables, other than tell that they
 are untyped.  The following program tests 'a' against '""' and '0'; the
 test succeeds when 'a' has never been assigned a value.  It also uses
 the built-in 'typeof()' function (not presented yet; see Type
 Functions) to show 'a''s type:
 
      $ gawk 'BEGIN { print (a == "" && a == 0 ?
      > "a is untyped" : "a has a type!") ; print typeof(a) }'
      -| a is untyped
      -| unassigned
 
    A scalar has numeric type when assigned a numeric value, such as from
 a numeric constant, or from another scalar with numeric type:
 
      $ gawk 'BEGIN { a = 42 ; print typeof(a)
      > b = a ; print typeof(b) }'
      -| number
      -| number
 
    Similarly, a scalar has string type when assigned a string value,
 such as from a string constant, or from another scalar with string type:
 
      $ gawk 'BEGIN { a = "forty two" ; print typeof(a)
      > b = a ; print typeof(b) }'
      -| string
      -| string
 
    So far, this is all simple and straightforward.  What happens,
 though, when 'awk' has to process data from a user?  Let's start with
 field data.  What should the following command produce as output?
 
      echo hello | awk '{ printf("%s %s < 42\n", $1,
                                 ($1 < 42 ? "is" : "is not")) }'
 
 Since 'hello' is alphabetic data, 'awk' can only do a string comparison.
 Internally, it converts '42' into '"42"' and compares the two string
 values '"hello"' and '"42"'.  Here's the result:
 
      $ echo hello | awk '{ printf("%s %s < 42\n", $1,
      >                            ($1 < 42 ? "is" : "is not")) }'
      -| hello is not < 42
 
    However, what happens when data from a user _looks like_ a number?
 On the one hand, in reality, the input data consists of characters, not
 binary numeric values.  But, on the other hand, the data looks numeric,
 and 'awk' really ought to treat it as such.  And indeed, it does:
 
      $ echo 37 | awk '{ printf("%s %s < 42\n", $1,
      >                         ($1 < 42 ? "is" : "is not")) }'
      -| 37 is < 42
 
    Here are the rules for when 'awk' treats data as a number, and for
 when it treats data as a string.
 
    The POSIX standard uses the term "numeric string" for input data that
 looks numeric.  The '37' in the previous example is a numeric string.
 So what is the type of a numeric string?  Answer: numeric.
 
    The type of a variable is important because the types of two
 variables determine how they are compared.  Variable typing follows
 these definitions and rules:
 
    * A numeric constant or the result of a numeric operation has the
      "numeric" attribute.
 
    * A string constant or the result of a string operation has the
      "string" attribute.
 
    * Fields, 'getline' input, 'FILENAME', 'ARGV' elements, 'ENVIRON'
      elements, and the elements of an array created by 'match()',
      'split()', and 'patsplit()' that are numeric strings have the
      "strnum" attribute.(2)  Otherwise, they have the "string"
      attribute.  Uninitialized variables also have the "strnum"
      attribute.
 
    * Attributes propagate across assignments but are not changed by any
      use.
 
    The last rule is particularly important.  In the following program,
 'a' has numeric type, even though it is later used in a string
 operation:
 
      BEGIN {
           a = 12.345
           b = a " is a cute number"
           print b
      }
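
    One way to observe this, sketched under the assumption that a
 POSIX-compatible 'awk' is on the search path (the shell variable
 'result' is purely illustrative), is to compare 'a' against a numeric
 constant after the concatenation.  A numeric comparison finds 12.345
 greater than 5, whereas a string comparison of '"12.345"' with '"5"'
 would not:

```shell
# a keeps its numeric attribute even after being used in a concatenation.
result=$(awk 'BEGIN {
    a = 12.345
    b = a " is a cute number"      # string operation; does not retype a
    # number vs. number: compared numerically, so 12.345 > 5 holds
    print (a > 5 ? "numeric comparison" : "string comparison")
}')
printf '%s\n' "$result"
```

    This should print 'numeric comparison'.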
 
    When two operands are compared, either string comparison or numeric
 comparison may be used.  This depends upon the attributes of the
 operands, according to the following symmetric matrix:
 
         +----------------------------------------------
         |       STRING          NUMERIC         STRNUM
 --------+----------------------------------------------
         |
 STRING  |       string          string          string
         |
 NUMERIC |       string          numeric         numeric
         |
 STRNUM  |       string          numeric         numeric
 --------+----------------------------------------------
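
    The matrix can be exercised with a short sketch such as the
 following, which assumes a POSIX-compatible 'awk' (the labels and the
 shell variable 'result' are illustrative only).  Each line compares the
 values 10 and 9 with a different pair of attributes; '1' means the
 left side was judged smaller.  Numerically 10 is not less than 9, but
 as strings '"10"' sorts before '"9"':

```shell
# Fields 1 and 2 of the input ("10" and "9") are numeric strings (strnum).
result=$(echo '10 9' | awk '{
    print "NUMERIC vs NUMERIC:", (10 < 9)      # numeric comparison
    print "STRING vs NUMERIC:", ("10" < 9)     # string comparison
    print "STRNUM vs NUMERIC:", ($1 < 9)       # numeric comparison
    print "STRNUM vs STRING:", ($1 < "9")      # string comparison
    print "STRNUM vs STRNUM:", ($1 < $2)       # numeric comparison
}')
printf '%s\n' "$result"
```

    Following the matrix, the three numeric comparisons should print
 '0' and the two string comparisons should print '1'.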
 
    The basic idea is that user input that looks numeric--and _only_ user
 input--should be treated as numeric, even though it is actually made of
 characters and is therefore also a string.  Thus, for example, the
 string constant '" +3.14"', when it appears in program source code, is a
 string--even though it looks numeric--and is _never_ treated as a number
 for comparison purposes.
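
    As a small illustration, assuming a POSIX-compatible 'awk' (the
 names 's' and 'result' are hypothetical), the sketch below compares
 that string constant with the number 3.14.  The first comparison is a
 string comparison and fails.  Adding zero, a common 'awk' idiom for
 forcing a numeric value, produces the result of a numeric operation,
 so the second comparison is numeric:

```shell
result=$(awk 'BEGIN {
    s = " +3.14"                 # string constant: "string" attribute
    print (s == 3.14)            # string comparison against "3.14"
    print (s + 0 == 3.14)        # s + 0 is a numeric result
}')
printf '%s\n' "$result"
```

    This should print '0' followed by '1'.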
 
    In short, when one operand is a "pure" string, such as a string
 constant, then a string comparison is performed.  Otherwise, a numeric
 comparison is performed.  (The primary difference between a number and a
 strnum is that for strnums 'gawk' preserves the original string value
 that the scalar had when it came in.)
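
    The following sketch, which assumes a POSIX-compatible 'awk' (the
 names 'x' and 'result' are illustrative), shows both sides of a
 strnum: printing it yields the original input text, while arithmetic
 and comparison use its numeric value:

```shell
result=$(echo ' +3.14' | awk '{
    x = $1                  # the strnum attribute propagates on assignment
    print x                 # original text of the field, sign included
    print x + 0             # numeric value
    print (x == 3.14)       # strnum vs. number: numeric comparison
}')
printf '%s\n' "$result"
```

    This should print '+3.14', then '3.14', then '1'.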
 
    This point bears additional emphasis: Input that looks numeric _is_
 numeric.  All other input is treated as strings.
 
    Thus, the six-character input string ' +3.14' receives the strnum
 attribute.  In contrast, the eight characters '" +3.14"' appearing in
 program text comprise a string constant.  The following examples print
 '1' when the comparison between the input and the constant is true, and
 '0' otherwise:
 
      $ echo ' +3.14' | awk '{ print($0 == " +3.14") }'    True
      -| 1
      $ echo ' +3.14' | awk '{ print($0 == "+3.14") }'     False
      -| 0
      $ echo ' +3.14' | awk '{ print($0 == "3.14") }'      False
      -| 0
      $ echo ' +3.14' | awk '{ print($0 == 3.14) }'        True
      -| 1
      $ echo ' +3.14' | awk '{ print($1 == " +3.14") }'    False
      -| 0
      $ echo ' +3.14' | awk '{ print($1 == "+3.14") }'     True
      -| 1
      $ echo ' +3.14' | awk '{ print($1 == "3.14") }'      False
      -| 0
      $ echo ' +3.14' | awk '{ print($1 == 3.14) }'        True
      -| 1
 
    You can see the type of an input field (or other user input) using
 'typeof()':
 
      $ echo hello 37 | gawk '{ print typeof($1), typeof($2) }'
      -| string strnum
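
    The same classification applies to the other input sources named
 in the rules earlier, such as array elements created by 'split()'.
 The sketch below assumes a POSIX-compatible 'awk' and does not rely on
 'typeof()'; the names 'parts' and 'result' are illustrative.  It infers
 the attribute from how the comparison behaves instead:

```shell
result=$(awk 'BEGIN {
    n = split("10 9 abc", parts, " ")
    print (parts[1] < parts[2])   # strnum vs. strnum: numeric, 10 < 9 is false
    print (parts[1] < parts[3])   # strnum vs. string: string, "10" < "abc"
}')
printf '%s\n' "$result"
```

    This should print '0' followed by '1'.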
 
    ---------- Footnotes ----------
 
    (1) 'gawk' calls this "unassigned", as the following example shows.
 
    (2) Thus, a POSIX numeric string and 'gawk''s strnum are the same
 thing.