gawk: Array Intro

 
 8.1.1 Introduction to Arrays
 ----------------------------
 
      Doing linear scans over an associative array is like trying to club
      someone to death with a loaded Uzi.
                             -- _Larry Wall_
 
    The 'awk' language provides one-dimensional arrays for storing groups
 of related strings or numbers.  Every 'awk' array must have a name.
 Array names have the same syntax as variable names; any valid variable
 name would also be a valid array name.  But one name cannot be used in
 both ways (as an array and as a variable) in the same 'awk' program.
 
    Arrays in 'awk' superficially resemble arrays in other programming
 languages, but there are fundamental differences.  In 'awk', it isn't
 necessary to specify the size of an array before starting to use it.
 Additionally, any number or string, not just consecutive integers, may
 be used as an array index.
 
    In most other languages, arrays must be "declared" before use,
 including a specification of how many elements or components they
 contain.  In such languages, the declaration causes a contiguous block
 of memory to be allocated for that many elements.  Usually, an index in
 the array must be a nonnegative integer.  For example, the index zero
 specifies the first element in the array, which is actually stored at
 the beginning of the block of memory.  Index one specifies the second
 element, which is stored in memory right after the first element, and so
 on.  It is impossible to add more elements to the array, because it has
 room only for as many elements as given in the declaration.  (Some
 languages allow arbitrary starting and ending indices--e.g., '15 ..
 27'--but the size of the array is still fixed when the array is
 declared.)
 
    A contiguous array of four elements might look like Figure 8.1,
 conceptually, if the element values are eight, '"foo"', '""', and 30.
 
      +---------+---------+--------+---------+
      |    8    |  "foo"  |   ""   |    30   |    Value
      +---------+---------+--------+---------+
           0         1         2         3        Index
 
 Figure 8.1: A contiguous array
 
 Only the values are stored; the indices are implicit from the order of
 the values.  Here, eight is the value at index zero, because eight
 appears in the position with zero elements before it.
 
    Arrays in 'awk' are different--they are "associative".  This means
 that each array is a collection of pairs--an index and its corresponding
 array element value:
 
         Index   Value
 ------------------------
         '3'     '30'
         '1'     '"foo"'
         '0'     '8'
         '2'     '""'
 
 The pairs are shown in jumbled order because their order is
 irrelevant.(1)
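
    As a minimal sketch of the table above, the following program stores
 the same four pairs and then lists them.  The array name 'arr' and the
 shell wrapper 'pairs_demo' are hypothetical names chosen for this
 illustration.  The output is piped through 'sort' only to make the
 listing repeatable, because the traversal order is not specified.

```shell
# Hypothetical helper; the name pairs_demo is not from the manual.
pairs_demo() {
    awk 'BEGIN {
        # Store the four index/value pairs from the table.
        arr[3] = 30
        arr[1] = "foo"
        arr[0] = 8
        arr[2] = ""

        # "for (i in arr)" visits every index once, in an order that
        # the language leaves unspecified.
        for (i in arr)
            printf "%s -> [%s]\n", i, arr[i]
    }' | sort    # sort only so the listing is stable across awks
}
pairs_demo
```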
 
    One advantage of associative arrays is that new pairs can be added at
 any time.  For example, suppose an element with index '10' and value
 '"number ten"' is added to the array.  The result is:
 
         Index   Value
 -------------------------------
         '10'    '"number ten"'
         '3'     '30'
         '1'     '"foo"'
         '0'     '8'
         '2'     '""'
 
 Now the array is "sparse", which just means some indices are missing.
 It has elements 0-3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or
 9.
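
    The sparse array can be sketched as follows.  The names 'arr',
 'missing', and 'sparse_demo' are hypothetical.  The sketch uses the
 'in' operator to test for an index, on the understanding that a plain
 reference such as 'arr[4]' would create that element as a side effect.

```shell
# Hypothetical helper; the name sparse_demo is not from the manual.
sparse_demo() {
    awk 'BEGIN {
        arr[0] = 8; arr[1] = "foo"; arr[2] = ""; arr[3] = 30
        arr[10] = "number ten"

        # "i in arr" tests for an index without creating the element.
        for (i = 0; i <= 10; i++)
            if (!(i in arr))
                missing = missing " " i
        print "missing:" missing

        # Count the pairs that actually exist.
        n = 0
        for (i in arr)
            n++
        print "elements:", n
    }'
}
sparse_demo
```

 Under these assumptions, the program reports indices 4 through 9 as
 missing and counts five elements.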
 
    Another consequence of associative arrays is that the indices don't
 have to be nonnegative integers.  Any number, or even a string, can be
 an index.  For example, the following is an array that translates words
 from English to French:
 
         Index   Value
 ------------------------
         '"dog"' '"chien"'
         '"cat"' '"chat"'
         '"one"' '"un"'
         '1'     '"un"'
 
 Here we decided to translate the number one in both spelled-out and
 numeric form--thus illustrating that a single array can have both
 numbers and strings as indices.  (In fact, array subscripts are always
 strings.  There are some subtleties to how numbers work when used as
 array subscripts; for details, see Numeric Array Subscripts.)  Here,
 the number '1' isn't double-quoted, because
 'awk' automatically converts it to a string.
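
    A short sketch of the translation table follows.  The array name
 'fr' and the wrapper 'french_demo' are hypothetical, and the
 replacement value '"une"' is made up for the illustration.  The point
 is that the number '1' and the string '"1"' reach the same element.

```shell
# Hypothetical helper; the name french_demo is not from the manual.
french_demo() {
    awk 'BEGIN {
        fr["dog"] = "chien"
        fr["cat"] = "chat"
        fr["one"] = "un"
        fr[1]     = "un"     # numeric subscript, converted to "1"

        print "dog:", fr["dog"]

        # Overwrite through the string form, read through the number.
        fr["1"] = "une"
        print "1:", fr[1]

        # Still four pairs: no separate element was created for "1".
        n = 0
        for (k in fr)
            n++
        print "elements:", n
    }'
}
french_demo
```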
 
    The value of 'IGNORECASE' has no effect upon array subscripting.  The
 identical string value used to store an array element must be used to
 retrieve it.  When 'awk' creates an array (e.g., with the 'split()'
 built-in function), that array's indices are consecutive integers
 starting at one.  (See String Functions.)
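
    Both points can be checked with a small sketch.  The names 'seen',
 'parts', and 'split_demo' are hypothetical.  'IGNORECASE' is specific
 to 'gawk'; other 'awk' implementations treat it as an ordinary
 variable, so the result should be the same either way.

```shell
# Hypothetical helper; the name split_demo is not from the manual.
split_demo() {
    awk 'BEGIN {
        IGNORECASE = 1    # gawk-specific; an ordinary variable elsewhere
        seen["Dog"] = 1

        # Subscripts must match exactly, whatever IGNORECASE says.
        print "dog stored:", (("dog" in seen) ? "yes" : "no")
        print "Dog stored:", (("Dog" in seen) ? "yes" : "no")

        # split() fills parts with indices 1 through n and returns n.
        n = split("dog cat one", parts, " ")
        has_zero = (0 in parts) ? "yes" : "no"
        print "count:", n
        print "first:", parts[1]
        print "last:", parts[n]
        print "index 0 present:", has_zero
    }'
}
split_demo
```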
 
    'awk''s arrays are efficient--the time to access an element is
 typically independent of the number of elements in the array.
 
    ---------- Footnotes ----------
 
    (1) The ordering will vary among 'awk' implementations, which
 typically use hash tables to store array elements and values.