gawk: Scanning an Array

 
 8.1.5 Scanning All Elements of an Array
 ---------------------------------------
 
 In programs that use arrays, it is often necessary to use a loop that
 executes once for each element of an array.  In other languages, where
 arrays are contiguous and indices are limited to nonnegative integers,
 this is easy: all the valid indices can be found by counting from the
 lowest index up to the highest.  This technique won't do the job in
 'awk', because any number or string can be an array index.  So 'awk' has
 a special kind of 'for' statement for scanning an array:
 
      for (VAR in ARRAY)
          BODY
 
 This loop executes BODY once for each index in ARRAY that the program
 has previously used, with the variable VAR set to that index.
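
    As a minimal sketch of this loop form, the following shell function
 stores three elements under non-contiguous indices and then visits each
 of them.  The function name 'list_indices' and the array name 'temp' are
 hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print every index stored in a small sparse array.
list_indices() {
    awk 'BEGIN {
        # Indices need not be contiguous integers; any number or string works.
        temp[3] = "three"
        temp[100] = "one hundred"
        temp["dog"] = "chien"

        # The body runs once per stored index, in an unspecified order.
        for (idx in temp)
            print idx, "=>", temp[idx]
    }'
}
```

 Calling 'list_indices' prints three lines, one per stored index.  The
 order of those lines may differ from one 'awk' implementation to another.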
 
    The following program uses this form of the 'for' statement.  The
 first rule scans the input records and notes which words appear (at
 least once) in the input, by storing a one into the array 'used' with
 the word as the index.  The second rule scans the elements of 'used' to
 find all the distinct words that appear in the input.  It prints each
 word that is more than 10 characters long and also prints the number of
 such words.  See String Functions for more information on the
 built-in function 'length()'.
 
      # Record a 1 for each word that is used at least once
      {
          for (i = 1; i <= NF; i++)
              used[$i] = 1
      }
 
      # Find number of distinct words more than 10 characters long
      END {
          for (x in used) {
              if (length(x) > 10) {
                  ++num_long_words
                  print x
              }
          }
          # Adding 0 forces a numeric value, so 0 is printed when no word qualifies
          print num_long_words + 0, "words longer than 10 characters"
      }
 
 See Word Sorting for a more detailed example of this type.
 
    The order in which elements of the array are accessed by this
 statement is determined by the internal arrangement of the array
 elements within 'awk' and in standard 'awk' cannot be controlled or
 changed.  This can lead to problems if new elements are added to ARRAY
 by statements in the loop body; it is not predictable whether the 'for'
 loop will reach them.  Similarly, changing VAR inside the loop may
 produce strange results.  It is best to avoid such things.
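
    One way to sidestep the problem, sketched here on the assumption that
 a two-pass approach is acceptable, is to copy the indices into a separate
 array first and then loop over that copy with an ordinary counting 'for'
 loop.  Elements added during the second pass cannot change which
 elements are visited.  The names 'snapshot_then_add' and 'keys' are
 hypothetical.

```shell
# Hypothetical helper: add elements to an array without disturbing the scan.
snapshot_then_add() {
    awk 'BEGIN {
        a["here"] = 1; a["is"] = 1; a["a"] = 1; a["loop"] = 1

        # Pass 1: save the current indices in a separate list.
        n = 0
        for (k in a)
            keys[++n] = k

        # Pass 2: walk the saved list; adding to a[] here is safe.
        for (i = 1; i <= n; i++)
            a[keys[i] "-copy"] = 1

        # Report how many indices were saved and how many exist now.
        count = 0
        for (k in a)
            count++
        print n, count
    }'
}
```

 Running 'snapshot_then_add' prints the number of indices saved in the
 first pass followed by the number of elements in the array afterward.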
 
    As a point of information, 'gawk' sets up the list of elements to be
 iterated over before the loop starts, and does not change it.  But not
 all 'awk' versions do so.  Consider this program, named 'loopcheck.awk':
 
      BEGIN {
          a["here"] = "here"
          a["is"] = "is"
          a["a"] = "a"
          a["loop"] = "loop"
          for (i in a) {
              j++
              a[j] = j
              print i
          }
      }
 
    Here is what happens when this program is run with 'gawk' (and 'mawk'):
 
      $ gawk -f loopcheck.awk
      -| here
      -| loop
      -| a
      -| is
 
    Contrast this with BWK 'awk' (invoked here as 'nawk'), which also
 reaches one of the elements added inside the loop:
 
      $ nawk -f loopcheck.awk
      -| loop
      -| here
      -| is
      -| a
      -| 1
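
    Because the traversal order differs between implementations, a
 program that needs reproducible output should not depend on it.  A
 minimal, portable sketch is to leave the ordering to the external 'sort'
 utility, as in the following shell function.  The name
 'distinct_words_sorted' is hypothetical, and 'LC_ALL=C' is an assumption
 made here so that the sort order does not vary with the locale.

```shell
# Hypothetical helper: print the distinct words of the input in sorted order.
distinct_words_sorted() {
    awk '
    # Record each distinct word as an index of used[].
    { for (i = 1; i <= NF; i++) used[$i] = 1 }

    # The for-in order is unspecified, so ordering is left to sort.
    END { for (w in used) print w }
    ' "$@" | LC_ALL=C sort
}
```

 For example, 'printf "b a\nc a b\n" | distinct_words_sorted' prints the
 three distinct words, one per line, in the same order on any 'awk'.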