gawk: Controlling Scanning

 
 8.1.6 Using Predefined Array Scanning Orders with 'gawk'
 --------------------------------------------------------
 
 This node describes a feature that is specific to 'gawk'.
 
    By default, when a 'for' loop traverses an array, the order is
 undefined, meaning that the 'awk' implementation determines the order in
 which the array is traversed.  This order is usually based on the
 internal implementation of arrays and will vary from one version of
 'awk' to the next.
 
    Often, though, you may wish to do something simple, such as "traverse
 the array by comparing the indices in ascending order," or "traverse the
 array by comparing the values in descending order."  'gawk' provides two
 mechanisms that give you this control:
 
    * Set 'PROCINFO["sorted_in"]' to one of a set of predefined values.
      We describe this now.
 
    * Set 'PROCINFO["sorted_in"]' to the name of a user-defined function
      to use for comparison of array elements.  This advanced feature is
      described later; see Array Sorting.
 
    The following special values for 'PROCINFO["sorted_in"]' are
 available:
 
 '"@unsorted"'
      Array elements are processed in arbitrary order, which is the
      default 'awk' behavior.
 
 '"@ind_str_asc"'
      Order by indices in ascending order compared as strings; this is
      the most basic sort.  (Internally, array indices are always
      strings, so with 'a[2*5] = 1' the index is '"10"' rather than
      numeric 10.)
 
 '"@ind_num_asc"'
      Order by indices in ascending order but force them to be treated as
      numbers in the process.  Any index with a non-numeric value will
      end up positioned as if it were zero.
 
 '"@val_type_asc"'
      Order by element values in ascending order (rather than by
      indices).  Ordering is by the type assigned to the element (see
      Typing and Comparison).  All numeric values come before all
      string values, which in turn come before all subarrays.  (Subarrays
      have not been described yet; see Arrays of Arrays.)
 
 '"@val_str_asc"'
      Order by element values in ascending order (rather than by
      indices).  Scalar values are compared as strings.  Subarrays, if
      present, come out last.
 
 '"@val_num_asc"'
      Order by element values in ascending order (rather than by
      indices).  Scalar values are compared as numbers.  Subarrays, if
      present, come out last.  When numeric values are equal, the string
      values are used to provide an ordering: this guarantees consistent
      results across different versions of the C 'qsort()' function,(1)
      which 'gawk' uses internally to perform the sorting.
 
 '"@ind_str_desc"'
      Like '"@ind_str_asc"', but the string indices are ordered from high
      to low.
 
 '"@ind_num_desc"'
      Like '"@ind_num_asc"', but the numeric indices are ordered from
      high to low.
 
 '"@val_type_desc"'
      Like '"@val_type_asc"', but the element values, based on type, are
      ordered from high to low.  Subarrays, if present, come out first.
 
 '"@val_str_desc"'
      Like '"@val_str_asc"', but the element values, treated as strings,
      are ordered from high to low.  Subarrays, if present, come out
      first.
 
 '"@val_num_desc"'
      Like '"@val_num_asc"', but the element values, treated as numbers,
      are ordered from high to low.  Subarrays, if present, come out
      first.
 
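    The array traversal order is determined before the 'for' loop starts
 to run.  Changing 'PROCINFO["sorted_in"]' in the loop body does not
 affect the loop, so the value must be set before the loop begins.  In
 the following example, the first program uses the default order and
 the second sets 'PROCINFO["sorted_in"]' before its loop: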
 
      $ gawk '
      > BEGIN {
      >    a[4] = 4
      >    a[3] = 3
      >    for (i in a)
      >        print i, a[i]
      > }'
      -| 4 4
      -| 3 3
      $ gawk '
      > BEGIN {
      >    PROCINFO["sorted_in"] = "@ind_str_asc"
      >    a[4] = 4
      >    a[3] = 3
      >    for (i in a)
      >        print i, a[i]
      > }'
      -| 3 3
      -| 4 4
 
    When sorting an array by element values, if a value happens to be a
 subarray then it is considered to be greater than any string or numeric
 value, regardless of what the subarray itself contains, and all
 subarrays are treated as being equal to each other.  Their order
 relative to each other is determined by their index strings.
 
    Here are some additional things to bear in mind about sorted array
 traversal:
 
    * The value of 'PROCINFO["sorted_in"]' is global.  That is, it
      affects all array traversal 'for' loops.  If you need to change it
      within your own code, you should see if it's defined and save and
      restore the value:
 
            ...
            if ("sorted_in" in PROCINFO)
                save_sorted = PROCINFO["sorted_in"]
 
            PROCINFO["sorted_in"] = "@val_str_desc" # or whatever
            ...
            if (save_sorted)
                PROCINFO["sorted_in"] = save_sorted
            else
                delete PROCINFO["sorted_in"]   # nothing saved: back to default
 
    * As already mentioned, the default array traversal order is
      represented by '"@unsorted"'.  You can also get the default
      behavior by assigning the null string to 'PROCINFO["sorted_in"]' or
      by just deleting the '"sorted_in"' element from the 'PROCINFO'
      array with the 'delete' statement.  (The 'delete' statement hasn't
      been described yet; see Delete.)
 
    In addition, 'gawk' provides built-in functions for sorting arrays;
 see Array Sorting Functions.
 
    ---------- Footnotes ----------
 
    (1) When two elements compare as equal, the C 'qsort()' function does
 not guarantee that they will maintain their original relative order
 after sorting.  Using the string value to provide a unique ordering when
 the numeric values are equal ensures that 'gawk' behaves consistently
 across different environments.