gawk: Bitwise Functions

 
 9.1.6 Bit-Manipulation Functions
 --------------------------------
 
      I can explain it for you, but I can't understand it for you.
                             -- _Anonymous_
 
    Many languages provide the ability to perform "bitwise" operations on
 two integer numbers.  In other words, the operation is performed on each
 successive pair of bits in the operands.  Three common operations are
 bitwise AND, OR, and XOR. The operations are described in SeeTable
 9.6 table-bitwise-ops.
 
                 Bit operator
           |  AND  |   OR  |  XOR
           |---+---+---+---+---+---
 Operands  | 0 | 1 | 0 | 1 | 0 | 1
 ----------+---+---+---+---+---+---
     0     | 0   0 | 0   1 | 0   1
     1     | 0   1 | 1   1 | 1   0
 
 Table 9.6: Bitwise operations
 
    As you can see, the result of an AND operation is 1 only when _both_
 bits are 1.  The result of an OR operation is 1 if _either_ bit is 1.
 The result of an XOR operation is 1 if either bit is 1, but not both.
 The next operation is the "complement"; the complement of 1 is 0 and the
 complement of 0 is 1.  Thus, this operation "flips" all the bits of a
 given value.
 
    Finally, two other common operations are to shift the bits left or
 right.  For example, if you have a bit string '10111001' and you shift
 it right by three bits, you end up with '00010111'.(1)  If you start
 over again with '10111001' and shift it left by three bits, you end up
 with '11001000'.  The following list describes 'gawk''s built-in
 functions that implement the bitwise operations.  Optional parameters
 are enclosed in square brackets ([ ]):
 
 'and(V1, V2 [, ...])'
      Return the bitwise AND of the arguments.  There must be at least
      two.
 
 'compl(VAL)'
      Return the bitwise complement of VAL.
 
 'lshift(VAL, COUNT)'
      Return the value of VAL, shifted left by COUNT bits.
 
 'or(V1, V2 [, ...])'
      Return the bitwise OR of the arguments.  There must be at least
      two.
 
 'rshift(VAL, COUNT)'
      Return the value of VAL, shifted right by COUNT bits.
 
 'xor(V1, V2 [, ...])'
      Return the bitwise XOR of the arguments.  There must be at least
      two.
 
      CAUTION: Beginning with 'gawk' version 4.2, negative operands are
      not allowed for any of these functions.  A negative operand
      produces a fatal error.  See the sidebar "Beware The Smoke and
      Mirrors!"  for more information as to why.
 
    Here is a user-defined function (SeeUser-defined) that
 illustrates the use of these functions:
 
      # bits2str --- turn an integer into readable ones and zeros
 
      function bits2str(bits,        data, mask)
      {
          if (bits == 0)
              return "0"
 
          mask = 1
          for (; bits != 0; bits = rshift(bits, 1))
              data = (and(bits, mask) ? "1" : "0") data
 
          while ((length(data) % 8) != 0)
              data = "0" data
 
          return data
      }
 
      BEGIN {
          printf "123 = %s\n", bits2str(123)
          printf "0123 = %s\n", bits2str(0123)
          printf "0x99 = %s\n", bits2str(0x99)
          comp = compl(0x99)
          printf "compl(0x99) = %#x = %s\n", comp, bits2str(comp)
          shift = lshift(0x99, 2)
          printf "lshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
          shift = rshift(0x99, 2)
          printf "rshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift)
      }
 
 This program produces the following output when run:
 
      $ gawk -f testbits.awk
      -| 123 = 01111011
      -| 0123 = 01010011
      -| 0x99 = 10011001
      -| compl(0x99) = 0x3fffffffffff66 =
      -| 00111111111111111111111111111111111111111111111101100110
      -| lshift(0x99, 2) = 0x264 = 0000001001100100
      -| rshift(0x99, 2) = 0x26 = 00100110
 
    The 'bits2str()' function turns a binary number into a string.
 Initializing 'mask' to one creates a binary value where the rightmost
 bit is set to one.  Using this mask, the function repeatedly checks the
 rightmost bit.  ANDing the mask with the value indicates whether the
 rightmost bit is one or not.  If so, a '"1"' is concatenated onto the
 front of the string.  Otherwise, a '"0"' is added.  The value is then
 shifted right by one bit and the loop continues until there are no more
 one bits.
 
    If the initial value is zero, it returns a simple '"0"'.  Otherwise,
 at the end, it pads the value with zeros to represent multiples of 8-bit
 quantities.  This is typical in modern computers.
 
    The main code in the 'BEGIN' rule shows the difference between the
 decimal and octal values for the same numbers (See
 Nondecimal-numbers), and then demonstrates the results of the
 'compl()', 'lshift()', and 'rshift()' functions.
 
                      Beware The Smoke and Mirrors!
 
    It other languages, bitwise operations are performed on integer
 values, not floating-point values.  As a general statement, such
 operations work best when performed on unsigned integers.
 
    'gawk' attempts to treat the arguments to the bitwise functions as
 unsigned integers.  For this reason, negative arguments produce a fatal
 error.
 
    In normal operation, for all of these functions, first the
 double-precision floating-point value is converted to the widest C
 unsigned integer type, then the bitwise operation is performed.  If the
 result cannot be represented exactly as a C 'double', leading nonzero
 bits are removed one by one until it can be represented exactly.  The
 result is then converted back into a C 'double'.(2)
 
    However, when using arbitrary precision arithmetic with the '-M'
 option (SeeArbitrary Precision Arithmetic), the results may differ.
 This is particularly noticeable with the 'compl()' function:
 
      $ gawk 'BEGIN { print compl(42) }'
      -| 9007199254740949
      $ gawk -M 'BEGIN { print compl(42) }'
      -| -43
 
    What's going on becomes clear when printing the results in
 hexadecimal:
 
      $ gawk 'BEGIN { printf "%#x\n", compl(42) }'
      -| 0x1fffffffffffd5
      $ gawk -M 'BEGIN { printf "%#x\n", compl(42) }'
      -| 0xffffffffffffffd5
 
    When using the '-M' option, under the hood, 'gawk' uses GNU MP
 arbitrary precision integers which have at least 64 bits of precision.
 When not using '-M', 'gawk' stores integral values in regular
 double-precision floating point, which only maintain 53 bits of
 precision.  Furthermore, the GNU MP library treats (or at least seems to
 treat) the leading bit as a sign bit; thus the result with '-M' in this
 case is a negative number.
 
    In short, using 'gawk' for any but the simplest kind of bitwise
 operations is probably a bad idea; caveat emptor!
 
    ---------- Footnotes ----------
 
    (1) This example shows that zeros come in on the left side.  For
 'gawk', this is always true, but in some languages, it's possible to
 have the left side fill with ones.
 
    (2) If you don't understand this paragraph, the upshot is that 'gawk'
 can only store a particular range of integer values; numbers outside
 that range are reduced to fit within the range.