gawk: Basic Data Typing

 
 D.2 Data Values in a Computer
 =============================
 
 In a program, you keep track of information and values in things called
 "variables".  A variable is just a name for a given value, such as
 'first_name', 'last_name', 'address', and so on.  'awk' has several
 predefined variables, and it has special names to refer to the current
 input record and the fields of the record.  You may also group multiple
 associated values under one name, as an array.
 
    Data, particularly in 'awk', consists of either numeric values, such
 as 42 or 3.1415927, or string values.  String values are essentially
 anything that's not a number, such as a name.  Strings are sometimes
 referred to as "character data", since they store the individual
 characters that comprise them.  Individual variables, as well as numeric
 and string variables, are referred to as "scalar" values.  Groups of
 values, such as arrays, are not scalars.
 
    SeeComputer Arithmetic, provided a basic introduction to numeric
 types (integer and floating-point) and how they are used in a computer.
 Please review that information, including a number of caveats that were
 presented.
 
    While you are probably used to the idea of a number without a value
 (i.e., zero), it takes a bit more getting used to the idea of
 zero-length character data.  Nevertheless, such a thing exists.  It is
 called the "null string".  The null string is character data that has no
 value.  In other words, it is empty.  It is written in 'awk' programs
 like this: '""'.
 
    Humans are used to working in decimal; i.e., base 10.  In base 10,
 numbers go from 0 to 9, and then "roll over" into the next column.
 (Remember grade school?  42 = 4 x 10 + 2.)
 
    There are other number bases though.  Computers commonly use base 2
 or "binary", base 8 or "octal", and base 16 or "hexadecimal".  In
 binary, each column represents two times the value in the column to its
 right.  Each column may contain either a 0 or a 1.  Thus, binary 1010
 represents (1 x 8) + (0 x 4) + (1 x 2) + (0 x 1), or decimal 10.  Octal
 and hexadecimal are discussed more in SeeNondecimal-numbers.
 
    At the very lowest level, computers store values as groups of binary
 digits, or "bits".  Modern computers group bits into groups of eight,
 called "bytes".  Advanced applications sometimes have to manipulate bits
 directly, and 'gawk' provides functions for doing so.
 
    Programs are written in programming languages.  Hundreds, if not
 thousands, of programming languages exist.  One of the most popular is
 the C programming language.  The C language had a very strong influence
 on the design of the 'awk' language.
 
    There have been several versions of C. The first is often referred to
 as "K&R" C, after the initials of Brian Kernighan and Dennis Ritchie,
 the authors of the first book on C. (Dennis Ritchie created the
 language, and Brian Kernighan was one of the creators of 'awk'.)
 
    In the mid-1980s, an effort began to produce an international
 standard for C. This work culminated in 1989, with the production of the
 ANSI standard for C. This standard became an ISO standard in 1990.  In
 1999, a revised ISO C standard was approved and released.  Where it
 makes sense, POSIX 'awk' is compatible with 1999 ISO C.