gawk: Fixed width data

 
 4.6.1 Processing Fixed-Width Data
 ---------------------------------
 
 Examples of fixed-width data include the input for old Fortran
 programs, where numbers are run together, and the output of programs that
 did not anticipate the use of their output as input for other programs.
 
    An example of the latter is a table where all the columns are lined
 up by the use of a variable number of spaces and _empty fields are just
 spaces_.  Clearly, 'awk''s normal field splitting based on 'FS' does not
 work well in this case.  Although a portable 'awk' program can use a
 series of 'substr()' calls on '$0' (see String Functions), this is
 awkward and inefficient for a large number of fields.
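 
    As a minimal sketch of that portable approach, assume a hypothetical
 layout in which a name occupies columns 1-9 and a terminal name occupies
 columns 10-15.  (The variable names and column positions here are
 illustrative only.)
 
      {
          name = substr($0, 1, 9)     # columns 1 through 9
          term = substr($0, 10, 6)    # columns 10 through 15
          print name, term
      }
 
 Every additional field needs another 'substr()' call with a
 hand-computed starting column, which is why this approach scales poorly.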
 
    The splitting of an input record into fixed-width fields is specified
 by assigning a string containing space-separated numbers to the built-in
 variable 'FIELDWIDTHS'.  Each number specifies the width of the field,
 _including_ columns between fields.  If you want to ignore the columns
 between fields, you can specify the width as a separate field that is
 subsequently ignored.  It is a fatal error to supply a field width that
 has a negative value.
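 
    For instance, the following sketch (with made-up widths) treats each
 record as a five-column field, a one-column gap, and a three-column
 field, and simply never refers to the gap held in '$2':
 
      BEGIN { FIELDWIDTHS = "5 1 3" }
      { print $1 "|" $3 }     # $2 holds the separating column; skip it
 
 Given the input line 'abcde 123', this prints 'abcde|123'.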
 
    The following data is the output of the Unix 'w' utility.  It is
 useful to illustrate the use of 'FIELDWIDTHS':
 
       10:06pm  up 21 days, 14:04,  23 users
      User     tty       login  idle   JCPU   PCPU  what
      hzuo     ttyV0     8:58pm            9      5  vi p24.tex
      hzang    ttyV3     6:37pm    50                -csh
      eklye    ttyV5     9:53pm            7      1  em thes.tex
      dportein ttyV6     8:17pm  1:47                -csh
      gierd    ttyD3    10:00pm     1                elm
      dave     ttyD4     9:47pm            4      4  w
      brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
      dave     ttyq4    26Jun9115days     46     46  wnewmail
 
    The following program takes this input, converts the idle time to
 number of seconds, and prints out the first two fields and the
 calculated idle time:
 
      BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
      NR > 2 {
          idle = $4
          sub(/^ +/, "", idle)   # strip leading spaces
          if (idle == "")
              idle = 0
          if (idle ~ /:/) {      # treated as mm:ss, giving seconds
              split(idle, t, ":")
              idle = t[1] * 60 + t[2]
          }
          if (idle ~ /days/)
              idle *= 24 * 60 * 60
 
          print $1, $2, idle
      }
 
      NOTE: The preceding program uses a number of 'awk' features that
      haven't been introduced yet.
 
    Running the program on the data produces the following results:
 
      hzuo      ttyV0  0
      hzang     ttyV3  50
      eklye     ttyV5  0
      dportein  ttyV6  107
      gierd     ttyD3  1
      dave      ttyD4  0
      brent     ttyp0  286
      dave      ttyq4  1296000
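 
    The extra spaces in this output appear because each field keeps the
 padding that falls within its width.  If that is unwanted, one possible
 approach is to trim each field before printing it.  The following is
 only a sketch; 'trim()' is a hypothetical user-defined function, not a
 'gawk' built-in:
 
      # trim(s): hypothetical helper that removes leading and
      # trailing spaces from its argument
      function trim(s)
      {
          gsub(/^ +| +$/, "", s)
          return s
      }
 
      BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
      NR > 2 { print trim($1), trim($2) }
 
 For the first data line, this prints 'hzuo ttyV0'.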
 
    Another (possibly more practical) example of fixed-width input data
 is the input from a deck of balloting cards.  In some parts of the
 United States, voters mark their choices by punching holes in computer
 cards.  These cards are then processed to count the votes for any
 particular candidate or on any particular issue.  Because a voter may
 choose not to vote on some issue, any column on the card may be empty.
 An 'awk' program for processing such data could use the 'FIELDWIDTHS'
 feature to simplify reading the data.  (Of course, getting 'gawk' to run
 on a system with card readers is another story!)
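 
    As a rough sketch of what such a program might look like, assume a
 purely hypothetical card format in which each record holds five
 one-column issues and any nonspace character in a column counts as a
 vote.  Neither the format nor the 'votes' array name comes from a real
 balloting system:
 
      BEGIN { FIELDWIDTHS = "1 1 1 1 1" }   # five one-column fields
      {
          for (i = 1; i <= NF; i++)
              if ($i ~ /[^ ]/)              # column was punched
                  votes[i]++
      }
      END {
          for (i = 1; i <= 5; i++)
              print "issue", i, votes[i] + 0   # + 0 prints 0 if no votes
      }
 
 Because empty columns still occupy their own field, a skipped issue
 does not shift the positions of the issues that follow it.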