gawk: Programs Exercises

 
 11.5 Exercises
 ==============
 
   1. Rewrite 'cut.awk' (see Cut Program) using 'split()' with '""' as
      the separator.
 
   2. In the discussion of the 'egrep' program (see Egrep Program), we
      mentioned that 'egrep -i' could be simulated in versions of 'awk'
      without 'IGNORECASE' by using 'tolower()' on the line and the
      pattern.  In a footnote there, we also mentioned that this
      solution has a bug: the translated line is output, and not the
      original one.  Fix this problem.
 
   3. The POSIX version of 'id' takes options that control which
      information is printed.  Modify the 'awk' version (see Id
      Program) to accept the same arguments and perform in the same
      way.
 
   4. The 'split.awk' program (see Split Program) assumes that
      letters are contiguous in the character set, which isn't true for
      EBCDIC systems.  Fix this problem.  (Hint: Consider a different way
      to work through the alphabet, without relying on 'ord()' and
      'chr()'.)
 
   5. In 'uniq.awk' (see Uniq Program), the logic for choosing which
      lines to print represents a "state machine", which is "a device
      that can be in one of a set number of stable conditions depending
      on its previous condition and on the present values of its
      inputs."(1)  Brian Kernighan suggests that "an alternative approach
      to state machines is to just read the input into an array, then use
      indexing.  It's almost always easier code, and for most inputs
      where you would use this, just as fast."  Rewrite the logic to
      follow this suggestion.
 
   6. Why can't the 'wc.awk' program (see Wc Program) just use the
      value of 'FNR' in 'endfile()'?  Hint: Examine the code in the
      Filetrans Function section.
 
   7. Manipulation of individual characters in the 'translate' program
      (see Translate Program) is painful using standard 'awk'
      functions.  Given that 'gawk' can split strings into individual
      characters using '""' as the separator, how might you use this
      feature to simplify the program?
 
   8. The 'extract.awk' program (see Extract Program) was written
      before 'gawk' had the 'gensub()' function.  Use it to simplify the
      code.
 
   9. Compare the performance of the 'awksed.awk' program (see Simple
      Sed) with the more straightforward:
 
           BEGIN {
               pat = ARGV[1]
               repl = ARGV[2]
               ARGV[1] = ARGV[2] = ""
           }
 
           { gsub(pat, repl); print }
 
   10. What are the advantages and disadvantages of 'awksed.awk' versus
      the real 'sed' utility?
 
   11. In the discussion of the 'igawk' program (see Igawk Program), we
      mentioned that not trying to save the line read with 'getline' in
      the 'pathto()' function when testing for the file's accessibility
      for use with the main program simplifies things considerably.
      What problem does this engender, though?
 
   12. As an additional example of the idea that it is not always
      necessary to add new features to a program, consider the idea of
      having two files in a directory in the search path:
 
      'default.awk'
           This file contains a set of default library functions, such as
           'getopt()' and 'assert()'.
 
      'site.awk'
           This file contains library functions that are specific to a
           site or installation; i.e., locally developed functions.
           Having a separate file allows 'default.awk' to change with new
           'gawk' releases, without requiring the system administrator to
           update it each time by adding the local functions.
 
      One user suggested that 'gawk' be modified to automatically read
      these files upon startup.  Instead, it would be very simple to
      modify 'igawk' to do this.  Since 'igawk' can process nested
      '@include' directives, 'default.awk' could simply contain
      '@include' statements for the desired library functions.  Make this
      change.
 
   13. Modify 'anagram.awk' (see Anagram Program) to avoid the use
      of the external 'sort' utility.
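
   Exercises 1 and 7 both rely on splitting a string into individual
   characters with '""' as the separator.  The following is a minimal
   sketch of that one feature only, not a solution to either exercise.
   The function name 'split_chars' is hypothetical, and the sketch
   assumes that the installed 'awk' treats an empty separator the way
   'gawk' does; POSIX leaves that behavior unspecified, so substitute
   'gawk' for 'awk' if your system's default behaves differently.

```shell
# split_chars (hypothetical helper): print each character of the first
# argument on its own line, preceded by its position in the string.
split_chars() {
    awk -v s="$1" 'BEGIN {
        # With "" as the separator, each character becomes one element:
        # chars[1] .. chars[n].
        n = split(s, chars, "")
        for (i = 1; i <= n; i++)
            print i, chars[i]
    }'
}

split_chars "awk"
```

   Once the characters are in an array, they can be examined or
   replaced by index rather than with repeated 'substr()' calls.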
 
    ---------- Footnotes ----------
 
    (1) This is the definition returned from entering 'define: state
 machine' into Google.