11.5 Exercises
==============
1. Rewrite 'cut.awk' (Cut Program) using 'split()' with '""'
as the separator.
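
As a starting point, the following is a minimal sketch of the character-splitting technique, not a rewrite of 'cut.awk'. The 'pick_chars' shell function is a hypothetical helper invented for illustration, and the sketch assumes that 'awk' is 'gawk' or another version that accepts '""' as the separator:

```shell
# pick_chars FIRST LAST: print characters FIRST..LAST of each input line.
# Hypothetical helper for illustration; it is not part of cut.awk.
# Assumes an awk that splits into single characters when the separator
# is "" (a gawk extension that is not required by POSIX).
pick_chars() {
    awk -v first="$1" -v last="$2" '
    {
        n = split($0, ch, "")      # one character per array element
        out = ""
        for (i = first; i <= last && i <= n; i++)
            out = out ch[i]
        print out
    }'
}
```

For example, 'printf "abcdef\n" | pick_chars 2 4' prints 'bcd'.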
2. In Egrep Program, we mentioned that 'egrep -i' could be
simulated in versions of 'awk' without 'IGNORECASE' by using
'tolower()' on the line and the pattern. In a footnote there, we
also mentioned that this solution has a bug: the translated line is
output, and not the original one. Fix this problem.
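
One possible direction, sketched under simplifying assumptions: compare a lowercased copy of the record against a lowercased pattern, and leave '$0' itself alone so that the original text is what gets printed. The 'igrep' function name is hypothetical, and applying 'tolower()' to a whole regexp is a simplification that may not suit every pattern:

```shell
# igrep PATTERN: print input lines that match PATTERN, ignoring case.
# Hypothetical helper for illustration; it is not the egrep.awk program.
igrep() {
    awk -v pat="$1" '
    BEGIN { pat = tolower(pat) }   # lowercase the pattern once
    tolower($0) ~ pat { print }    # test a lowercased copy; print the untouched $0
    '
}
```

For example, 'printf "Hello World\nbye\n" | igrep WORLD' prints 'Hello World' with its original capitalization.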
3. The POSIX version of 'id' takes options that control which
information is printed. Modify the 'awk' version (Id
Program) to accept the same arguments and perform in the same
way.
4. The 'split.awk' program (Split Program) assumes that
letters are contiguous in the character set, which isn't true for
EBCDIC systems. Fix this problem. (Hint: Consider a different way
to work through the alphabet, without relying on 'ord()' and
'chr()'.)
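
To illustrate the hint, here is a minimal sketch that walks the alphabet by indexing into a literal string of letters with 'substr()', so nothing depends on the numeric values of the characters. The 'suffixes' function is hypothetical and is not taken from 'split.awk'; it only handles up to 676 two-letter suffixes:

```shell
# suffixes N: print the first N two-letter suffixes (aa, ab, ..., zz).
# Hypothetical helper for illustration; valid for N up to 26 * 26 = 676.
suffixes() {
    awk -v n="$1" '
    BEGIN {
        letters = "abcdefghijklmnopqrstuvwxyz"
        for (k = 0; k < n; k++) {
            i = int(k / 26) + 1    # position of the first letter
            j = k % 26 + 1         # position of the second letter
            print substr(letters, i, 1) substr(letters, j, 1)
        }
    }'
}
```

Because the letters come from a string written out in the program, the order is the same on ASCII and EBCDIC systems.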
5. In 'uniq.awk' (Uniq Program), the logic for choosing which
lines to print represents a "state machine", which is "a device
that can be in one of a set number of stable conditions depending
on its previous condition and on the present values of its
inputs."(1) Brian Kernighan suggests that "an alternative approach
to state machines is to just read the input into an array, then use
indexing. It's almost always easier code, and for most inputs
where you would use this, just as fast." Rewrite the logic to
follow this suggestion.
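
The general shape of the array approach might look like the following sketch, which only drops adjacent duplicate lines and ignores every 'uniq' option. The 'dedupe_adjacent' name is hypothetical; the point is that once the input is in an array, "the previous line" is just 'line[i - 1]':

```shell
# dedupe_adjacent: print input lines, collapsing runs of identical lines.
# Hypothetical helper for illustration; it handles none of uniq's options.
dedupe_adjacent() {
    awk '
    { line[NR] = $0 }              # read all of the input into an array first
    END {
        for (i = 1; i <= NR; i++)
            # Appending "" forces a string comparison, so that lines
            # such as "1" and "1.0" are not treated as equal numbers.
            if (i == 1 || (line[i] "") != (line[i - 1] ""))
                print line[i]
    }'
}
```

The trade-off is memory: the whole input is held in the array before anything is printed.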
6. Why can't the 'wc.awk' program (Wc Program) just use the
value of 'FNR' in 'endfile()'? Hint: Examine the code in
Filetrans Function.
7. Manipulation of individual characters in the 'translate' program
(Translate Program) is painful using standard 'awk'
functions. Given that 'gawk' can split strings into individual
characters using '""' as the separator, how might you use this
feature to simplify the program?
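
As a hedged sketch of the idea, the following builds a character map with 'split()' and '""', and then rebuilds each line one character at a time. The 'translit' function is hypothetical and much simpler than the 'translate' program: it assumes that both arguments have the same length and does not handle ranges. It also assumes that 'awk' is 'gawk' or another version that accepts '""' as the separator:

```shell
# translit FROM TO: replace each character in FROM with the character
# at the same position in TO.
# Hypothetical helper for illustration; assumes FROM and TO have equal
# length and an awk that splits on "" into single characters.
translit() {
    awk -v from="$1" -v to="$2" '
    BEGIN {
        nf = split(from, f, "")
        split(to, t, "")
        for (i = 1; i <= nf; i++)
            map[f[i]] = t[i]       # character-to-character lookup table
    }
    {
        n = split($0, ch, "")
        out = ""
        for (i = 1; i <= n; i++)
            out = out ((ch[i] in map) ? map[ch[i]] : ch[i])
        print out
    }'
}
```

For example, 'printf "hello\n" | translit el ip' prints 'hippo'.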
8. The 'extract.awk' program (Extract Program) was written
before 'gawk' had the 'gensub()' function. Use it to simplify the
code.
9. Compare the performance of the 'awksed.awk' program (Simple
Sed) with the more straightforward:

    BEGIN {
        pat = ARGV[1]
        repl = ARGV[2]
        ARGV[1] = ARGV[2] = ""
    }

    { gsub(pat, repl); print }

10. What are the advantages and disadvantages of 'awksed.awk' versus
the real 'sed' utility?
11. In Igawk Program, we mentioned that not trying to save the
line read with 'getline' in the 'pathto()' function when testing
for the file's accessibility for use with the main program
simplifies things considerably. What problem does this engender,
though?
12. As an additional example of the idea that it is not always
necessary to add new features to a program, consider the idea of
having two files in a directory in the search path:

'default.awk'
    This file contains a set of default library functions, such as
    'getopt()' and 'assert()'.

'site.awk'
    This file contains library functions that are specific to a
    site or installation; i.e., locally developed functions.
Having a separate file allows 'default.awk' to change with new
'gawk' releases, without requiring the system administrator to
update it each time by adding the local functions.
One user suggested that 'gawk' be modified to automatically read
these files upon startup. Instead, it would be very simple to
modify 'igawk' to do this. Since 'igawk' can process nested
'@include' directives, 'default.awk' could simply contain
'@include' statements for the desired library functions. Make this
change.
13. Modify 'anagram.awk' (Anagram Program) to avoid the use
of the external 'sort' utility.
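
As one illustration of sorting inside 'awk' itself, here is a minimal sketch of an insertion sort over an array of lines. It is written in portable 'awk' so that it runs without 'gawk'; in 'gawk', the built-in 'asort()' function does the same job in one call. The 'sort_lines' name is hypothetical and the code is not taken from 'anagram.awk':

```shell
# sort_lines: print the input lines in ascending string order.
# Hypothetical helper for illustration; uses a simple insertion sort,
# which is fine for small inputs but slow for large ones.
sort_lines() {
    awk '
    { a[NR] = $0 }                 # collect the lines
    END {
        for (i = 2; i <= NR; i++) {
            v = a[i]
            # Shift larger elements up; "" forces string comparison.
            for (j = i - 1; j >= 1 && (a[j] "") > (v ""); j--)
                a[j + 1] = a[j]
            a[j + 1] = v
        }
        for (i = 1; i <= NR; i++)
            print a[i]
    }'
}
```

The resulting order follows the string comparison rules of the 'awk' in use, which may depend on the locale.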
---------- Footnotes ----------
(1) This is the definition returned from entering 'define: state
machine' into Google.