gawkinet: CGI Lib

 
 2.9.1 A Simple CGI Library
 --------------------------
 
      HTTP is like being married: you have to be able to handle whatever
      you're given, while being very careful what you send back.
      Phil Smith III,
      <http://www.netfunny.com/rhf/jokes/99/Mar/http.html>
 
    In "A Web Service with Interaction" (see Interacting Service), we
 saw the function 'CGI_setup()' as part of the web server "core logic"
 framework.  The code presented there handles almost everything necessary
 for CGI requests.  One thing it doesn't do is handle encoded characters
 in the requests.  For example, an '&' is encoded as a percent sign
 followed by the hexadecimal value: '%26'.  These encoded values should
 be decoded.  The following simple library performs this task.  This
 code is used for all web server examples in the rest of this Info
 file.  To use it for your own web server, store the source code in a
 file named 'inetlib.awk'.  You can then include these functions in
 your program by placing the following statement on the first line of
 your script:
 
      @include inetlib.awk
 
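 But beware: written this way, with an unquoted file name, the mechanism
 works only if you invoke your web server script with 'igawk' instead of
 the usual 'awk' or 'gawk'.  ('gawk' version 4.0 and later handles
 '@include' itself, but requires the file name as a quoted string:
 '@include "inetlib.awk"'.)  Here is the code: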
 
      # CGI Library and core of a web server
      # Global arrays
      #   GETARG --- arguments to CGI GET command
      #   MENU   --- menu items (path names)
      #   PARAM  --- parameters of form x=y
 
      # Optional variable MyHost contains host address
      # Optional variable MyPort contains port number
      # Needs TopHeader, TopDoc, TopFooter
      # Sets MyPrefix, HttpService, Status, Reason
 
      BEGIN {
        if (MyHost == "") {
           "uname -n" | getline MyHost
           close("uname -n")
        }
        if (MyPort ==  0) MyPort = 8080
        HttpService = "/inet/tcp/" MyPort "/0/0"
        MyPrefix    = "http://" MyHost ":" MyPort
        SetUpServer()
        while ("awk" != "complex") {
          # header lines are terminated this way
          RS = ORS    = "\r\n"
          Status      = 200             # this means OK
          Reason      = "OK"
          Header      = TopHeader
          Document    = TopDoc
          Footer      = TopFooter
          if        (GETARG["Method"] == "GET") {
              HandleGET()
          } else if (GETARG["Method"] == "HEAD") {
              # not yet implemented
          } else if (GETARG["Method"] != "") {
              print "bad method", GETARG["Method"]
          }
          Prompt = Header Document Footer
          print "HTTP/1.0", Status, Reason     |& HttpService
          print "Connection: Close"            |& HttpService
          print "Pragma: no-cache"             |& HttpService
          len = length(Prompt) + length(ORS)
          print "Content-length:", len         |& HttpService
          print ORS Prompt                     |& HttpService
          # ignore all the header lines
          while ((HttpService |& getline) > 0)
              continue
          # stop talking to this client
          close(HttpService)
          # wait for new client request
          HttpService |& getline
          # do some logging
          print systime(), strftime(), $0
          CGI_setup($1, $2, $3)
        }
      }
 
      function CGI_setup(method, uri, version,    i, j)
      {
          delete GETARG
          delete MENU
          delete PARAM
          GETARG["Method"] = method
          GETARG["URI"] = uri
          GETARG["Version"] = version
 
          i = index(uri, "?")
          if (i > 0) {  # is there a "?" indicating a CGI request?
              split(substr(uri, 1, i-1), MENU, "[/:]")
              split(substr(uri, i+1), PARAM, "&")
              for (i in PARAM) {
                  PARAM[i] = _CGI_decode(PARAM[i])
                  j = index(PARAM[i], "=")
                  GETARG[substr(PARAM[i], 1, j-1)] = \
                                               substr(PARAM[i], j+1)
              }
          } else { # there is no "?", no need for splitting PARAMs
              split(uri, MENU, "[/:]")
          }
          for (i in MENU)         # decode characters in path
              if (i + 0 > 4)      # but not those in host name; subscripts
                                  # are strings, so compare numerically
                  MENU[i] = _CGI_decode(MENU[i])
      }
 
    This isolates details in a single function, 'CGI_setup()'.  Decoding
 of encoded characters is pushed off to a helper function,
 '_CGI_decode()'.  The use of the leading underscore ('_') in the
 function name is intended to indicate that it is an "internal" function,
 although there is nothing to enforce this:
 
      function _CGI_decode(str,   hexdigs, i, pre, code1, code2,
                                  val, result)
      {
         hexdigs = "123456789abcdef"
 
         i = index(str, "%")
         if (i == 0) # no work to do
            return str
 
         do {
            pre = substr(str, 1, i-1)   # part before %xx
            code1 = substr(str, i+1, 1) # first hex digit
            code2 = substr(str, i+2, 1) # second hex digit
            str = substr(str, i+3)      # rest of string
 
            code1 = tolower(code1)
            code2 = tolower(code2)
            val = index(hexdigs, code1) * 16 \
                  + index(hexdigs, code2)
 
            result = result pre sprintf("%c", val)
            i = index(str, "%")
         } while (i != 0)
         if (length(str) > 0)
            result = result str
         return result
      }
 
    This works by splitting the string apart around an encoded character.
 The two digits are converted to lowercase characters and looked up in a
 string of hex digits.  Note that '0' is deliberately missing from the
 string: 'index()' returns zero when the character is not found, which
 is exactly the value of the digit '0'.  Once the two digits are
 combined into a numerical value, 'sprintf()' converts the value back
 into a real character.  The following is a simple test harness for the
 above functions:
 
      BEGIN {
        CGI_setup("GET",
        "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
             "&percent=a %25 sign",
        "1.0")
        for (i in MENU)
            printf "MENU[\"%s\"] = %s\n", i, MENU[i]
        for (i in PARAM)
            printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
        for (i in GETARG)
            printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
      }
 
    And this is the result when we run it:
 
      $ gawk -f testserv.awk
      -| MENU["4"] = www.gnu.org
      -| MENU["5"] = cgi-bin
      -| MENU["6"] = foo
      -| MENU["1"] = http
      -| MENU["2"] =
      -| MENU["3"] =
      -| PARAM["1"] = p1=stuff
      -| PARAM["2"] = p2=stuff&junk
      -| PARAM["3"] = percent=a % sign
      -| GETARG["p1"] = stuff
      -| GETARG["percent"] = a % sign
      -| GETARG["p2"] = stuff&junk
      -| GETARG["Method"] = GET
      -| GETARG["Version"] = 1.0
      -| GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
      p2=stuff%26junk&percent=a %25 sign
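
    The digit lookup inside '_CGI_decode()' can also be checked in
 isolation.  The following is a minimal sketch, not part of
 'inetlib.awk': the helper name 'hexpair()' is hypothetical and exists
 only for this illustration.  It repeats the same 'index()' arithmetic
 for a single pair of hex digits and prints the intermediate number, so
 that the role of the missing '0' is visible:

```awk
# Hypothetical helper (not part of inetlib.awk): convert one pair of
# hex digits to a number using the same lookup as _CGI_decode().
function hexpair(c1, c2,    hexdigs)
{
    hexdigs = "123456789abcdef"    # '0' omitted: index() yields 0 for it
    return index(hexdigs, tolower(c1)) * 16 \
           + index(hexdigs, tolower(c2))
}

BEGIN {
    # '%26': "2" is at position 2, "6" at position 6: 2*16 + 6 = 38
    printf "%%26 -> %d -> %c\n", hexpair("2", "6"), hexpair("2", "6")
    # '%25': 2*16 + 5 = 37, the percent sign itself
    printf "%%25 -> %d -> %c\n", hexpair("2", "5"), hexpair("2", "5")
    # '%0A': "0" is not found (0), "a" is at position 10
    printf "%%0A -> %d\n", hexpair("0", "A")
    # Caveat: a non-hex character such as "g" is also not found and
    # silently counts as 0, so malformed input is not detected.
    printf "%%2g -> %d\n", hexpair("2", "g")
}
```

    Saved under an arbitrary file name such as 'hexpair.awk' and run
 with 'gawk -f hexpair.awk', this prints:

      -| %26 -> 38 -> &
      -| %25 -> 37 -> %
      -| %0A -> 10
      -| %2g -> 32

 The last line shows the price of the trick: since "not found" and the
 digit '0' both yield zero, a stricter decoder would have to validate
 the two characters before converting them.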