gawkinet: Web page

 
 2.7 Reading a Web Page
 ======================
 
 Retrieving a web page from a web server is as simple as retrieving email
 from an email server.  We only have to use a similar, but not identical,
 protocol and a different port.  The name of the protocol is HyperText
 Transfer Protocol (HTTP) and the port number is usually 80.  As in the
 preceding node, ask your administrator about the name of your local web
 server or proxy web server and its port number for HTTP requests.
 
    The following program employs a rather crude approach toward
 retrieving a web page.  It uses the prehistoric syntax of HTTP 0.9,
 which consists of nothing but the method and the address on a single
 line, and which many web servers and proxies have traditionally
 continued to accept.  The most noticeable thing about the program is
 that it directs the request to the local proxy server, whose name you
 insert in place of 'PROXY' in the special file name.  The proxy in
 turn contacts 'www.yahoo.com':
 
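      BEGIN {
        # HTTP lines end in carriage return plus line feed
        RS = ORS = "\r\n"
        # Replace PROXY with the name of your proxy or web server
        HttpService = "/inet/tcp/0/PROXY/80"
        # HTTP 0.9 request: method and address only, no version
        print "GET http://www.yahoo.com"     |& HttpService
        # Print the response until the server closes the connection
        while ((HttpService |& getline) > 0)
           print $0
        close(HttpService)
      }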
 
    Again, lines are separated by a redefined 'RS' and 'ORS'.  The 'GET'
 request that we send to the server is the only kind of HTTP request that
 existed when the web was created in the early 1990s.  HTTP calls this
 'GET' request a "method," which tells the service to transmit a web page
 (here the home page of the Yahoo! search engine).  Version 1.0 added
 the request methods 'HEAD' and 'POST'.  Version 1.1(1) knows the
 additional request methods 'OPTIONS', 'PUT', 'DELETE', and 'TRACE'.
 You can replace the Yahoo! address with any valid web address, and the
 program prints the HTML code of that page to your screen.
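 
    As a minimal sketch of one of the newer methods, the following
 variation sends a 'HEAD' request in HTTP 1.0 syntax, asking the server
 for the header of the page only.  It assumes that the proxy accepts
 HTTP 1.0 requests; 'PROXY' is the same placeholder as before.  Unlike
 the HTTP 0.9 request, the request line names the protocol version, and
 an empty line marks the end of the request:
 
      BEGIN {
        RS = ORS = "\r\n"
        HttpService = "/inet/tcp/0/PROXY/80"
        # Request line: method, address, and protocol version
        print "HEAD http://www.yahoo.com/ HTTP/1.0" |& HttpService
        # An empty line terminates the request
        print ""                                    |& HttpService
        # The answer consists of a status line and header lines only
        while ((HttpService |& getline) > 0)
           print $0
        close(HttpService)
      }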
 
    Notice the similarity between the responses of the POP and HTTP
 services.  First, you get a header that is terminated by an empty line,
 and then you get the body of the page in HTML. The lines of the headers
 also have the same form as in POP. There is the name of a parameter,
 then a colon, and finally the value of that parameter.
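 
    The name-colon-value form makes the header easy to take apart.  The
 following sketch is one possible way to do it, not the only one.  It
 assumes an HTTP 1.0 request, so that the server answers with a status
 line followed by the header; the array 'Header' and the variable
 'Status' are names invented for this illustration:
 
      BEGIN {
        RS = ORS = "\r\n"
        HttpService = "/inet/tcp/0/PROXY/80"
        print "GET http://www.yahoo.com/ HTTP/1.0" |& HttpService
        print ""                                   |& HttpService
        # The first line of the answer is the status line
        if ((HttpService |& getline Status) > 0)
          printf "status: %s\n", Status
        # Header lines follow, up to the first empty line
        while ((HttpService |& getline) > 0 && $0 != "") {
          colon = index($0, ":")
          if (colon == 0)
            continue                    # not a "name: value" line
          name  = tolower(substr($0, 1, colon - 1))
          value = substr($0, colon + 1)
          sub(/^[ \t]+/, "", value)     # strip blanks after the colon
          Header[name] = value
        }
        for (name in Header)
          printf "%s = %s\n", name, Header[name]
        close(HttpService)              # the body is not read here
      }
 
    Header names are folded to lowercase because HTTP treats them as
 case-insensitive, so 'Header["content-type"]' works no matter how the
 server spells the name.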
 
    Images ('.png' or '.gif' files) can also be retrieved this way, but
 then you get binary data that should be redirected into a file.  Another
 application is calling a CGI (Common Gateway Interface) script on some
 server.  CGI scripts are used when the contents of a web page are not
 constant, but are generated at the moment you send a request for the
 page.  For example, to get a detailed report about the current quotes
 of Motorola stock shares, call a CGI script at Yahoo! with the
 following:
 
      get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
      print get |& HttpService
 
    You can also request weather reports this way.
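 
    Returning to images: the following sketch shows one way to redirect
 the binary data into a file.  It assumes an HTTP 1.0 request, and both
 the address 'http://www.example.com/logo.png' and the file name
 'logo.png' are hypothetical.  Because the data may contain the
 characters of 'RS' anywhere, each record is written back together with
 'RT', the text that actually terminated it.  Running 'gawk' in the "C"
 locale (for example, with 'LC_ALL=C') is advisable so that the bytes
 are not interpreted as multibyte characters:
 
      BEGIN {
        RS = ORS = "\r\n"
        HttpService = "/inet/tcp/0/PROXY/80"
        OutFile = "logo.png"            # hypothetical file name
        print "GET http://www.example.com/logo.png HTTP/1.0" |& HttpService
        print "" |& HttpService
        # Skip the status line and the header up to the empty line
        while ((HttpService |& getline) > 0 && $0 != "")
          HeaderLines++
        # Copy the body unchanged: record text plus its terminator
        while ((HttpService |& getline) > 0)
          printf "%s%s", $0, RT > OutFile
        close(OutFile)
        close(HttpService)
      }
 
    The sketch does not inspect the status line, so an error page sent
 by the server would end up in the file as well.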
 
    ---------- Footnotes ----------
 
    (1) Version 1.0 of HTTP was defined in RFC 1945.  HTTP 1.1 was
 initially specified in RFC 2068.  In June 1999, RFC 2068 was made
 obsolete by RFC 2616, an update without any substantial changes.