2.7 Reading a Web Page
======================
Retrieving a web page from a web server is as simple as retrieving email
from an email server. We only have to use a similar, but not identical,
protocol and a different port. The name of the protocol is HyperText
Transfer Protocol (HTTP) and the port number is usually 80. As in the
preceding node, ask your administrator about the name of your local web
server or proxy web server and its port number for HTTP requests.

The following program takes a rather crude approach to retrieving a
web page. It uses the prehistoric syntax of HTTP 0.9, which almost all
web servers still support. The most noticeable thing about it is that
the program directs the request to the local proxy server, whose name
you insert in place of 'PROXY' in the special file name. The proxy in
turn contacts 'www.yahoo.com':

     BEGIN {
       RS = ORS = "\r\n"
       HttpService = "/inet/tcp/0/PROXY/80"
       print "GET http://www.yahoo.com" |& HttpService
       while ((HttpService |& getline) > 0)
          print $0
       close(HttpService)
     }

Again, lines are separated by a redefined 'RS' and 'ORS'. The 'GET'
request that we send to the server is the only kind of HTTP request that
existed when the web was created in the early 1990s. HTTP calls this
'GET' request a "method," which tells the service to transmit a web page
(here the home page of the Yahoo! search engine). Version 1.0 added
the request methods 'HEAD' and 'POST'. HTTP 1.1(1) added further
request methods, among them 'OPTIONS', 'PUT', 'DELETE', and 'TRACE'.
You can fill in any valid web address, and the program prints the HTML
code of that page to your screen.
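
To use one of the later request methods, the request line must also
name the protocol version, and the request headers must be terminated
by an empty line. The following is only a sketch of how this might look
in HTTP 1.0 syntax. The function names 'request_line' and 'fetch' and
the variable 'Proxy' are our own inventions for illustration, not part
of 'gawk'; as before, we assume a proxy that listens on port 80:

```awk
# Sketch only. "Proxy" is a hypothetical variable that you set on the
# command line; without it, the program does nothing.

# Return the request line for METHOD and URL in HTTP 1.0 syntax.
function request_line(method, url) {
    return method " " url " HTTP/1.0"
}

# Send one request through PROXY and print whatever comes back.
function fetch(proxy, method, url,    service) {
    RS = ORS = "\r\n"
    service = "/inet/tcp/0/" proxy "/80"
    print request_line(method, url) |& service
    print ""                        |& service   # empty line ends the headers
    while ((service |& getline) > 0)
        print $0
    close(service)
}

BEGIN {
    if (Proxy != "")
        fetch(Proxy, "GET", "http://www.yahoo.com/")
}
```

Run it as 'gawk -v Proxy=NAME -f FILE', with the name of your proxy in
place of NAME. Passing "HEAD" instead of "GET" should return only the
header of the reply, without the body.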

Notice the similarity between the responses of the POP and HTTP
services. When the server sends a full response (always the case for
requests in HTTP 1.0 syntax or later), you first get a header that is
terminated by an empty line, and then you get the body of the page in
HTML. The header lines also have the same form as in POP: the name of
a parameter, then a colon, and finally the value of that parameter.
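
Because each header line has the form 'name: value', a short function
suffices to take it apart. The following is a minimal sketch, not a
complete HTTP parser. The function names 'split_header' and
'read_headers' are hypothetical, and we assume that 'RS' has been set
to "\r\n" and that the request has already been sent:

```awk
# Split one header line at its first colon.
# Returns 1 and fills pair["name"] and pair["value"] on success,
# 0 if the line does not have the form "name: value".
function split_header(line, pair,    pos) {
    pos = index(line, ":")
    if (pos < 2)                         # no colon, or empty name
        return 0
    pair["name"]  = substr(line, 1, pos - 1)
    pair["value"] = substr(line, pos + 1)
    sub(/^[ \t]+/, "", pair["value"])    # drop blanks after the colon
    return 1
}

# Read header lines from SERVICE into HDR, stopping at the empty line
# that separates the header from the body.
function read_headers(service, hdr,    line, pair) {
    while ((service |& getline line) > 0 && line != "") {
        if (split_header(line, pair))
            hdr[tolower(pair["name"])] = pair["value"]
    }
}
```

HTTP treats header names as case-insensitive, so the sketch stores them
in lowercase. After a call such as 'read_headers(HttpService, hdr)',
the element 'hdr["content-type"]' holds the type of the body, provided
the server sent that header. Lines without a colon, such as the status
line, are skipped.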

Images ('.png' or '.gif' files) can also be retrieved this way, but
then you get binary data that should be redirected into a file. Another
application is calling a CGI (Common Gateway Interface) script on some
server. CGI scripts are used when the contents of a web page are not
constant, but are generated at the moment you send a request for the
page. For example, to get a detailed report about the current quotes
of Motorola stock shares, call a CGI script at Yahoo! with the
following:

     get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
     print get |& HttpService

You can also request weather reports this way.
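
The part of the address after the question mark is a list of
'name=value' pairs joined by '&'. When a script takes several
parameters, it may be convenient to assemble the request from arrays.
The sketch below does so with a hypothetical function 'cgi_request'.
It assumes that the names and values contain only characters that may
appear unencoded in a URL:

```awk
# Build a GET request for a CGI script from N name/value pairs.
# NAMES and VALUES are arrays indexed from 1 to N.
function cgi_request(script, names, values, n,    i, query) {
    for (i = 1; i <= n; i++)
        query = query (i > 1 ? "&" : "") names[i] "=" values[i]
    return "GET " script "?" query
}

BEGIN {
    # The two parameters from the stock quote request in the text
    names[1] = "s"; values[1] = "MOT"
    names[2] = "d"; values[2] = "t"
    get = cgi_request("http://quote.yahoo.com/q", names, values, 2)
    print get    # GET http://quote.yahoo.com/q?s=MOT&d=t
}
```

The resulting string can then be sent with 'print get |& HttpService',
exactly as before.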
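
---------- Footnotes ----------
(1) Version 1.0 of HTTP was defined in RFC 1945. HTTP 1.1 was
initially specified in RFC 2068. In June 1999, RFC 2068 was made
obsolete by RFC 2616, an update without any substantial changes.
RFC 2616 has itself since been made obsolete; the current specification
of HTTP 1.1 is RFC 9112, together with RFC 9110.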