Info: (gawkinet) STOXPRED

Info Catalog
gawkinet: MOBAGWHO
gawkinet: Some Applications and Techniques
gawkinet: PROTBASE
gawkinet: STOXPRED

 
 3.9 STOXPRED: Stock Market Prediction As A Service
 ==================================================
 
      Far out in the uncharted backwaters of the unfashionable end of the
      Western Spiral arm of the Galaxy lies a small unregarded yellow
      sun.
 
      Orbiting this at a distance of roughly ninety-two million miles is
      an utterly insignificant little blue-green planet whose
      ape-descended life forms are so amazingly primitive that they
      still think digital watches are a pretty neat idea.
 
      This planet has -- or rather had -- a problem, which was this: most
      of the people living on it were unhappy for pretty much of the
      time.  Many solutions were suggested for this problem, but most of
      these were largely concerned with the movements of small green
      pieces of paper, which is odd because it wasn't the small green
      pieces of paper that were unhappy.
      Douglas Adams, 'The Hitch Hiker's Guide to the Galaxy'
 
    Valuable services on the Internet are usually _not_ implemented as
 mobile agents.  There are much simpler ways of implementing services.
 All Unix systems provide, for example, the 'cron' service.  Unix system
 users can write a list of tasks to be done each day, each week, twice a
 day, or just once.  The list is entered into a file named 'crontab'.
 For example, to distribute a newsletter on a daily basis this way, use
 'cron' for calling a script each day early in the morning.
 
      # run at 8 am on weekdays, distribute the newsletter
      0 8 * * 1-5   $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1
 
    The script first looks for interesting information on the Internet,
 assembles it in a nice form and sends the results via email to the
 customers.
 
    The following is an example of a primitive newsletter on stock market
 prediction.  It is a report which first tries to predict the change of
 each share in the Dow Jones Industrial Index for the particular day.
 Then it mentions some especially promising shares as well as some shares
 which look remarkably bad on that day.  The report ends with the usual
 disclaimer which tells every child _not_ to try this at home and hurt
 anybody.
 
      Good morning Uncle Scrooge,
 
      This is your daily stock market report for Monday, October 16, 2000.
      Here are the predictions for today:
 
              AA      neutral
              GE      up
              JNJ     down
              MSFT    neutral
              ...
              UTX     up
              DD      down
              IBM     up
              MO      down
              WMT     up
              DIS     up
              INTC    up
              MRK     down
              XOM     down
              EK      down
              IP      down
 
      The most promising shares for today are these:
 
              INTC            http://biz.yahoo.com/n/i/intc.html
 
      The stock shares to avoid today are these:
 
              EK              http://biz.yahoo.com/n/e/ek.html
              IP              http://biz.yahoo.com/n/i/ip.html
              DD              http://biz.yahoo.com/n/d/dd.html
              ...
 
    The script as a whole is rather long.  In order to ease the pain of
 studying other people's source code, we have broken the script up into
 meaningful parts which are invoked one after the other.  The basic
 structure of the script is as follows:
 
      BEGIN {
        Init()
        ReadQuotes()
        CleanUp()
        Prediction()
        Report()
        SendMail()
      }
 
    The earlier parts store data into variables and arrays which are
 subsequently used by later parts of the script.  The 'Init()' function
 first checks if the script is invoked correctly (without any
 parameters).  If not, it informs the user of the correct usage.  What
 follows are preparations for the retrieval of the historical quote data.
 The names of the 30 stock shares are stored in an array 'name' along
 with the current date in 'day', 'month', and 'year'.
 
    All users who are separated from the Internet by a firewall and have
 to direct their Internet accesses to a proxy must supply the name of the
 proxy to this script with the '-v Proxy=NAME' option.  For most users,
 the default proxy and port number should suffice.
 
      function Init() {
        if (ARGC != 1) {
          print "STOXPRED - daily stock share prediction"
          print "IN:\n    no parameters, nothing on stdin"
          print "PARAM:\n    -v Proxy=MyProxy -v ProxyPort=80"
          print "OUT:\n    commented predictions as email"
          print "JK 09.10.2000"
          exit
        }
        # Remember ticker symbols from Dow Jones Industrial Index
        StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
          SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
          MRK XOM EK IP", name);
        # Remember the current date as the end of the time series
        day   = strftime("%d")
        month = strftime("%m")
        year  = strftime("%Y")
        if (Proxy     == "")  Proxy     = "chart.yahoo.com"
        if (ProxyPort ==  0)  ProxyPort = 80
        YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
      }
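
    The defaulting of 'Proxy' and 'ProxyPort' can be tried without any
 network access.  The following is only a sketch: the shell function
 'stox_endpoint' and the host name 'proxy.example.com' are hypothetical
 and not part of the script.  The function repeats the two tests from
 'Init()' and prints the special file name that would be opened.

```shell
# Hypothetical helper, not part of STOXPRED: repeat the defaulting logic
# of Init() and print the special file name the script would open.
# Extra arguments (such as -v Proxy=NAME) are passed through to awk.
stox_endpoint() {
  awk "$@" 'BEGIN {
    if (Proxy     == "")  Proxy     = "chart.yahoo.com"
    if (ProxyPort ==  0)  ProxyPort = 80
    print "/inet/tcp/0/" Proxy "/" ProxyPort
  }'
}

stox_endpoint                                   # no proxy given
stox_endpoint -v Proxy=proxy.example.com -v ProxyPort=8080
```

    Because nothing is opened here, plain 'awk' suffices; only the real
 script needs 'gawk' for its '/inet/tcp' special files.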
 
    There are two really interesting parts in the script.  One is the
 function which reads the historical stock quotes from an Internet
 server.  The other is the one that does the actual prediction.  In the
 following function we see how the quotes are read from the Yahoo server.
 The data which comes from the server is in CSV format (comma-separated
 values):
 
      Date,Open,High,Low,Close,Volume
      9-Oct-00,22.75,22.75,21.375,22.375,7888500
      6-Oct-00,23.8125,24.9375,21.5625,22,10701100
      5-Oct-00,24.4375,24.625,23.125,23.50,5810300
 
    Each line holds the values for one trading day.  Columns are
 separated by commas and contain the kind of data that is described in
 the header (first) line.  At first, 'gawk' is instructed to separate
 columns by commas ('FS = ","').  In the loop that follows, a connection
 to the Yahoo server is first opened, then a download takes place, and
 finally the connection is closed.  All this happens once for each ticker
 symbol.  In the body of this loop, an Internet address is built up as a
 string according to the rules of the Yahoo server.  The starting and
 ending dates fall on the same calendar day, one year apart: the time
 series ends today and begins one year earlier.  All the action is
 initiated within the 'printf' command which transmits the request for
 data to the Yahoo server.
 
    In the inner loop, the server's data is first read and then scanned
 line by line.  Only lines which have six columns and the name of a month
 in the first column contain relevant data.  This data is stored in the
 two-dimensional array 'quote'; one dimension being time, the other being
 the ticker symbol.  During retrieval of the first stock's data, the
 calendar names of the time instances are stored in the array 'days'
 because we need them later.
 
      function ReadQuotes() {
        # Retrieve historical data for each ticker symbol
        FS = ","
        for (stock = 1; stock <= StockCount; stock++) {
          URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
                "&a=" month "&b=" day   "&c=" year-1 \
                "&d=" month "&e=" day   "&f=" year \
                "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
          printf("GET %s HTTP/1.0\r\n\r\n", URL) |& YahooData
          while ((YahooData |& getline) > 0) {
            if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
              if (stock == 1)
                days[++daycount] = $1;
              quote[$1, stock] = $5
            }
          }
          close(YahooData)
        }
        FS = " "
      }
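
    The filtering done in the inner loop can be tried offline.  What
 follows is a minimal sketch, assuming that the three sample lines shown
 earlier stand in for the server's reply.  The shell function
 'closing_prices' is hypothetical; it reads the CSV data from standard
 input instead of a network connection.

```shell
# Hypothetical offline stand-in for the inner loop of ReadQuotes():
# keep only lines with six comma-separated fields whose first field
# contains a month name, and print the date and the closing price.
closing_prices() {
  awk -F, '
    NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/ {
      print $1, $5      # $1 is the date, $5 the "Close" column
    }'
}

closing_prices <<EOF
Date,Open,High,Low,Close,Volume
9-Oct-00,22.75,22.75,21.375,22.375,7888500
6-Oct-00,23.8125,24.9375,21.5625,22,10701100
5-Oct-00,24.4375,24.625,23.125,23.50,5810300
EOF
```

    The header line is dropped because 'Date' contains no month name;
 the three data lines pass the test and yield their closing prices.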
 
    Now that we _have_ the data, it can be checked once again to make
 sure that no individual stock is missing or invalid, and that all the
 stock quotes are aligned correctly.  Furthermore, we renumber the time
 instances.  The most recent day gets day number 1 and all other days get
 consecutive numbers.  All quotes are rounded to the nearest whole
 number of US dollars.
 
      function CleanUp() {
        # clean up time series; eliminate incomplete data sets
        for (d = 1; d <= daycount; d++) {
          for (stock = 1; stock <= StockCount; stock++)
            if (! ((days[d], stock) in quote))
                stock = StockCount + 10
          if (stock > StockCount + 1)
              continue
          datacount++
          for (stock = 1; stock <= StockCount; stock++)
            data[datacount, stock] = int(0.5 + quote[days[d], stock])
        }
        delete quote
        delete days
      }
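
    The two steps of 'CleanUp()', dropping incomplete days and rounding,
 can be seen on a small made-up data set.  This is a sketch under
 assumptions: the function 'clean_series', the 'day stock close' input
 layout, and the restriction to two stocks are inventions for the sake
 of illustration.

```shell
# Hypothetical miniature of CleanUp(): input lines are "day stock close".
# A day is kept only if both stocks (1 and 2) have a quote for it; kept
# quotes are rounded with int(0.5 + x), as in the script.
clean_series() {
  awk '
    { if (!($1 in seen)) { seen[$1] = 1; days[++daycount] = $1 }
      quote[$1, $2] = $3 }
    END {
      for (d = 1; d <= daycount; d++) {
        if (!((days[d], 1) in quote) || !((days[d], 2) in quote))
          continue                       # incomplete day: drop it
        datacount++
        print datacount, int(0.5 + quote[days[d], 1]), \
                         int(0.5 + quote[days[d], 2])
      }
    }'
}

clean_series <<EOF
9-Oct-00 1 22.375
9-Oct-00 2 23.50
6-Oct-00 1 22
5-Oct-00 1 24.4375
5-Oct-00 2 21.5
EOF
```

    Here '6-Oct-00' has no quote for stock 2 and is dropped, so the two
 remaining days are numbered 1 and 2.  Adding 0.5 before truncating with
 'int()' rounds to the nearest integer for non-negative values, which is
 all that stock prices require.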
 
    Now we have arrived at the second really interesting part of the
 whole affair.  What we present here is a very primitive prediction
 algorithm: _If a stock fell yesterday, assume it will also fall today;
 if it rose yesterday, assume it will rise today_.  (Feel free to replace
 this algorithm with a smarter one.)  If a stock changed in the same
 direction on two consecutive days, this is an indication which should be
 highlighted.  Two-day advances are stored in 'hot' and two-day declines
 in 'avoid'.
 
    The rest of the function is a sanity check.  It counts the number of
 correct predictions in relation to the total number of predictions one
 could have made in the year before.
 
      function Prediction() {
        # Predict each ticker symbol by prolonging yesterday's trend
        for (stock = 1; stock <= StockCount; stock++) {
          if         (data[1, stock] > data[2, stock]) {
            predict[stock] = "up"
          } else if  (data[1, stock] < data[2, stock]) {
            predict[stock] = "down"
          } else {
            predict[stock] = "neutral"
          }
          if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
            hot[stock] = 1
          if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
            avoid[stock] = 1
        }
        # Do a plausibility check: how many predictions proved correct?
        for (s = 1; s <= StockCount; s++) {
          for (d = 1; d <= datacount-2; d++) {
            if         (data[d+1, s] > data[d+2, s]) {
              UpCount++
            } else if  (data[d+1, s] < data[d+2, s]) {
              DownCount++
            } else {
              NeutralCount++
            }
            if (((data[d, s]  > data[d+1, s]) && (data[d+1, s]  > data[d+2, s])) ||
                ((data[d, s]  < data[d+1, s]) && (data[d+1, s]  < data[d+2, s])) ||
                ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
              CorrectCount++
          }
        }
      }
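
    The trend rule and its plausibility check can be followed by hand on
 a very short series.  The sketch below assumes a single stock and five
 made-up quotes; the shell function 'trend_check' and the helper
 'trend()' are hypothetical and only restate the comparisons made in
 'Prediction()'.

```shell
# Hypothetical miniature of Prediction() for a single stock.  Input is
# one rounded quote per line, most recent first, as CleanUp() leaves it.
trend_check() {
  awk '
    function trend(newer, older) {
      return (newer > older) ? "up" : (newer < older) ? "down" : "neutral"
    }
    { data[NR] = $1 }
    END {
      # Forecast for today: prolong the most recent change.
      print "prediction: " trend(data[1], data[2])
      # Back-test: would the same rule have been right on earlier days?
      for (d = 1; d <= NR - 2; d++) {
        total++
        if (trend(data[d], data[d+1]) == trend(data[d+1], data[d+2]))
          correct++
      }
      printf "correct: %d of %d\n", correct, total
    }'
}

printf "%s\n" 24 22 22 23 25 | trend_check
```

    For these five quotes the forecast is 'up', and the rule would have
 been right on one of the three earlier days.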
 
    At this point the hard work has been done: the array 'predict'
 contains the predictions for all the ticker symbols.  It is up to the
 function 'Report()' to find some nice words to introduce the desired
 information.
 
      function Report() {
        # Generate report
        report =        "\nThis is your daily "
        report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
        report = report "Here are the predictions for today:\n\n"
        for (stock = 1; stock <= StockCount; stock++)
          report = report "\t" name[stock] "\t" predict[stock] "\n"
        for (stock in hot) {
          if (HotCount++ == 0)
            report = report "\nThe most promising shares for today are these:\n\n"
          report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
            tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
        }
        for (stock in avoid) {
          if (AvoidCount++ == 0)
            report = report "\nThe stock shares to avoid today are these:\n\n"
          report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
            tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
        }
        report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
        report = report " losers. When using this kind\nof prediction scheme for"
        report = report " the 12 months which lie behind us,\nwe get " UpCount
        report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
        report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
        report = report " predictions " CorrectCount " proved correct next day.\n"
        report = report "A success rate of "\
                   int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
        report = report "Random choice would have produced a 33% success rate.\n"
        report = report "Disclaimer: Like every other prediction of the stock\n"
        report = report "market, this report is, of course, complete nonsense.\n"
        report = report "If you are stupid enough to believe these predictions\n"
        report = report "you should visit a doctor who can treat your ailment."
      }
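
    The links in the report are derived from the ticker symbol alone.
 Here is a small sketch of that string handling; the shell function
 'news_url' is hypothetical and takes the symbol as its only argument.

```shell
# Hypothetical helper: build the news link for one ticker symbol the
# same way Report() does, from the lowercased symbol and its first letter.
news_url() {
  awk -v sym="$1" 'BEGIN {
    print "http://biz.yahoo.com/n/" tolower(substr(sym, 1, 1)) "/" \
          tolower(sym) ".html"
  }'
}

news_url INTC
```

    For 'INTC' this yields the address shown in the sample report,
 'http://biz.yahoo.com/n/i/intc.html'.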
 
    The function 'SendMail()' goes through the list of customers and
 opens a pipe to the 'mail' command for each of them.  Each one receives
 an email message with a proper subject heading and is addressed by
 full name.
 
      function SendMail() {
        # send report to customers
        customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
        customer["more@utopia.org"           ] = "Sir Thomas More"
        customer["spinoza@denhaag.nl"        ] = "Baruch de Spinoza"
        customer["marx@highgate.uk"          ] = "Karl Marx"
        customer["keynes@the.long.run"       ] = "John Maynard Keynes"
        customer["bierce@devil.hell.org"     ] = "Ambrose Bierce"
        customer["laplace@paris.fr"          ] = "Pierre Simon de Laplace"
        for (c in customer) {
          MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
          print "Good morning " customer[c] "," | MailPipe
          print report "\n.\n" | MailPipe
          close(MailPipe)
        }
      }
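
    Before mailing anybody, the loop over the customers can be rehearsed
 without a mail system.  The following is a sketch, not the script's own
 code: the shell function 'send_dry_run' is hypothetical, the report is
 shortened to one line, and each message is piped to 'cat' so that it
 appears on standard output.

```shell
# Hypothetical dry run of SendMail(): each message is piped to "cat"
# instead of "mail", so the text appears on standard output.
send_dry_run() {
  awk 'BEGIN {
    report = "\nThis is your daily stock market report."
    customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
    customer["marx@highgate.uk"]           = "Karl Marx"
    for (c in customer) {
      # The real command would be: mail -s SUBJECT ADDRESS
      MailPipe = "cat"
      print "To: " c | MailPipe
      print "Good morning " customer[c] "," | MailPipe
      print report | MailPipe
      close(MailPipe)     # finish this message before starting the next
    }
  }'
}

send_dry_run
```

    The order in which 'for (c in customer)' visits the addresses is not
 defined, so the two messages may appear in either order.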
 
    Be patient when running the script by hand.  Retrieving the data for
 all the ticker symbols and sending the emails may take several minutes
 to complete, depending upon network traffic and the speed of the
 available Internet link.  The quality of the prediction algorithm is
 likely to be disappointing.  Try to find a better one.  Should you find
 one with a success rate of more than 50%, please tell us about it!  It
 is only for the sake of curiosity, of course.  ':-)'