gawkinet: STOXPRED
3.9 STOXPRED: Stock Market Prediction As A Service
==================================================
Far out in the uncharted backwaters of the unfashionable end of the
Western Spiral arm of the Galaxy lies a small unregarded yellow
sun.
Orbiting this at a distance of roughly ninety-two million miles is
an utterly insignificant little blue-green planet whose
ape-descended life forms are so amazingly primitive that they
still think digital watches are a pretty neat idea.
This planet has -- or rather had -- a problem, which was this: most
of the people living on it were unhappy for pretty much of the
time. Many solutions were suggested for this problem, but most of
these were largely concerned with the movements of small green
pieces of paper, which is odd because it wasn't the small green
pieces of paper that were unhappy.
Douglas Adams, 'The Hitch Hiker's Guide to the Galaxy'
Valuable services on the Internet are usually _not_ implemented as
mobile agents. There are much simpler ways of implementing services.
All Unix systems provide, for example, the 'cron' service. Unix system
users can write a list of tasks to be done each day, each week, twice a
day, or just once. The list is entered into a file named 'crontab'.
For example, to distribute a newsletter on a daily basis this way, use
'cron' to call a script early each morning.
# run at 8 am on weekdays, distribute the newsletter
0 8 * * 1-5 $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1
The script first looks for interesting information on the Internet,
assembles it in a nice form and sends the results via email to the
customers.
The following is an example of a primitive newsletter on stock market
prediction. It is a report which first tries to predict the change of
each share in the Dow Jones Industrial Index for the particular day.
Then it mentions some especially promising shares as well as some shares
which look remarkably bad on that day. The report ends with the usual
disclaimer which tells every child _not_ to try this at home and hurt
anybody.
Good morning Uncle Scrooge,
This is your daily stock market report for Monday, October 16, 2000.
Here are the predictions for today:
AA neutral
GE up
JNJ down
MSFT neutral
...
UTX up
DD down
IBM up
MO down
WMT up
DIS up
INTC up
MRK down
XOM down
EK down
IP down
The most promising shares for today are these:
INTC http://biz.yahoo.com/n/i/intc.html
The stock shares to avoid today are these:
EK http://biz.yahoo.com/n/e/ek.html
IP http://biz.yahoo.com/n/i/ip.html
DD http://biz.yahoo.com/n/d/dd.html
...
The script as a whole is rather long. In order to ease the pain of
studying other people's source code, we have broken the script up into
meaningful parts which are invoked one after the other. The basic
structure of the script is as follows:
BEGIN {
  Init()
  ReadQuotes()
  CleanUp()
  Prediction()
  Report()
  SendMail()
}
The earlier parts store data into variables and arrays which are
subsequently used by later parts of the script. The 'Init()' function
first checks if the script is invoked correctly (without any
parameters). If not, it informs the user of the correct usage. What
follows are preparations for the retrieval of the historical quote data.
The ticker symbols of the 30 stock shares are stored in the array
'name', and the current date is stored in 'day', 'month', and 'year'.
All users who are separated from the Internet by a firewall and have
to direct their Internet accesses to a proxy must supply the name of the
proxy to this script with the '-v Proxy=NAME' option. For most users,
the default proxy and port number should suffice.
function Init() {
  if (ARGC != 1) {
    print "STOXPRED - daily stock share prediction"
    print "IN:\n no parameters, nothing on stdin"
    print "PARAM:\n -v Proxy=MyProxy -v ProxyPort=80"
    print "OUT:\n commented predictions as email"
    print "JK 09.10.2000"
    exit
  }
  # Remember ticker symbols from Dow Jones Industrial Index
  StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
    SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
    MRK XOM EK IP", name);
  # Remember the current date as the end of the time series
  day   = strftime("%d")
  month = strftime("%m")
  year  = strftime("%Y")
  if (Proxy     == "") Proxy     = "chart.yahoo.com"
  if (ProxyPort ==  0) ProxyPort = 80
  YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
}
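As a minimal sketch of how the defaults combine into the special file
name, the following standalone program uses a hypothetical helper
'ServiceName()' that is not part of the script; it only mirrors the
fallback logic of 'Init()'. The host 'proxy.example.com' and port 8080
are placeholders.

```awk
# Hypothetical helper (not part of STOXPRED): build gawk's TCP special
# file name, falling back to the defaults used in Init() when unset.
function ServiceName(proxy, port) {
  if (proxy == "") proxy = "chart.yahoo.com"
  if (port  ==  0) port  = 80
  return "/inet/tcp/0/" proxy "/" port
}

BEGIN {
  # Proxy and ProxyPort are empty unless set with -v on the command line
  print ServiceName(Proxy, ProxyPort)
  # Placeholder proxy, for illustration only
  print ServiceName("proxy.example.com", 8080)
}
```

Run without any '-v' assignments, the first line printed is
'/inet/tcp/0/chart.yahoo.com/80'.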
There are two really interesting parts in the script. One is the
function which reads the historical stock quotes from an Internet
server. The other is the one that does the actual prediction. In the
following function we see how the quotes are read from the Yahoo server.
The data which comes from the server is in CSV format (comma-separated
values):
Date,Open,High,Low,Close,Volume
9-Oct-00,22.75,22.75,21.375,22.375,7888500
6-Oct-00,23.8125,24.9375,21.5625,22,10701100
5-Oct-00,24.4375,24.625,23.125,23.50,5810300
Each line holds the values for one trading day, and the comma-separated
columns contain the kind of data described in the header (first) line.
First, 'gawk' is instructed to separate columns by commas
('FS = ","'). In the loop that follows, a connection to the Yahoo
server is first opened, then a download takes place, and finally the
connection is closed. All this happens once for each ticker symbol.
In the body of this loop, an Internet address is built up as a string
according to the rules of the Yahoo server. The starting and ending
dates share the same day and month but lie one year apart, so each
request covers the past twelve months. All the action is initiated
within the 'printf' command, which transmits the request for data to
the Yahoo server.
In the inner loop, the server's data is first read and then scanned
line by line. Only lines which have six columns and the name of a month
in the first column contain relevant data. This data is stored in the
two-dimensional array 'quote'; one dimension being time, the other being
the ticker symbol. During retrieval of the first stock's data, the
calendar names of the time instances are stored in the array 'days'
because we need them later.
function ReadQuotes() {
  # Retrieve historical data for each ticker symbol
  FS = ","
  for (stock = 1; stock <= StockCount; stock++) {
    URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
          "&a=" month "&b=" day "&c=" year-1 \
          "&d=" month "&e=" day "&f=" year \
          "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
    printf("GET " URL " HTTP/1.0\r\n\r\n") |& YahooData
    while ((YahooData |& getline) > 0) {
      # Keep only data lines: six columns, month name in the date
      if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
        if (stock == 1)
          days[++daycount] = $1;
        quote[$1, stock] = $5
      }
    }
    close(YahooData)
  }
  FS = " "
}
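The filtering rule can be tried without a network connection. The
sketch below assumes a hypothetical helper 'ClosingPrice()', which is
not part of the script. It applies the same test (six columns, month
name in the first column) to two of the sample lines shown earlier,
using 'split()' instead of 'FS'.

```awk
# Hypothetical helper (not part of STOXPRED): return the closing price
# of one CSV line, or the empty string if it is not a data line.
function ClosingPrice(line,    f, n) {
  n = split(line, f, ",")
  if (n == 6 && f[1] ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/)
    return f[5]
  return ""
}

BEGIN {
  # The header line is rejected, so this prints an empty line
  print ClosingPrice("Date,Open,High,Low,Close,Volume")
  # A data line yields its fifth column: prints 22.375
  print ClosingPrice("9-Oct-00,22.75,22.75,21.375,22.375,7888500")
}
```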
Now that we _have_ the data, it can be checked once again to make
sure that no individual stock is missing or invalid, and that all the
stock quotes are aligned correctly. Furthermore, we renumber the time
instances. The most recent day gets day number 1 and all other days get
consecutive numbers. All quotes are rounded to the nearest whole
number of US dollars.
function CleanUp() {
  # clean up time series; eliminate incomplete data sets
  for (d = 1; d <= daycount; d++) {
    # Push stock past StockCount + 1 if any quote is missing for this day
    for (stock = 1; stock <= StockCount; stock++)
      if (! ((days[d], stock) in quote))
        stock = StockCount + 10
    # Skip days with incomplete data
    if (stock > StockCount + 1)
      continue
    # Day is complete: renumber it and round quotes to whole dollars
    datacount++
    for (stock = 1; stock <= StockCount; stock++)
      data[datacount, stock] = int(0.5 + quote[days[d], stock])
  }
  delete quote
  delete days
}
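The completeness test might also be written with an early return
instead of a sentinel value. This is only a sketch: 'IsComplete()' and
'Round()' are hypothetical helpers that are not part of the script, and
the two-stock data set is purely illustrative.

```awk
# Hypothetical helper: true if every stock 1..count has a quote
# stored under the given day name.
function IsComplete(q, dayname, count,    s) {
  for (s = 1; s <= count; s++)
    if (! ((dayname, s) in q))
      return 0
  return 1
}

# Hypothetical helper: round to the nearest whole dollar,
# using the same expression as CleanUp().
function Round(x) { return int(0.5 + x) }

BEGIN {
  q["9-Oct-00", 1] = 22.375; q["9-Oct-00", 2] = 58.5
  q["6-Oct-00", 1] = 22          # stock 2 is missing for this day
  print IsComplete(q, "9-Oct-00", 2), IsComplete(q, "6-Oct-00", 2)
  print Round(22.375), Round(58.5)
}
```

The first 'print' reports the complete day as 1 and the incomplete day
as 0; the second shows the rounded values 22 and 59.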
Now we have arrived at the second really interesting part of the
whole affair. What we present here is a very primitive prediction
algorithm: _If a stock fell yesterday, assume it will also fall today;
if it rose yesterday, assume it will rise today_. (Feel free to replace
this algorithm with a smarter one.) If a stock changed in the same
direction on two consecutive days, this is an indication which should be
highlighted. Two-day advances are stored in 'hot' and two-day declines
in 'avoid'.
The rest of the function is a sanity check. It counts the number of
correct predictions in relation to the total number of predictions one
could have made in the year before.
function Prediction() {
  # Predict each ticker symbol by prolonging yesterday's trend
  for (stock = 1; stock <= StockCount; stock++) {
    if (data[1, stock] > data[2, stock]) {
      predict[stock] = "up"
    } else if (data[1, stock] < data[2, stock]) {
      predict[stock] = "down"
    } else {
      predict[stock] = "neutral"
    }
    if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
      hot[stock] = 1
    if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
      avoid[stock] = 1
  }
  # Do a plausibility check: how many predictions proved correct?
  for (s = 1; s <= StockCount; s++) {
    for (d = 1; d <= datacount-2; d++) {
      if (data[d+1, s] > data[d+2, s]) {
        UpCount++
      } else if (data[d+1, s] < data[d+2, s]) {
        DownCount++
      } else {
        NeutralCount++
      }
      if (((data[d, s] > data[d+1, s]) && (data[d+1, s] > data[d+2, s])) ||
          ((data[d, s] < data[d+1, s]) && (data[d+1, s] < data[d+2, s])) ||
          ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
        CorrectCount++
    }
  }
}
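To see the trend rule and the plausibility check at work on a small
scale, here is a sketch for a single made-up price series. The helpers
'Trend()' and 'Hits()' are hypothetical and not part of the script; as
in 'data', element 1 is the most recent value.

```awk
# Hypothetical helper: classify the move from the older to the newer value.
function Trend(newer, older) {
  if (newer > older) return "up"
  if (newer < older) return "down"
  return "neutral"
}

# Hypothetical helper: count how often the previous day's trend repeated
# itself in series[1..n]; the number of predictions checked is n - 2.
function Hits(series, n,    d, hits) {
  for (d = 1; d <= n - 2; d++)
    if (Trend(series[d], series[d+1]) == Trend(series[d+1], series[d+2]))
      hits++
  return hits + 0
}

BEGIN {
  # Made-up closing prices, most recent first
  n = split("25 24 23 23 24", series, " ")
  print "prediction:", Trend(series[1], series[2])    # prediction: up
  print "hits:", Hits(series, n), "of", n - 2         # hits: 1 of 3
}
```

Only the most recent pair of moves (23 to 24, then 24 to 25) repeats a
trend, which gives one correct prediction out of three.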
At this point the hard work has been done: the array 'predict'
contains the predictions for all the ticker symbols. It is up to the
function 'Report()' to find some nice words to introduce the desired
information.
function Report() {
  # Generate report
  report = "\nThis is your daily "
  report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
  report = report "Here are the predictions for today:\n\n"
  for (stock = 1; stock <= StockCount; stock++)
    report = report "\t" name[stock] "\t" predict[stock] "\n"
  for (stock in hot) {
    if (HotCount++ == 0)
      report = report "\nThe most promising shares for today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  }
  for (stock in avoid) {
    if (AvoidCount++ == 0)
      report = report "\nThe stock shares to avoid today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  }
  report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
  report = report " losers. When using this kind\nof prediction scheme for"
  report = report " the 12 months which lie behind us,\nwe get " UpCount
  report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
  report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
  report = report " predictions " CorrectCount " proved correct next day.\n"
  report = report "A success rate of "\
    int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
  report = report "Random choice would have produced a 33% success rate.\n"
  report = report "Disclaimer: Like every other prediction of the stock\n"
  report = report "market, this report is, of course, complete nonsense.\n"
  report = report "If you are stupid enough to believe these predictions\n"
  report = report "you should visit a doctor who can treat your ailment."
}
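Two pieces of 'Report()' are easy to try in isolation: the address of a
share's news page and the success rate. The sketch below uses the
hypothetical helpers 'NewsURL()' and 'SuccessRate()', which are not part
of the script. The guard against an empty history is an addition of the
sketch, not something 'Report()' does.

```awk
# Hypothetical helper: build the news page address for a ticker symbol,
# following the pattern used in Report().
function NewsURL(symbol,    s) {
  s = tolower(symbol)
  return "http://biz.yahoo.com/n/" substr(s, 1, 1) "/" s ".html"
}

# Hypothetical helper: whole-number percentage of correct predictions.
# Returns 0 when there were no predictions (assumption of this sketch).
function SuccessRate(correct, total) {
  return (total > 0) ? int(100 * correct / total) : 0
}

BEGIN {
  print NewsURL("INTC")             # http://biz.yahoo.com/n/i/intc.html
  print SuccessRate(1, 3) "%"       # 33%
}
```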
The function 'SendMail()' goes through the list of customers and
opens a pipe to the 'mail' command for each of them. Each one receives
an email message with a proper subject heading and is addressed by
full name.
function SendMail() {
  # send report to customers
  customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
  customer["more@utopia.org"           ] = "Sir Thomas More"
  customer["spinoza@denhaag.nl"        ] = "Baruch de Spinoza"
  customer["marx@highgate.uk"          ] = "Karl Marx"
  customer["keynes@the.long.run"       ] = "John Maynard Keynes"
  customer["bierce@devil.hell.org"     ] = "Ambrose Bierce"
  customer["laplace@paris.fr"          ] = "Pierre Simon de Laplace"
  for (c in customer) {
    # The space after the quoted subject separates it from the address
    MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
    print "Good morning " customer[c] "," | MailPipe
    print report "\n.\n" | MailPipe
    close(MailPipe)
  }
}
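Before sending real mail, it can help to print the command strings
instead of opening the pipes. The sketch below does just that with a
hypothetical helper 'MailCommand()', which is not part of the script.
The command needs a space between the quoted subject and the
recipient's address so that the shell sees them as separate arguments.

```awk
# Hypothetical helper: build the shell command for one recipient.
# The subject is wrapped in single quotes and must not contain any itself.
function MailCommand(subject, address) {
  return "mail -s '" subject "' " address
}

BEGIN {
  customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
  for (c in customer)
    # Print the command rather than piping the report into it
    print MailCommand("Daily Stock Prediction Newsletter", c)
}
```

This prints 'mail -s 'Daily Stock Prediction Newsletter'
uncle.scrooge@ducktown.gov' on one line, without sending anything.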
Be patient when running the script by hand. Retrieving the data for
all the ticker symbols and sending the emails may take several minutes
to complete, depending upon network traffic and the speed of the
available Internet link. The quality of the prediction algorithm is
likely to be disappointing. Try to find a better one. Should you find
one with a success rate of more than 50%, please tell us about it! It
is only for the sake of curiosity, of course. ':-)'