9.16.1 The problem of spam
--------------------------
First, some background on spam.
If you have access to e-mail, you are familiar with spam (technically
termed UCE, Unsolicited Commercial E-mail). Simply put, it exists
because e-mail delivery is very cheap compared to paper mail, so only a
very small percentage of people need to respond to a UCE to make it
worthwhile to the advertiser. Ironically, one of the most common spams
is the one offering a database of e-mail addresses for further spamming.
Senders of spam are usually called _spammers_, but terms like _vermin_,
_scum_, _sociopaths_, and _morons_ are in common use as well.
Spam comes from a wide variety of sources. It is simply impossible
to dispose of all spam without discarding useful messages. A good
example is the TMDA system, which requires senders unknown to you to
confirm themselves as legitimate senders before their e-mail can reach
you. Without getting into the technical side of TMDA, a downside is
clearly that e-mail from legitimate sources may be discarded if those
sources can’t or won’t confirm themselves through the TMDA system.
Another problem with TMDA is that it requires its users to have a basic
understanding of e-mail delivery and processing.
The simplest approach to fighting spam is direct filtering, at the mail
server or when you sort through incoming mail. If you get 200 spam
messages per day from ‘random-address@vmadmin.com’, you block
‘vmadmin.com’. If you get 200 messages about ‘VIAGRA’, you discard all
messages with ‘VIAGRA’ in the message. If you get lots of spam from
Bulgaria, for example, you try to filter all mail from Bulgarian IPs.
This, unfortunately, is a great way to discard legitimate e-mail.
The risks of blocking a whole country (Bulgaria, Norway, Nigeria, China,
etc.) or even a continent (Asia, Africa, Europe, etc.) from contacting
you should be obvious, so don’t do it if you have the choice.
In another instance, the very informative and useful RISKS digest has
been blocked by overzealous mail filters because it *contained* words
that were common in spam messages. Nevertheless, in isolated cases,
with great care, direct filtering of mail can be useful.
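To make the risk concrete, here is a minimal sketch of direct filtering in Python. It is not part of Gnus; the rule sets and the function name `is_blocked` are hypothetical, with the blocked domain and word borrowed from the examples above.

```python
# Hypothetical block lists, taken from the examples in the text.
BLOCKED_DOMAINS = {"vmadmin.com"}
BLOCKED_WORDS = {"viagra"}


def is_blocked(sender, body):
    """Return True if the sender's domain or any word in the body is blocked."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in BLOCKED_DOMAINS:
        return True
    # Compare whole words, ignoring case and surrounding punctuation.
    words = {w.strip(".,!?:;\"'()").lower() for w in body.split()}
    return not BLOCKED_WORDS.isdisjoint(words)


# The spammer is caught ...
print(is_blocked("random-address@vmadmin.com", "Buy now"))
# ... but so is a legitimate digest that merely mentions the word.
print(is_blocked("moderator@example.org", "Some filters reject the word VIAGRA."))
```

The second call shows the failure described above: a message that only discusses a blocked word is discarded along with the spam.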
Another approach to filtering e-mail is distributed spam processing;
DCC, for instance, implements such a system. In essence, N
systems around the world agree that a machine X in Ghana, Estonia, or
California is sending out spam e-mail, and these N systems enter X or
the spam e-mail from X into a database. The criteria for spam detection
vary—it may be the number of messages sent, the content of the messages,
and so on. When a user of the distributed processing system wants to
find out if a message is spam, he consults one of those N systems.
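The idea can be sketched in a few lines of Python. This is only an illustration, not DCC's actual protocol: the class `SharedSpamDatabase`, its methods, and the use of an exact SHA-256 checksum and a simple report-count threshold are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


class SharedSpamDatabase:
    """Toy in-memory stand-in for the database shared by the N systems."""

    def __init__(self, threshold):
        # Number of distinct systems that must report a message
        # before it is treated as spam (a hypothetical criterion).
        self.threshold = threshold
        # Maps a message checksum to the set of systems that reported it.
        self.reporters = defaultdict(set)

    @staticmethod
    def checksum(message):
        return hashlib.sha256(message.encode("utf-8")).hexdigest()

    def report(self, system, message):
        """Record that SYSTEM considers MESSAGE to be spam."""
        self.reporters[self.checksum(message)].add(system)

    def is_spam(self, message):
        """Return True if enough distinct systems reported MESSAGE."""
        seen = self.reporters.get(self.checksum(message), ())
        return len(seen) >= self.threshold


db = SharedSpamDatabase(threshold=3)
for system in ("ghana", "estonia", "california"):
    db.report(system, "Cheap pills!!!")

print(db.is_spam("Cheap pills!!!"))
print(db.is_spam("Lunch on Friday?"))
```

An exact checksum is the weak point of this sketch, since changing a single character in the message yields a different checksum.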
Distributed spam processing works very well against spammers that
send a large number of messages at once, but it requires the user to set
up fairly complicated checks. There are commercial and free distributed
spam processing systems. Distributed spam processing has its risks as
well. For instance, legitimate e-mail senders have been accused of
sending spam, and their web sites and mailing lists have been shut down
for some time because of the incident.
The statistical approach to spam filtering is also popular. It is
based on a statistical analysis of previous spam messages. Usually the
analysis is a simple word frequency count, with perhaps pairs of words
or 3-word combinations thrown into the mix. Statistical analysis of
spam works very well in most cases, but it can sometimes classify
legitimate e-mail as spam. It takes time to run the
analysis, the full message must be analyzed, and the user has to store
the database of spam analysis. Statistical analysis on the server is
gaining popularity. This has the advantage of letting the user Just
Read Mail, but has the disadvantage that it’s harder to tell the server
that it has misclassified mail.
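As a rough illustration of the word frequency count, here is a small Python sketch. It uses a naive Bayes comparison with add-one smoothing, which is one common way to do this and not necessarily what any particular filter does; the class `WordFrequencyFilter` and the labels `"spam"` and `"ham"` are names invented for the example. It assumes at least one message has been trained for each label.

```python
import math
from collections import Counter


class WordFrequencyFilter:
    """Toy statistical filter based on single-word frequencies."""

    def __init__(self):
        # This is the "database of spam analysis" the user has to store.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return text.lower().split()

    def train(self, label, text):
        """Add the words of TEXT to the counts for LABEL."""
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def score(self, label, text):
        """Log-probability of TEXT under LABEL, with add-one smoothing."""
        counts = self.counts[label]
        vocabulary = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total = sum(counts.values())
        # Assumes both labels have been trained, so the prior is nonzero.
        prior = self.messages[label] / sum(self.messages.values())
        result = math.log(prior)
        for word in self.tokens(text):
            result += math.log((counts[word] + 1) / (total + vocabulary))
        return result

    def is_spam(self, text):
        return self.score("spam", text) > self.score("ham", text)


f = WordFrequencyFilter()
f.train("spam", "cheap pills buy now")
f.train("spam", "buy cheap watches now")
f.train("ham", "lunch on friday")
f.train("ham", "meeting notes for friday")

print(f.is_spam("buy cheap pills"))
print(f.is_spam("lunch friday"))
```

Note that every word of the message is examined and the counts must be kept between runs, which matches the costs mentioned above.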
Fighting spam is not easy, no matter what anyone says. There is no
magic switch that will distinguish Viagra ads from Mom’s e-mails. Even
people have a hard time telling spam apart from non-spam, because
spammers are actively looking to fool us into thinking they are Mom,
essentially. Spamming is irritating, irresponsible, and idiotic
behavior from a bunch of people who think the world owes them a favor.
We hope the following sections will help you in fighting the spam
plague.