gnus: The problem of spam

 
 9.16.1 The problem of spam
 --------------------------
 
 First, some background on spam.
 
    If you have access to e-mail, you are familiar with spam (technically
 termed UCE, Unsolicited Commercial E-mail).  Simply put, it exists
 because e-mail delivery is very cheap compared to paper mail, so only a
 very small percentage of people need to respond to a UCE to make it
 worthwhile to the advertiser.  Ironically, one of the most common spams
 is the one offering a database of e-mail addresses for further spamming.
 Senders of spam are usually called _spammers_, but terms like _vermin_,
 _scum_, _sociopaths_, and _morons_ are in common use as well.
 
    Spam comes from a wide variety of sources.  It is simply impossible
 to dispose of all spam without discarding useful messages.  A good
 example is the TMDA system, which requires senders unknown to you to
 confirm themselves as legitimate senders before their e-mail can reach
 you.  Without getting into the technical side of TMDA, a downside is
 clearly that e-mail from legitimate sources may be discarded if those
 sources can’t or won’t confirm themselves through the TMDA system.
 Another problem with TMDA is that it requires its users to have a basic
 understanding of e-mail delivery and processing.
 
    The simplest approach to fighting spam is direct filtering, at the mail
 server or when you sort through incoming mail.  If you get 200 spam
 messages per day from ‘random-address@vmadmin.com’, you block
 ‘vmadmin.com’.  If you get 200 messages about ‘VIAGRA’, you discard all
 messages with ‘VIAGRA’ in the message.  If you get lots of spam from
 Bulgaria, for example, you try to filter all mail from Bulgarian IPs.
 
    This, unfortunately, is a great way to discard legitimate e-mail.
 The risks of blocking a whole country (Bulgaria, Norway, Nigeria, China,
 etc.) or even a continent (Asia, Africa, Europe, etc.) from contacting
 you should be obvious, so don’t do it if you have the choice.
 
    In another instance, the very informative and useful RISKS digest has
 been blocked by overzealous mail filters because it *contained* words
 that were common in spam messages.  Nevertheless, in isolated cases,
 with great care, direct filtering of mail can be useful.
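
    The direct filtering described above can be sketched in a few lines
 of Python.  This is only an illustration, not anything Gnus itself
 does: the function name ‘is_blocked’ and the two block lists are
 hypothetical, and the entries come from the examples in the text.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical block lists, filled in from the examples in the text.
BLOCKED_DOMAINS = {"vmadmin.com"}
BLOCKED_WORDS = {"viagra"}


def is_blocked(raw_message: str) -> bool:
    """Return True if the sender's domain or any body word is block-listed."""
    msg = message_from_string(raw_message)

    # Compare the domain part of the From address against the domain list.
    _, address = parseaddr(msg.get("From", ""))
    domain = address.rpartition("@")[2].lower()
    if domain in BLOCKED_DOMAINS:
        return True

    # Compare each body word, stripped of punctuation, against the word list.
    # Assumes a simple, non-multipart message whose payload is a string.
    body = msg.get_payload()
    words = {w.strip(".,;:!?()\"'").lower() for w in body.split()}
    return not BLOCKED_WORDS.isdisjoint(words)
```

    A message from any address at ‘vmadmin.com’ is rejected, but so is
 a legitimate digest whose body merely mentions ‘Viagra’ while
 discussing spam, which is the kind of false positive described above.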
 
    Another approach to filtering e-mail is distributed spam processing;
 DCC, for instance, implements such a system.  In essence, N
 systems around the world agree that a machine X in Ghana, Estonia, or
 California is sending out spam e-mail, and these N systems enter X or
 the spam e-mail from X into a database.  The criteria for spam detection
 vary—it may be the number of messages sent, the content of the messages,
 and so on.  When a user of the distributed processing system wants to
 find out if a message is spam, he consults one of those N systems.
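
    As a rough sketch of the idea, the Python class below keeps a shared
 table of message fingerprints and counts how many distinct systems have
 reported each one.  It is not how DCC or any real service works
 internally: the class ‘SharedSpamDatabase’, its methods, the SHA-256
 fingerprint, and the reporting threshold are all assumptions made for
 the illustration.

```python
import hashlib
from collections import defaultdict


class SharedSpamDatabase:
    """Toy shared database: a message is spam once enough systems report it."""

    def __init__(self, threshold: int) -> None:
        # Number of distinct reporting systems required (an assumed criterion).
        self.threshold = threshold
        # Maps a message fingerprint to the set of systems that reported it.
        self._reporters = defaultdict(set)

    @staticmethod
    def fingerprint(body: str) -> str:
        # Normalize case and whitespace so trivial variations hash alike.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, reporter: str, body: str) -> None:
        """Record that the system named `reporter` saw `body` as spam."""
        self._reporters[self.fingerprint(body)].add(reporter)

    def is_spam(self, body: str) -> bool:
        """Answer a user's query about a message body."""
        reporters = self._reporters.get(self.fingerprint(body), ())
        return len(reporters) >= self.threshold
```

    With a threshold of 2, a message reported twice by the same system
 is still not flagged; it is flagged only after a second, different
 system reports it.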
 
    Distributed spam processing works very well against spammers that
 send a large number of messages at once, but it requires the user to set
 up fairly complicated checks.  There are commercial and free distributed
 spam processing systems.  Distributed spam processing has its risks as
 well.  For instance, legitimate e-mail senders have been accused of
 sending spam, and their web sites and mailing lists have been shut down
 for some time because of the incident.
 
    The statistical approach to spam filtering is also popular.  It is
 based on a statistical analysis of previous spam messages.  Usually the
 analysis is a simple word frequency count, with perhaps pairs of words
 or 3-word combinations thrown into the mix.  Statistical analysis of
 spam works very well in most cases, but it can occasionally classify
 legitimate e-mail as spam.  It takes time to run the
 analysis, the full message must be analyzed, and the user has to store
 the database of spam analysis.  Statistical analysis on the server is
 gaining popularity.  This has the advantage of letting the user Just
 Read Mail, but has the disadvantage that it’s harder to tell the server
 that it has misclassified mail.
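
    The word frequency count mentioned above might look like the
 following Python sketch.  It is a minimal illustration, not the
 algorithm of any particular filter: the class ‘WordFrequencyFilter’,
 the whitespace tokenizer, and the add-one smoothing are assumptions,
 and only single words are counted, not pairs or triples.

```python
import math
from collections import Counter


class WordFrequencyFilter:
    """Toy statistical filter built on per-class word counts."""

    def __init__(self) -> None:
        # The stored "database of spam analysis": word counts per class.
        self.counts = {"spam": Counter(), "ham": Counter()}

    @staticmethod
    def tokens(text: str) -> list:
        # Assumed tokenizer: lowercase, split on whitespace.
        return text.lower().split()

    def train(self, label: str, text: str) -> None:
        """Add the words of a known message to the 'spam' or 'ham' counts."""
        self.counts[label].update(self.tokens(text))

    def spam_score(self, text: str) -> float:
        """Sum of per-word log ratios; a positive score leans toward spam."""
        spam, ham = self.counts["spam"], self.counts["ham"]
        vocab = len(set(spam) | set(ham))
        spam_total = sum(spam.values())
        ham_total = sum(ham.values())
        score = 0.0
        for word in self.tokens(text):
            # Add-one smoothing keeps unseen words from giving zero odds.
            p_spam = (spam[word] + 1) / (spam_total + vocab + 1)
            p_ham = (ham[word] + 1) / (ham_total + vocab + 1)
            score += math.log(p_spam / p_ham)
        return score
```

    The whole message has to be tokenized and scored, and the counts
 have to be stored and kept up to date, which reflects the costs noted
 above.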
 
    Fighting spam is not easy, no matter what anyone says.  There is no
 magic switch that will distinguish Viagra ads from Mom’s e-mails.  Even
 people have a hard time telling spam apart from non-spam, because
 spammers are actively looking to fool us into thinking they are Mom,
 essentially.  Spamming is irritating, irresponsible, and idiotic
 behavior from a bunch of people who think the world owes them a favor.
 We hope the following sections will help you in fighting the spam
 plague.