gnus: Creating a spam-stat dictionary

 
 9.17.8.1 Creating a spam-stat dictionary
 ........................................
 
 Before you can begin to filter spam based on statistics, you must create
 these statistics based on two mail collections, one with spam, one with
 non-spam.  These statistics are then stored in a dictionary for later
 use.  In order for these statistics to be meaningful, you need several
 hundred emails in both collections.
 
    Gnus currently supports only the nnml back end for automated
 dictionary creation.  The nnml back end stores all mails in a directory,
 one file per mail.  Use the following:
 
  -- Function: spam-stat-process-spam-directory
      Create spam statistics for every file in this directory.  Every
      file is treated as one spam mail.
 
  -- Function: spam-stat-process-non-spam-directory
      Create non-spam statistics for every file in this directory.  Every
      file is treated as one non-spam mail.
 
    Usually you would call ‘spam-stat-process-spam-directory’ on a
 directory such as ‘~/Mail/mail/spam’ (this usually corresponds to the
 group ‘nnml:mail.spam’), and you would call
 ‘spam-stat-process-non-spam-directory’ on a directory such as
 ‘~/Mail/mail/misc’ (this usually corresponds to the group
 ‘nnml:mail.misc’).
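
    As a minimal sketch of the calls described above, assuming your spam
 and non-spam mail really are stored in the two example directories
 (substitute your own nnml directories), you could evaluate:

 ```elisp
 ;; Make the spam-stat functions available.
 (require 'spam-stat)

 ;; Learn from the spam collection: every file counts as one spam mail.
 ;; The directory is the example path used in the text above.
 (spam-stat-process-spam-directory "~/Mail/mail/spam")

 ;; Learn from the non-spam collection: every file counts as one
 ;; non-spam mail.
 (spam-stat-process-non-spam-directory "~/Mail/mail/misc")
 ```

    Both calls only update the in-memory dictionary; nothing is written
 to disk until the dictionary is saved.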
 
    When you are using IMAP, you won’t have the mails available locally,
 so that will not work.  One solution is to use the Gnus Agent to cache
 the articles.  Then you can use directories such as
 ‘"~/News/agent/nnimap/mail.yourisp.com/personal_spam"’ for
 ‘spam-stat-process-spam-directory’.  See Agent as Cache.
 
  -- Variable: spam-stat
      This variable holds the hash-table with all the statistics—the
      dictionary we have been talking about.  For every word in either
      collection, this hash-table stores a vector describing how often
      the word appeared in spam and how often it appeared in non-spam
      mails.
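
    Because ‘spam-stat’ is an ordinary hash-table, the standard hash
 table functions can be used to inspect it.  The following is only an
 illustrative sketch; the word "money" is an arbitrary example, and it
 assumes the dictionary is keyed by the word as a string:

 ```elisp
 (require 'spam-stat)

 ;; Number of distinct words currently stored in the dictionary.
 (hash-table-count spam-stat)

 ;; Look up the statistics recorded for a single word.  This returns
 ;; nil if the word did not appear in either collection.
 (gethash "money" spam-stat)
 ```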
 
    If you want to regenerate the statistics from scratch, you need to
 reset the dictionary.
 
  -- Function: spam-stat-reset
      Reset the ‘spam-stat’ hash-table, deleting all the statistics.
 
    When you are done, you must save the dictionary.  The dictionary may
 be rather large.  If you will not update the dictionary incrementally
 (instead, you will recreate it once a month, for example), then you can
 reduce the size of the dictionary by deleting all words that did not
 appear often enough or that do not clearly belong to only spam or only
 non-spam mails.
 
  -- Function: spam-stat-reduce-size
      Reduce the size of the dictionary.  Use this only if you do not
      want to update the dictionary incrementally.
 
  -- Function: spam-stat-save
      Save the dictionary.
 
  -- Variable: spam-stat-file
      The filename used to store the dictionary.  This defaults to
      ‘~/.spam-stat.el’.
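
    Putting the pieces together, a periodic rebuild from scratch might
 look like the following sketch.  It assumes the example directories
 used earlier and that you do not update the dictionary incrementally
 (otherwise omit the call to ‘spam-stat-reduce-size’).  The alternative
 file name in the commented-out line is purely hypothetical:

 ```elisp
 (require 'spam-stat)

 ;; Optional: store the dictionary somewhere other than the default.
 ;; The path below is a made-up example.
 ;; (setq spam-stat-file "~/Mail/spam-stat.el")

 ;; Start from an empty dictionary.
 (spam-stat-reset)

 ;; Rebuild the statistics from both collections.
 (spam-stat-process-spam-directory "~/Mail/mail/spam")
 (spam-stat-process-non-spam-directory "~/Mail/mail/misc")

 ;; Drop words that are rare or not clearly spam or non-spam.
 ;; Use this only when the dictionary is not updated incrementally.
 (spam-stat-reduce-size)

 ;; Write the dictionary to `spam-stat-file'.
 (spam-stat-save)
 ```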