gnus: Spam and Ham Processors

 
 9.17.4 Spam and Ham Processors
 ------------------------------
 
 Spam and ham processors specify special actions to take when you exit a
 group buffer.  Spam processors act on spam messages, and ham processors
 on ham messages.  At present, the main role of these processors is to
 update the dictionaries of dictionary-based spam back ends such as
 Bogofilter (see Bogofilter) and the Spam Statistics package (see Spam
 Statistics Filtering).
 
    The spam and ham processors that apply to each group are determined
 by the group’s ‘spam-process’ group parameter.  If this group parameter
 is not defined, they are determined by the variable
 ‘gnus-spam-process-newsgroups’.
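 
    As a minimal sketch, the ‘spam-process’ parameter could be attached
 to a set of groups through ‘gnus-parameters’.  The group regexp is
 hypothetical, and the ‘(spam BACKEND)’ and ‘(ham BACKEND)’ entries
 assume that the Bogofilter back end is the one you use; substitute the
 back end that matches your setup.
 
      ;; Hypothetical group regexp; note that `setq' replaces any
      ;; existing value of `gnus-parameters'.
      (setq gnus-parameters
            '(("^nnml:mail\\."
               ;; Assumed back end: Bogofilter, for both spam and ham.
               (spam-process ((spam spam-use-bogofilter)
                              (ham spam-use-bogofilter))))))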
 
    Gnus learns from the spam you get.  You have to collect your spam in
 one or more spam groups, and set or customize the variable
 ‘spam-junk-mailgroups’ as appropriate.  You can also declare groups to
 contain spam by setting their group parameter ‘spam-contents’ to
 ‘gnus-group-spam-classification-spam’, or by customizing the
 corresponding variable ‘gnus-spam-newsgroup-contents’.  The
 ‘spam-contents’ group parameter and the ‘gnus-spam-newsgroup-contents’
 variable can also be used to declare groups as _ham_ groups if you set
 their classification to ‘gnus-group-spam-classification-ham’.  If groups
 are not classified by means of ‘spam-junk-mailgroups’, ‘spam-contents’,
 or ‘gnus-spam-newsgroup-contents’, they are considered _unclassified_.
 All groups are unclassified by default.
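 
    The classification described above might be set up as follows.
 This is only a sketch: the group names and regexps are hypothetical,
 and each entry of ‘gnus-spam-newsgroup-contents’ is assumed to pair a
 group regexp with a classification symbol.
 
      ;; Groups in which you collect spam (hypothetical name).
      (setq spam-junk-mailgroups '("mail.junk"))
 
      ;; Classify further groups by regexp (hypothetical regexps).
      (setq gnus-spam-newsgroup-contents
            '(("spam" gnus-group-spam-classification-spam)
              ("^nnml:mail\\.ham$" gnus-group-spam-classification-ham)))
 
 Groups matched by neither variable, and without a ‘spam-contents’
 group parameter, stay unclassified.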
 
    In spam groups, all messages are considered to be spam by default:
 they get the ‘$’ mark (‘gnus-spam-mark’) when you enter the group.  If
 you have seen a message, had it marked as spam, then unmarked it, it
 won’t be marked as spam when you enter the group thereafter.  You can
 disable that behavior, so all unread messages will get the ‘$’ mark, if
 you set the ‘spam-mark-only-unseen-as-spam’ parameter to ‘nil’.  You
 should remove the ‘$’ mark when you are in the group summary buffer for
 every message that is not spam after all.  To remove the ‘$’ mark, you
 can use ‘M-u’ to “unread” the article, or ‘d’ for declaring it read the
 non-spam way.  When you leave a group, all spam-marked (‘$’) articles
 are sent to a spam processor which will study them as spam samples.
 
    Messages may also be deleted in various other ways.  Unless the
 ‘ham-marks’ group parameter is overridden as described below, the marks
 ‘R’ and ‘r’ (default read or explicit delete), ‘X’ and ‘K’ (automatic
 or explicit kills), and ‘Y’ (low scores) are all considered to be
 associated with articles which are not spam.  This assumption might be
 false, in particular if you use kill files or score files as a means of
 detecting genuine spam; in that case you should adjust the ‘ham-marks’
 group parameter.
 
  -- Variable: ham-marks
      You can customize this group or topic parameter to be the list of
      marks you want to consider ham.  By default, the list contains the
      deleted, read, killed, kill-filed, and low-score marks (the idea is
      that these articles have been read, but are not spam).  It can be
      useful to also include the tick mark in the ham marks.  It is not
      recommended to make the unread mark a ham mark, because it normally
      indicates a lack of classification.  But you can do it, and we’ll
      be happy for you.
 
  -- Variable: spam-marks
      You can customize this group or topic parameter to be the list of
      marks you want to consider spam.  By default, the list contains
      only the spam mark.  It is not recommended to change that, but you
      can if you really want to.
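 
    For illustration, a group parameter list (edited with ‘G p’ in the
 group buffer) that keeps the default ham marks and adds the tick mark
 might look like the sketch below.  It assumes that the parameter takes
 a list of the standard Gnus mark variables.
 
      ;; Default ham marks plus the tick mark.
      ((ham-marks (gnus-del-mark gnus-read-mark gnus-killed-mark
                   gnus-kill-file-mark gnus-low-score-mark
                   gnus-ticked-mark)))
 
 If you rely on kill files or score files to catch spam, you would
 leave ‘gnus-kill-file-mark’ and ‘gnus-low-score-mark’ out of the list
 instead.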
 
    When you leave _any_ group, regardless of its ‘spam-contents’
 classification, all spam-marked articles are sent to a spam processor,
 which will study these as spam samples.  If you explicitly kill a lot,
 you might sometimes end up with articles marked ‘K’ which you never saw,
 and which might accidentally contain spam.  It is best to make sure that
 real spam is marked with ‘$’, and nothing else.
 
    When you leave a _spam_ group, all spam-marked articles are marked as
 expired after processing with the spam processor.  This is not done for
 _unclassified_ or _ham_ groups.  Also, any *ham* articles in a spam
 group will be moved to a location determined by either the
 ‘ham-process-destination’ group parameter or a match in the
 ‘gnus-ham-process-destinations’ variable, which is a list of regular
 expressions matched with group names (it’s easiest to customize this
 variable with ‘M-x customize-variable <RET>
 gnus-ham-process-destinations’).  Each group name list is a standard
 Lisp list, if you prefer to customize the variable manually.  If the
 ‘ham-process-destination’ parameter is not set, ham articles are left in
 place.  If the ‘spam-mark-ham-unread-before-move-from-spam-group’
 parameter is set, the ham articles are marked as unread before being
 moved.
 
    If ham cannot be moved (because of a read-only back end such as
 NNTP, for example), it will be copied.
 
    Note that you can use multiple destinations per group or regular
 expression!  This enables you to send your ham to a regular mail group
 and to a _ham training_ group.
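 
    A sketch of the group parameters for such a spam group, as they
 might be entered with ‘G p’, is shown here.  Both destination group
 names are hypothetical.
 
      ;; This group holds spam; ham found here goes to a regular mail
      ;; group and to a ham training group (hypothetical names).
      ((spam-contents gnus-group-spam-classification-spam)
       (ham-process-destination "nnml:mail.misc" "nnml:train.ham"))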
 
    When you leave a _ham_ group, all ham-marked articles are sent to a
 ham processor, which will study these as non-spam samples.
 
    By default the variable ‘spam-process-ham-in-spam-groups’ is ‘nil’.
 Set it to ‘t’ if you want ham found in spam groups to be processed.
 Normally this is not done; you are expected instead to send your ham to
 a ham group and process it there.
 
    By default the variable ‘spam-process-ham-in-nonham-groups’ is ‘nil’.
 Set it to ‘t’ if you want ham found in non-ham (spam or unclassified)
 groups to be processed.  Normally this is not done; you are expected
 instead to send your ham to a ham group and process it there.
 
    When you leave a _ham_ or _unclassified_ group, all *spam* articles
 are moved to a location determined by either the
 ‘spam-process-destination’ group parameter or a match in the
 ‘gnus-spam-process-destinations’ variable, which is a list of regular
 expressions matched with group names (it’s easiest to customize this
 variable with ‘M-x customize-variable <RET>
 gnus-spam-process-destinations’).  Each group name list is a standard
 Lisp list, if you prefer to customize the variable manually.  If the
 ‘spam-process-destination’ parameter is not set, the spam articles are
 only expired.  The group name is fully qualified, meaning that if you
 see ‘nntp:servername’ before the group name in the group buffer then you
 need it here as well.
 
    If spam cannot be moved (because of a read-only back end such as
 NNTP, for example), it will be copied.
 
    Note that you can use multiple destinations per group or regular
 expression!  This enables you to send your spam to multiple _spam
 training_ groups.
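 
    As a sketch, the group parameters of a ham or unclassified group
 (entered with ‘G p’) could name a fully qualified spam destination
 like this.  The destination group name is hypothetical.
 
      ;; Spam found in this group is moved to a training group.  The
      ;; name carries its back end prefix, as it appears in the group
      ;; buffer.
      ((spam-process-destination "nnml:train.spam"))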
 
    The problem with processing ham and spam is that Gnus doesn’t track
 this processing by default.  Enable the ‘spam-log-to-registry’ variable
 so ‘spam.el’ will use ‘gnus-registry.el’ to track what articles have
 been processed, and avoid processing articles multiple times.  Keep in
 mind that if you limit the number of registry entries, this won’t work
 as well as it does without a limit.
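 
    A minimal sketch of enabling this tracking in your init file
 follows.  It assumes that you do not already load and initialize the
 registry elsewhere.
 
      ;; Load and start the Gnus registry, then let spam.el log to it.
      (require 'gnus-registry)
      (gnus-registry-initialize)
      (setq spam-log-to-registry t)
 
 If you limit the registry size (for instance through
 ‘gnus-registry-max-entries’), older entries may be dropped, and the
 corresponding articles could then be processed again.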
 
    Set ‘spam-mark-only-unseen-as-spam’ if you want only unseen articles
 in spam groups to be marked as spam.  By default, it is set.  If you set
 it to ‘nil’, all unread articles will also be marked as spam.
 
    Set ‘spam-mark-ham-unread-before-move-from-spam-group’ if you want
 ham to be marked as unread before it is moved out of the spam group.
 This is very useful when you use something like the tick mark ‘!’ to
 mark ham: the article will be placed in your ‘ham-process-destination’,
 unmarked as if it came fresh from the mail server.
 
    When autodetecting spam, the variable
 ‘spam-autodetect-recheck-messages’ tells ‘spam.el’ whether only unseen
 articles or all unread articles should be checked for spam.  It is
 recommended that you leave it off.
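 
    The two marking options named earlier in this section,
 ‘spam-mark-only-unseen-as-spam’ and
 ‘spam-mark-ham-unread-before-move-from-spam-group’, can be set in your
 init file.  The values below are one possible choice, not a
 recommendation.
 
      (setq
       ;; Mark only unseen articles in spam groups as spam.
       spam-mark-only-unseen-as-spam t
       ;; Mark ham as unread before moving it out of a spam group.
       spam-mark-ham-unread-before-move-from-spam-group t)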