9.17.4 Spam and Ham Processors
------------------------------
Spam and ham processors specify special actions to take when you exit a
group buffer. Spam processors act on spam messages, and ham processors
on ham messages. At present, the main role of these processors is to
update the dictionaries of dictionary-based spam back ends such as
Bogofilter (*note Bogofilter::) and the Spam Statistics package (*note
Spam Statistics Filtering::).
The spam and ham processors that apply to each group are determined
by the group’s ‘spam-process’ group parameter. If this group parameter
is not defined, they are determined by the variable
‘gnus-spam-process-newsgroups’.
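
As a minimal sketch, assuming the Bogofilter back end is in use, the
fallback variable could be set as follows. The group regexp is a
hypothetical placeholder, and the shape of each entry (a regexp followed
by a list of processor forms) is an assumption that should be checked
against the customization interface for the variable.

```elisp
;; Hypothetical: when leaving any group whose name starts with
;; "nnml:mail.", train Bogofilter on both spam and ham.
(setq gnus-spam-process-newsgroups
      '(("^nnml:mail\\."
         ((spam spam-use-bogofilter)
          (ham spam-use-bogofilter)))))
```

A ‘spam-process’ group parameter, when present, takes precedence over
this variable for that group.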
Gnus learns from the spam you get. You have to collect your spam in
one or more spam groups, and set or customize the variable
‘spam-junk-mailgroups’ as appropriate. You can also declare groups to
contain spam by setting their group parameter ‘spam-contents’ to
‘gnus-group-spam-classification-spam’, or by customizing the
corresponding variable ‘gnus-spam-newsgroup-contents’. The
‘spam-contents’ group parameter and the ‘gnus-spam-newsgroup-contents’
variable can also be used to declare groups as _ham_ groups if you set
their classification to ‘gnus-group-spam-classification-ham’. If groups
are not classified by means of ‘spam-junk-mailgroups’, ‘spam-contents’,
or ‘gnus-spam-newsgroup-contents’, they are considered _unclassified_.
All groups are unclassified by default.
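
The classification described above might be configured like this. This
is only a sketch: every group name and regexp is a placeholder, and the
entry shape (a regexp followed by a classification symbol) is assumed.

```elisp
;; Hypothetical group names; substitute your own.
(setq spam-junk-mailgroups '("spam" "mail.junk"))

(setq gnus-spam-newsgroup-contents
      '(;; any group with "spam" in its name is a spam group
        ("spam" gnus-group-spam-classification-spam)
        ;; a dedicated ham training group is a ham group
        ("^nnml:train\\.ham$" gnus-group-spam-classification-ham)))
```

Groups matched by none of these settings remain unclassified.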
In spam groups, all messages are considered to be spam by default:
they get the ‘$’ mark (‘gnus-spam-mark’) when you enter the group. If
you have seen a message, had it marked as spam, then unmarked it, it
won’t be marked as spam when you enter the group thereafter. You can
disable that behavior, so all unread messages will get the ‘$’ mark, if
you set the ‘spam-mark-only-unseen-as-spam’ parameter to ‘nil’. You
should remove the ‘$’ mark when you are in the group summary buffer for
every message that is not spam after all. To remove the ‘$’ mark, you
can use ‘M-u’ to “unread” the article, or ‘d’ for declaring it read the
non-spam way. When you leave a group, all spam-marked (‘$’) articles
are sent to a spam processor which will study them as spam samples.
Messages may also be deleted in various other ways. Unless the
‘ham-marks’ group parameter is overridden as described below, the marks
‘R’ and ‘r’ (default read or explicit delete), ‘X’ and ‘K’ (automatic or
explicit kills), and ‘Y’ (low scores) are all considered to be
associated with articles which are not spam. This assumption might be
false, in particular if you use kill files or score files as a means of
detecting genuine spam; in that case, adjust the ‘ham-marks’ group
parameter.
-- Variable: ham-marks
You can customize this group or topic parameter to be the list of
marks you want to consider ham. By default, the list contains the
deleted, read, killed, kill-filed, and low-score marks (the idea is
that these articles have been read, but are not spam). It can be
useful to also include the tick mark in the ham marks. It is not
recommended to make the unread mark a ham mark, because it normally
indicates a lack of classification. But you can do it, and we’ll
be happy for you.
-- Variable: spam-marks
You can customize this group or topic parameter to be the list of
marks you want to consider spam. By default, the list contains
only the spam mark. It is not recommended to change that, but you
can if you really want to.
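
For illustration, a ‘ham-marks’ group parameter that adds the tick mark
to the default ham marks might look like the fragment below, as it would
be entered when editing group parameters. The mark symbols are the usual
Gnus mark variables; the exact nesting of the parameter value is an
assumption and is best confirmed through the customization interface.

```elisp
;; Group parameters fragment (not a form to evaluate).
;; Default ham marks plus the tick mark.
((ham-marks
  (gnus-del-mark gnus-read-mark gnus-killed-mark
   gnus-kill-file-mark gnus-low-score-mark gnus-ticked-mark)))
```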
When you leave _any_ group, regardless of its ‘spam-contents’
classification, all spam-marked articles are sent to a spam processor,
which will study these as spam samples. If you explicitly kill a lot,
you might sometimes end up with articles marked ‘K’ which you never saw,
and which might accidentally contain spam. It is best to make sure that
real spam is marked with ‘$’, and nothing else.
When you leave a _spam_ group, all spam-marked articles are marked as
expired after processing with the spam processor. This is not done for
_unclassified_ or _ham_ groups. Also, any *ham* articles in a spam
group will be moved to a location determined by either the
‘ham-process-destination’ group parameter or a match in the
‘gnus-ham-process-destinations’ variable, which is a list of regular
expressions matched with group names (it’s easiest to customize this
variable with ‘M-x customize-variable <RET>
gnus-ham-process-destinations’). Each group name list is a standard
Lisp list, if you prefer to customize the variable manually. If the
‘ham-process-destination’ parameter is not set, ham articles are left in
place. If the ‘spam-mark-ham-unread-before-move-from-spam-group’
parameter is set, the ham articles are marked as unread before being
moved.
If ham cannot be moved (because of a read-only back end such as NNTP,
for example), it will be copied.
Note that you can use multiple destinations per group or regular
expression! This enables you to send your ham to a regular mail group
and to a _ham training_ group.
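
A sketch of the group parameters for a spam group that sends its ham to
two destinations follows. The group names are hypothetical, and the
fragment is meant to be entered when editing the group's parameters
rather than evaluated.

```elisp
;; Group parameters fragment for a spam group (hypothetical names).
;; Ham found here goes to a regular mail group and a ham training group.
((spam-contents gnus-group-spam-classification-spam)
 (ham-process-destination "nnml:mail.inbox" "nnml:train.ham"))
```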
When you leave a _ham_ group, all ham-marked articles are sent to a
ham processor, which will study these as non-spam samples.
By default the variable ‘spam-process-ham-in-spam-groups’ is ‘nil’.
Set it to ‘t’ if you want ham found in spam groups to be processed.
Normally this is not done; you are expected instead to send your ham to
a ham group and process it there.
By default the variable ‘spam-process-ham-in-nonham-groups’ is ‘nil’.
Set it to ‘t’ if you want ham found in non-ham (spam or unclassified)
groups to be processed. Normally this is not done; you are expected
instead to send your ham to a ham group and process it there.
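
If you do want ham to be processed outside ham groups, the two
variables can be set in your init file. This is a minimal sketch; you
would normally pick only one of the two settings.

```elisp
;; Process ham found in spam groups (default is nil).
(setq spam-process-ham-in-spam-groups t)

;; Or, more broadly, process ham found in any non-ham group,
;; that is, spam and unclassified groups (default is nil).
(setq spam-process-ham-in-nonham-groups t)
```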
When you leave a _ham_ or _unclassified_ group, all *spam* articles
are moved to a location determined by either the
‘spam-process-destination’ group parameter or a match in the
‘gnus-spam-process-destinations’ variable, which is a list of regular
expressions matched with group names (it’s easiest to customize this
variable with ‘M-x customize-variable <RET>
gnus-spam-process-destinations’). Each group name list is a standard
Lisp list, if you prefer to customize the variable manually. If the
‘spam-process-destination’ parameter is not set, the spam articles are
only expired. The group name is fully qualified, meaning that if you
see ‘nntp:servername’ before the group name in the group buffer then you
need it here as well.
If spam cannot be moved (because of a read-only back end such as
NNTP, for example), it will be copied.
Note that you can use multiple destinations per group or regular
expression! This enables you to send your spam to multiple _spam
training_ groups.
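
As a sketch of the variable form, the following sends spam found in the
groups of one news server to a training group. The server and group
names are hypothetical, and the entry shape (a regexp followed by a
destination) is an assumption. Note that both the regexp and the
destination use fully qualified group names.

```elisp
;; Hypothetical server and group names.
(setq gnus-spam-process-destinations
      '(("^nntp\\+news\\.example\\.org:" "nnml:train.spam")))
```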
The problem with processing ham and spam is that Gnus doesn’t track
this processing by default. Enable the ‘spam-log-to-registry’ variable
so ‘spam.el’ will use ‘gnus-registry.el’ to track what articles have
been processed, and avoid processing articles multiple times. Keep in
mind that if you limit the number of registry entries, this won’t work
as well as it does without a limit.
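
A minimal sketch of enabling this tracking, assuming the registry is
not yet initialized elsewhere in your configuration:

```elisp
(require 'gnus-registry)
(gnus-registry-initialize)

;; Let spam.el record processed articles in the registry so that
;; they are not processed again.
(setq spam-log-to-registry t)
```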
-- Variable: spam-mark-only-unseen-as-spam
Set this variable if you want only unseen articles in spam groups to
be marked as spam. By default, it is set. If you set it to ‘nil’,
unread articles will also be marked as spam.
-- Variable: spam-mark-ham-unread-before-move-from-spam-group
Set this variable if you want ham to be unmarked before it is moved
out of the spam group. This is very useful when you use something like
the tick mark ‘!’ to mark ham: the article will be placed in your
‘ham-process-destination’, unmarked as if it came fresh from the mail
server.
-- Variable: spam-autodetect-recheck-messages
When autodetecting spam, this variable tells ‘spam.el’ whether only
unseen articles or all unread articles should be checked for spam. It
is recommended that you leave it off.
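
The three variables described last could be set together as sketched
below. The names ‘spam-mark-only-unseen-as-spam’ and
‘spam-mark-ham-unread-before-move-from-spam-group’ appear earlier in
this section; ‘spam-autodetect-recheck-messages’ is assumed to be the
name of the autodetection variable.

```elisp
(setq spam-mark-only-unseen-as-spam t      ; the default
      ;; hand ham over unmarked, as if fresh from the mail server
      spam-mark-ham-unread-before-move-from-spam-group t
      ;; assumed variable name; leaving it off is recommended
      spam-autodetect-recheck-messages nil)
```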