7.6 Adaptive Scoring
====================
If all this scoring is getting you down, Gnus has a way of making it all
happen automatically—as if by magic. Or rather, as if by artificial
stupidity, to be precise.
When you read an article, or mark an article as read, or kill an
article, you leave marks behind. On exit from the group, Gnus can sniff
these marks and add score elements depending on what marks it finds.
You turn on this ability by setting ‘gnus-use-adaptive-scoring’ to ‘t’
or ‘(line)’. If you want to score adaptively on separate words appearing
in the subjects, you should set this variable to ‘(word)’. If you want
to use both adaptive methods, set this variable to ‘(word line)’.
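As a minimal sketch, a line like the following in your Gnus init file (assumed here to be something like ‘~/.gnus.el’) would enable both methods at once:

```elisp
;; Adapt both on whole header lines and on individual subject words.
;; Use t or '(line) for lines only, '(word) for words only.
(setq gnus-use-adaptive-scoring '(word line))
```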
To give you complete control over the scoring process, you can
customize the ‘gnus-default-adaptive-score-alist’ variable. For
instance, it might look something like this:
     (setq gnus-default-adaptive-score-alist
       '((gnus-unread-mark)
         (gnus-ticked-mark (from 4))
         (gnus-dormant-mark (from 5))
         (gnus-del-mark (from -4) (subject -1))
         (gnus-read-mark (from 4) (subject 2))
         (gnus-expirable-mark (from -1) (subject -1))
         (gnus-killed-mark (from -1) (subject -3))
         (gnus-kill-file-mark)
         (gnus-ancient-mark)
         (gnus-low-score-mark)
         (gnus-catchup-mark (from -1) (subject -1))))
As you see, each element in this alist has a mark as a key (either a
variable name or a “real” mark, that is, a character). Following this key
is an arbitrary number of header/score pairs. If there are no header/score
pairs following the key, no adaptive scoring will be done on articles
that have that key as the article mark. For instance, articles with
‘gnus-unread-mark’ in the example above will not get adaptive score
entries.
Each article can have only one mark, so only one of these rules
will be applied to each article.
To take ‘gnus-del-mark’ as an example: this alist says that all
articles that have that mark (i.e., are marked with ‘r’) will get score
entries that lower the score by 4 on the ‘From’ header and by 1 on the
‘Subject’ header. Change this to fit your prejudices.
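For instance, here is a sketch of a cut-down alist for someone who cares more about subjects than authors. The numbers are purely illustrative, not recommendations:

```elisp
;; Hypothetical values.  Marks that are not listed here (or that are
;; listed without header/score pairs) produce no adaptive entries.
(setq gnus-default-adaptive-score-alist
      '((gnus-unread-mark)
        (gnus-read-mark (from 1) (subject 4))
        (gnus-del-mark (from -1) (subject -5))
        (gnus-killed-mark (subject -5))))
```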
If you have marked 10 articles with the same subject with
‘gnus-del-mark’, the rule for that mark will be applied ten times. That
means that that subject will get a score of ten times -1, which should
be, unless I’m much mistaken, -10.
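For the skeptical, the arithmetic can be checked in the ‘*scratch*’ buffer. The variable names below are made up for the illustration and have nothing to do with Gnus itself:

```elisp
;; Ten articles with the same subject, each hit by a (subject -1) rule.
(let ((articles 10)
      (subject-score -1))
  (* articles subject-score))   ; evaluates to -10
```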
If you have auto-expirable (mail) groups (see Expiring Mail), all
the read articles will be marked with the ‘E’ mark. This’ll probably
make adaptive scoring slightly impossible, so auto-expiring and adaptive
scoring don’t really mix very well.
The headers you can score on are ‘from’, ‘subject’, ‘message-id’,
‘references’, ‘xref’, ‘lines’, ‘chars’ and ‘date’. In addition, you can
score on ‘followup’, which will create an adaptive score entry that
matches on the ‘References’ header using the ‘Message-ID’ of the current
article, thereby matching the following thread.
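As a sketch of how ‘followup’ might be used, the following alist raises the score of replies to articles you have ticked. The scores are arbitrary example values:

```elisp
;; Ticking an article boosts its author and, through `followup',
;; any later article whose References header mentions its Message-ID.
(setq gnus-default-adaptive-score-alist
      '((gnus-ticked-mark (from 4) (followup 3))
        (gnus-read-mark (from 4) (subject 2))
        (gnus-del-mark (from -4) (subject -1))))
```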
If you use this scheme, you should set the score file atom ‘mark’ to
something small—like -300, perhaps, to avoid having small random changes
result in articles getting marked as read.
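In a score file, that could look roughly like the fragment below. Which score file you put it in is up to you, and a real file would normally hold other entries as well:

```elisp
;; Score file fragment: only articles scoring below -300 get marked
;; as read, so small adaptive fluctuations are harmless.
((mark -300))
```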
After using adaptive scoring for a week or so, Gnus should start to
become properly trained and enhance the authors you like best, and kill
the authors you like least, without you having to say so explicitly.
You can control what groups the adaptive scoring is to be performed
on by using the score files (see Score File Format). This will also
let you use different rules in different groups.
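Going by the ‘adapt’ atom described in the Score File Format section, a score file for a group where you want no adaptive scoring at all might contain something like this sketch:

```elisp
;; Score file fragment: switch adaptive scoring off for this group.
;; (adapt t) would instead ask for the default adaptive rules.
((adapt ignore))
```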
The adaptive score entries will be put into a file where the name is
the group name with ‘gnus-adaptive-file-suffix’ appended. The default
is ‘ADAPT’.
Adaptive score files can get huge and are not meant to be edited by
human hands. If ‘gnus-adaptive-pretty-print’ is ‘nil’ (the default),
those files will not be written in a human-readable way.
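If you do want to peek at what Gnus has learned, a setting along these lines should make the files easier on the eyes, presumably at the cost of somewhat larger files:

```elisp
;; Write adaptive score files in a human-readable layout.
(setq gnus-adaptive-pretty-print t)
```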
When doing adaptive scoring, substring or fuzzy matching would
probably give you the best results in most cases. However, if the
header one matches is short, the possibility for false positives is
great, so if the length of the match is less than
‘gnus-score-exact-adapt-limit’, exact matching will be used. If this
variable is ‘nil’, exact matching will always be used to avoid this
problem.
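A sketch of both settings follows. The number 15 is an arbitrary example, not a recommendation:

```elisp
;; Use exact matching for header values shorter than 15 characters.
(setq gnus-score-exact-adapt-limit 15)

;; Alternatively, always use exact matching:
;; (setq gnus-score-exact-adapt-limit nil)
```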
As mentioned above, you can adapt either on individual words or
entire headers. If you adapt on words, the
‘gnus-default-adaptive-word-score-alist’ variable says what score each
instance of a word should add given a mark.
     (setq gnus-default-adaptive-word-score-alist
           `((,gnus-read-mark . 30)
             (,gnus-catchup-mark . -10)
             (,gnus-killed-mark . -20)
             (,gnus-del-mark . -15)))
This is the default value. If you have adaption on words enabled,
every word that appears in subjects of articles marked with
‘gnus-read-mark’ will result in a score rule that increases the score
by 30 points.
Words that appear in the ‘gnus-default-ignored-adaptive-words’ list
will be ignored. If you wish to add more words to be ignored, use the
‘gnus-ignored-adaptive-words’ list instead.
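For example, assuming that words like the ones below say nothing about whether you want to read an article, you might add them as follows (the word list is made up for illustration):

```elisp
;; Extra words to skip during adaptive word scoring, in addition to
;; those in `gnus-default-ignored-adaptive-words'.
(setq gnus-ignored-adaptive-words '("patch" "question" "announce"))
```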
Some may feel that short words shouldn’t count when doing adaptive
scoring. If so, you may set ‘gnus-adaptive-word-length-limit’ to an
integer. Words shorter than this number will be ignored. This variable
defaults to ‘nil’.
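A sketch, with 4 chosen arbitrarily as the cutoff:

```elisp
;; Ignore words of fewer than four characters when scoring on words.
(setq gnus-adaptive-word-length-limit 4)
```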
When the scoring is done, ‘gnus-adaptive-word-syntax-table’ is the
syntax table in effect. It is similar to the standard syntax table, but
it considers numbers to be non-word-constituent characters.
If ‘gnus-adaptive-word-minimum’ is set to a number, the adaptive word
scoring process will never bring down the score of an article to below
this number. The default is ‘nil’.
If ‘gnus-adaptive-word-no-group-words’ is set to ‘t’, Gnus won’t
apply adaptive word scoring to any of the words in the group name. This
is useful for groups like ‘comp.editors.emacs’, where most of the subject
lines contain the word ‘emacs’.
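The last two variables could be set together, as in this sketch. The floor of -200 is an arbitrary example value:

```elisp
;; Never let adaptive word scoring push an article below -200, and
;; leave the words of the group name itself out of the scoring.
(setq gnus-adaptive-word-minimum -200
      gnus-adaptive-word-no-group-words t)
```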
After using this scheme for a while, it might be nice to write a
‘gnus-psychoanalyze-user’ command to go through the rules and see what
words you like and what words you don’t like. Or perhaps not.
Note that the adaptive word scoring thing is highly experimental and
is likely to change in the future. Initial impressions seem to indicate
that it’s totally useless as it stands. Some more work (involving more
rigorous statistical methods) will have to be done to make this useful.