gnus: Adaptive Scoring

 
 7.6 Adaptive Scoring
 ====================
 
 If all this scoring is getting you down, Gnus has a way of making it all
 happen automatically—as if by magic.  Or rather, as if by artificial
 stupidity, to be precise.
 
    When you read an article, or mark an article as read, or kill an
 article, you leave marks behind.  On exit from the group, Gnus can sniff
 these marks and add score elements depending on what marks it finds.
 You turn on this ability by setting ‘gnus-use-adaptive-scoring’ to ‘t’
 or ‘(line)’.  If you want to score adaptively on separate words appearing
 in the subjects, you should set this variable to ‘(word)’.  If you want
 to use both adaptive methods, set this variable to ‘(word line)’.
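 
    As a minimal sketch (the choice of ‘(word line)’ is only an
 illustration, not a recommendation), enabling both adaptive methods from
 your Gnus init file could look like this:
 
      ;; Adapt on whole header lines and on individual subject words.
      ;; Use t or '(line) instead for line-based adaptation only.
      (setq gnus-use-adaptive-scoring '(word line))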
 
    To give you complete control over the scoring process, you can
 customize the ‘gnus-default-adaptive-score-alist’ variable.  For
 instance, it might look something like this:
 
      (setq gnus-default-adaptive-score-alist
        '((gnus-unread-mark)
          (gnus-ticked-mark (from 4))
          (gnus-dormant-mark (from 5))
          (gnus-del-mark (from -4) (subject -1))
          (gnus-read-mark (from 4) (subject 2))
          (gnus-expirable-mark (from -1) (subject -1))
          (gnus-killed-mark (from -1) (subject -3))
          (gnus-kill-file-mark)
          (gnus-ancient-mark)
          (gnus-low-score-mark)
          (gnus-catchup-mark (from -1) (subject -1))))
 
    As you see, each element in this alist has a mark as a key (either a
 variable name or a “real” mark, i.e., a character).  Following this key is
 an arbitrary number of header/score pairs.  If there are no header/score
 pairs following the key, no adaptive scoring will be done on articles
 that have that key as the article mark.  For instance, articles with
 ‘gnus-unread-mark’ in the example above will not get adaptive score
 entries.
 
    Each article can have only one mark, so only one of these rules
 will be applied to each article.
 
    To take ‘gnus-del-mark’ as an example: this alist says that every
 article that has that mark (i.e., is marked with ‘e’) will get one score
 entry that lowers the score by 4 based on the ‘From’ header and another
 that lowers it by 1 based on the ‘Subject’ header.  Change this to fit
 your prejudices.
 
    If you have marked 10 articles with the same subject with
 ‘gnus-del-mark’, the rule for that mark will be applied ten times.  That
 means that that subject will get a score of ten times -1, which should
 be, unless I’m much mistaken, -10.
 
    If you have auto-expirable (mail) groups (see Expiring Mail), all
 the read articles will be marked with the ‘E’ mark.  This will probably
 make adaptive scoring slightly impossible, so auto-expiring and adaptive
 scoring don’t really mix very well.
 
    The headers you can score on are ‘from’, ‘subject’, ‘message-id’,
 ‘references’, ‘xref’, ‘lines’, ‘chars’ and ‘date’.  In addition, you can
 score on ‘followup’, which will create an adaptive score entry that
 matches on the ‘References’ header using the ‘Message-ID’ of the current
 article, thereby matching the following thread.
 
    If you use this scheme, you should set the score file atom ‘mark’ to
 something small—like -300, perhaps, to avoid having small random changes
 result in articles getting marked as read.
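 
    For illustration, assuming a score file that holds nothing else, the
 ‘mark’ atom could be written as below.  The value -300 is the one
 suggested above, not a tested optimum:
 
      ;; Only articles scoring below -300 are marked as read.
      ((mark -300))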
 
    After using adaptive scoring for a week or so, Gnus should start to
 become properly trained and enhance the authors you like best, and kill
 the authors you like least, without you having to say so explicitly.
 
    You can control what groups the adaptive scoring is to be performed
 on by using the score files (see Score File Format).  This will also
 let you use different rules in different groups.
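 
    As a hedged sketch, the score file atom normally used for this is
 ‘adapt’, which is documented in the Score File Format node rather than
 here, so check that node before relying on it.  Each form below stands
 for the complete contents of one group’s score file:
 
      ;; Score file of a group that should never be adapted:
      ((adapt ignore))
 
      ;; Score file of a group that uses the default adaptive rules:
      ((adapt t))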
 
    The adaptive score entries will be put into a file where the name is
 the group name with ‘gnus-adaptive-file-suffix’ appended.  The default
 is ‘ADAPT’.
 
    Adaptive score files can get huge and are not meant to be edited by
 human hands.  If ‘gnus-adaptive-pretty-print’ is ‘nil’ (the default),
 those files will not be written in a human-readable way.
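 
    If you do want to inspect what Gnus has learned, a minimal sketch is
 to turn pretty-printing on.  This only changes how the files are laid
 out on disk; they are still not intended for hand editing:
 
      ;; Write adaptive score files in a human-readable layout.
      (setq gnus-adaptive-pretty-print t)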
 
    When doing adaptive scoring, substring or fuzzy matching would
 probably give you the best results in most cases.  However, if the
 header one matches is short, the possibility for false positives is
 great, so if the length of the match is less than
 ‘gnus-score-exact-adapt-limit’, exact matching will be used.  If this
 variable is ‘nil’, exact matching will always be used to avoid this
 problem.
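 
    A sketch of both settings follows.  The number 10 is only an
 illustrative threshold; pick whatever suits the headers you score on:
 
      ;; Use exact matching when the match is shorter than 10 characters.
      (setq gnus-score-exact-adapt-limit 10)
 
      ;; Alternatively, always use exact matching:
      ;; (setq gnus-score-exact-adapt-limit nil)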
 
    As mentioned above, you can adapt either on individual words or
 entire headers.  If you adapt on words, the
 ‘gnus-default-adaptive-word-score-alist’ variable says what score each
 instance of a word should add given a mark.
 
      (setq gnus-default-adaptive-word-score-alist
            `((,gnus-read-mark . 30)
              (,gnus-catchup-mark . -10)
              (,gnus-killed-mark . -20)
              (,gnus-del-mark . -15)))
 
    This is the default value.  If you have adaptation on words enabled,
 every word that appears in subjects of articles marked with
 ‘gnus-read-mark’ will result in a score rule that increases the score
 by 30 points.
 
    Words that appear in the ‘gnus-default-ignored-adaptive-words’ list
 will be ignored.  If you wish to add more words to be ignored, use the
 ‘gnus-ignored-adaptive-words’ list instead.
 
    Some may feel that short words shouldn’t count when doing adaptive
 scoring.  If so, you may set ‘gnus-adaptive-word-length-limit’ to an
 integer.  Words shorter than this number will be ignored.  This variable
 defaults to ‘nil’.
 
    When the scoring is done, ‘gnus-adaptive-word-syntax-table’ is the
 syntax table in effect.  It is similar to the standard syntax table, but
 it considers numbers to be non-word-constituent characters.
 
    If ‘gnus-adaptive-word-minimum’ is set to a number, the adaptive word
 scoring process will never bring down the score of an article to below
 this number.  The default is ‘nil’.
 
    If ‘gnus-adaptive-word-no-group-words’ is set to ‘t’, Gnus won’t
 adaptively score any of the words in the group name.  This is useful for
 groups like ‘comp.editors.emacs’, where most of the subject lines
 contain the word ‘emacs’.
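 
    Taken together, the word-scoring variables described above might be
 set as in the following sketch.  All of the concrete values (4, -500 and
 the word list) are made up for illustration and are not defaults:
 
      ;; Ignore subject words shorter than four characters.
      (setq gnus-adaptive-word-length-limit 4)
      ;; Never let word scoring push an article's score below -500.
      (setq gnus-adaptive-word-minimum -500)
      ;; Do not score words that also appear in the group name.
      (setq gnus-adaptive-word-no-group-words t)
      ;; Extra words to ignore, in addition to the default list.
      (setq gnus-ignored-adaptive-words '("patch" "announce"))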
 
    After using this scheme for a while, it might be nice to write a
 ‘gnus-psychoanalyze-user’ command to go through the rules and see what
 words you like and what words you don’t like.  Or perhaps not.
 
    Note that the adaptive word scoring thing is highly experimental and
 is likely to change in the future.  Initial impressions seem to indicate
 that it’s totally useless as it stands.  Some more work (involving more
 rigorous statistical methods) will have to be done to make this useful.