gnus: Comparing Mail Back Ends

 
 6.4.13.11 Comparing Mail Back Ends
 ..................................
 
 First, a note on terminology: “back end” is the common term for a
 low-level access method (a transport, if you will) by which something is
 acquired.  The sense is that one’s mail has to come from somewhere, and
 so a suitable back end must be selected in order to get that mail
 within spitting distance of Gnus.
 
    The same concept exists for Usenet itself: Though access to articles
 is typically done by NNTP these days, once upon a midnight dreary,
 everyone in the world got at Usenet by running a reader on the machine
 where the articles lay (the machine which today we call an NNTP server),
 and access was by the reader stepping into the articles’ directory spool
 area directly.  One can still choose between the ‘nntp’ and ‘nnspool’
 back ends to pick one of these methods, if one happens actually to
 live on the server (or can see its spool directly, anyway, via NFS).
 
    The goal in selecting a mail back end is to pick one which both
 deals suitably with the original format and leaves mail in a form that
 is convenient to use in the future.  Here are some high and low points
 of each:
 
 ‘nnmbox’
 
      UNIX systems have historically had a single, very common, and
      well-defined format.  All messages arrive in a single “spool file”,
      and they are delimited by lines matching the regular expression
      ‘^From_’.  (My notational use of ‘_’ is to indicate a space, to
      make it clear in this instance that this is not the RFC-specified
      ‘From:’ header.)  Because Emacs and therefore Gnus emanate
      historically from the Unix environment, it is simplest if one does
      not mess a great deal with the original mailbox format, so if one
      chooses this back end, Gnus’ primary activity in getting mail from
      the real spool area to Gnus’ preferred directory is simply to copy
      it, with no (appreciable) format change in the process.  It is the
      “dumbest” way to move mail into availability in the Gnus
      environment.  This makes it fast to move into place, but slow to
      parse, when Gnus has to look at what’s where.
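
      To see why a single spool file is cheap to copy but slow to
      consult, here is a minimal Python sketch (not Gnus code) that
      finds message boundaries by scanning for ‘From_’ lines.  The
      sample text and the function name are invented for illustration,
      and real mbox readers also deal with body lines that happen to
      begin with ‘From_’, which this sketch ignores.

```python
import re

# A "From_" separator: the word "From" followed by a space at the start
# of a line.  The RFC "From:" header has a colon instead and is not matched.
FROM_LINE = re.compile(r"^From ", re.MULTILINE)


def split_mbox(text):
    """Split mbox-style text into messages; the whole text is scanned."""
    starts = [m.start() for m in FROM_LINE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


# Hypothetical two-message spool file.
SAMPLE = (
    "From alice@example.org Mon Jan  1 00:00:00 2024\n"
    "From: Alice <alice@example.org>\n"
    "Subject: first\n"
    "\n"
    "Hello.\n"
    "\n"
    "From bob@example.org Mon Jan  1 00:05:00 2024\n"
    "From: Bob <bob@example.org>\n"
    "Subject: second\n"
    "\n"
    "Hi.\n"
)

messages = split_mbox(SAMPLE)
print(len(messages))
```

      Even answering “how many messages are there?” means reading the
      file from top to bottom, which is the parsing cost described
      above.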
 
 ‘nnbabyl’
 
      Once upon a time, there were the DEC-10 and DEC-20, running
      operating systems called TOPS and related things, and the usual
      (only?) mail reading environment was a thing called Babyl.  I
      don’t know what format was used for mail landing on the system, but
      Babyl had its own internal format to which mail was converted,
      primarily involving creating a spool-file-like entity with a scheme
      for inserting Babyl-specific headers and status bits above the top
      of each message in the file.  Rmail was Emacs’s first mail reader;
      it was written by Richard Stallman, who came out of that
      TOPS/Babyl environment, so he wrote Rmail to understand the mail
      files folks already had in existence.  Gnus (and VM, for that
      matter) continue to support this format because those
      mailer-specific headers and status bits are perceived as having
      some good qualities.  Rmail itself still exists as well, of course,
      and is still maintained within Emacs.  Since Emacs 23, it uses
      standard mbox format rather than Babyl.
 
      Both of the above forms leave your mail in a single file on your
      file system, and Gnus must parse that entire file each time you
      take a look at your mail.
 
 ‘nnml’
 
      ‘nnml’ is the back end which smells the most as though you were
      actually operating with an ‘nnspool’-accessed Usenet system.  (In
      fact, I believe ‘nnml’ actually derived from ‘nnspool’ code, lo
      these years ago.)  One’s mail is taken from the original spool
      file, and is then cut up into individual message files, 1:1.  It
      maintains a Usenet-style active file (analogous to what one finds
      in an INN- or CNews-based news system in (for instance)
      ‘/var/lib/news/active’, or what is returned by the NNTP ‘LIST’
      command) and also creates “overview” files for efficient group entry,
      as has been defined for NNTP servers for some years now.  It is
      slower in mail-splitting, due to the creation of lots of files,
      updates to the ‘nnml’ active file, and additions to overview files
      on a per-message basis, but it is extremely fast on access because
      of what amounts to the indexing support provided by the active file
      and overviews.
 
      ‘nnml’ costs “inodes” in a big way; that is, it soaks up the
      resource which defines available places in the file system to put
      new files.  Sysadmins take a dim view of heavy inode occupation
      within tight, shared file systems.  But if you live on a personal
      machine where the file system is your own and space is not at a
      premium, ‘nnml’ wins big.
 
      It is also problematic using this back end if you are living in a
      FAT16-based Windows world, since much space will be wasted on all
      these tiny files.
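
      The trade-off can be sketched in a few lines of Python.  This is
      an illustration only, not how ‘nnml’ is implemented: the file
      name ‘overview.txt’, the tab-separated index format, and the
      ‘(low, high)’ tuple standing in for an active-file entry are all
      assumptions made for the example.

```python
import os
import tempfile
from email import message_from_string


def store(group_dir, active, raw_messages):
    """Write each message to its own numbered file and index it."""
    os.makedirs(group_dir, exist_ok=True)
    low, high = active
    with open(os.path.join(group_dir, "overview.txt"), "a") as overview:
        for raw in raw_messages:
            high += 1
            # One file (and so one inode) per message.
            with open(os.path.join(group_dir, str(high)), "w") as f:
                f.write(raw)
            subject = message_from_string(raw)["Subject"] or ""
            # One index line per message: extra work at splitting time.
            overview.write(f"{high}\t{subject}\n")
    return (low, high)


root = tempfile.mkdtemp()
group = os.path.join(root, "mail", "misc")
active = store(group, (1, 0), ["Subject: first\n\nHello.\n",
                               "Subject: second\n\nHi.\n"])

# Entering the group reads only the small index, not every message file.
with open(os.path.join(group, "overview.txt")) as f:
    subjects = [line.rstrip("\n").split("\t")[1] for line in f]
print(active, subjects)
```

      Splitting pays for a file creation and an index update per
      message; reading the group afterwards touches only the index.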
 
 ‘nnmh’
 
      The Rand MH mail-reading system has been around UNIX systems for a
      very long time; it operates by splitting one’s spool file of
      messages into individual files, but with little or no indexing
      support—‘nnmh’ is considered to be semantically equivalent to
      “‘nnml’ without active file or overviews”.  This is arguably the
      worst choice, because one gets the slowness of individual file
      creation married to the slowness of access parsing when learning
      what’s new in one’s groups.
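
      By way of contrast, here is a rough Python sketch of what
      “no active file or overviews” costs.  The directory layout
      (numbered files in a group directory) follows the description
      above; the function name and sample messages are hypothetical.

```python
import os
import tempfile
from email import message_from_string


def summarize_without_index(group_dir):
    """Build a subject list by listing and opening every message file."""
    numbers = sorted(int(n) for n in os.listdir(group_dir) if n.isdigit())
    summary = []
    for n in numbers:
        with open(os.path.join(group_dir, str(n))) as f:
            summary.append((n, message_from_string(f.read())["Subject"]))
    return summary


group = tempfile.mkdtemp()
for number, subject in [(1, "first"), (2, "second"), (10, "tenth")]:
    with open(os.path.join(group, str(number)), "w") as f:
        f.write(f"Subject: {subject}\n\nBody.\n")

summary = summarize_without_index(group)
print(summary)
```

      Both the highest article number and the summary lines have to be
      rediscovered from the directory and the message files themselves
      on every visit.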
 
 ‘nnfolder’
 
      Basically the effect of ‘nnfolder’ is ‘nnmbox’ (the first method
      described above) on a per-group basis.  That is, ‘nnmbox’ itself
      puts _all_ one’s mail in one file; ‘nnfolder’ provides a little bit
      of optimization to this so that each of one’s mail groups has a
      Unix mail box file.  It’s faster than ‘nnmbox’ because each group
      can be parsed separately, and still provides the simple Unix mail
      box format requiring minimal effort in moving the mail around.  In
      addition, it maintains an “active” file making it much faster for
      Gnus to figure out how many messages there are in each separate
      group.
 
      If you have groups that are expected to hold a massive number of
      messages, ‘nnfolder’ is not the best choice, but if you receive
      only a moderate amount of mail, ‘nnfolder’ is probably the
      friendliest mail back end overall.
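
      The one-mbox-per-group idea can be tried out with Python’s
      standard ‘mailbox’ module.  This is a sketch under assumptions,
      not a description of ‘nnfolder’ internals: the group names and
      messages are invented, and the ‘active’ dictionary is only a
      stand-in for a real active file.

```python
import mailbox
import os
import tempfile

root = tempfile.mkdtemp()

# Hypothetical result of mail splitting: (group, raw message) pairs.
incoming = [("work", "Subject: report\n\nDone.\n"),
            ("lists", "Subject: digest\n\nNews.\n"),
            ("work", "Subject: meeting\n\nAt ten.\n")]

active = {}  # stand-in for an "active" file: group -> message count
for group, raw in incoming:
    box = mailbox.mbox(os.path.join(root, group))  # one mbox file per group
    box.add(raw)
    box.close()  # flushes the new message to disk
    active[group] = active.get(group, 0) + 1

# Counting a single group parses only that group's file.
work_count = len(mailbox.mbox(os.path.join(root, "work")))
print(active, work_count)
```

      The per-group count is available from ‘active’ without opening
      any mailbox, and when a group does have to be parsed, only its
      own file is read.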
 
 ‘nnmaildir’
 
      For configuring expiry and other things, ‘nnmaildir’ uses
      incompatible group parameters, slightly different from those of
      other mail back ends.
 
      ‘nnmaildir’ is largely similar to ‘nnml’, with some notable
      differences.  Each message is stored in a separate file, but the
      filename is unrelated to the article number in Gnus.  ‘nnmaildir’
      also stores the equivalent of ‘nnml’’s overview files in one file
      per article, so it uses about twice as many inodes as ‘nnml’.  (Use
      ‘df -i’ to see how plentiful your inode supply is.)  If this slows
      you down or takes up very much space, consider switching to a
      non-block-structured file system.
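
      The ‘df -i’ check can also be made from a script.  A minimal
      Python sketch, assuming a Unix-like system where ‘os.statvfs’ is
      available; the function name is made up for the example, and
      some file systems report zero for both figures.

```python
import os


def inode_usage(path):
    """Return (total, free) inode counts for the file system holding path."""
    st = os.statvfs(path)  # Unix only
    return st.f_files, st.f_ffree


# "." is used here for convenience; point it at your mail directory instead.
total, free = inode_usage(".")
print(f"inodes: {total} total, {free} free")
```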
 
      Since maildirs don’t require locking for delivery, the maildirs you
      use as groups can also be the maildirs your mail is directly
      delivered to.  This means you can skip Gnus’ mail splitting if your
      mail is already organized into different mailboxes during delivery.
      A ‘directory’ entry in ‘mail-sources’ would have a similar effect,
      but would require one set of mailboxes for spooling deliveries (in
      mbox format, thus damaging message bodies), and another set to be
      used as groups (in whatever format you like).  A maildir has a
      built-in spool, in the ‘new/’ subdirectory.  Beware that,
      currently, mail moved directly from ‘new/’ to ‘cur/’ rather than
      through mail splitting does not undergo treatment such as
      duplicate checking.
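
      Lock-free delivery into ‘new/’ is easy to observe with Python’s
      standard ‘mailbox’ module.  This only illustrates the maildir
      layout itself, not anything ‘nnmaildir’ does; the path and the
      message are invented for the example.

```python
import mailbox
import os
import tempfile

maildir_path = os.path.join(tempfile.mkdtemp(), "Maildir")
box = mailbox.Maildir(maildir_path, create=True)  # makes tmp/, new/, cur/

# Delivery: the message is first written under tmp/ and then moved into
# new/, so a reader never sees a half-written file and no lock is needed.
key = box.add("Subject: hello\n\nDelivered straight into the group.\n")

new_files = os.listdir(os.path.join(maildir_path, "new"))
cur_files = os.listdir(os.path.join(maildir_path, "cur"))
print(len(new_files), len(cur_files))
```

      The freshly delivered message sits in ‘new/’ until a reader picks
      it up, which is the built-in spool referred to above.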
 
      ‘nnmaildir’ stores article marks for a given group in the
      corresponding maildir, in a way designed so that it’s easy to
      manipulate them from outside Gnus.  You can tar up a maildir,
      unpack it somewhere else, and still have your marks.
 
      ‘nnmaildir’ uses a significant amount of memory to speed things up.
      (It keeps in memory some of the things that ‘nnml’ stores in files
      and that ‘nnmh’ repeatedly parses out of message files.)  If this
      is a problem for you, you can set the ‘nov-cache-size’ group
      parameter to something small (0 would probably not work, but 1
      probably would) to make it use less memory.  This caching will
      probably be removed in the future.
 
      Startup is likely to be slower with ‘nnmaildir’ than with other
      back ends.  Everything else is likely to be faster, depending in
      part on your file system.
 
      ‘nnmaildir’ does not use ‘nnoo’, so you cannot use ‘nnoo’ to write
      an ‘nnmaildir’-derived back end.