
 
 6.6.3.1 Document Server Internals
 .................................
 
 Adding new document types to be recognized by ‘nndoc’ isn’t difficult.
 You just have to whip up a definition of what the document looks like,
 write a predicate function to recognize that document type, and then
 hook into ‘nndoc’.
 
    First, here’s an example document type definition:
 
      (mmdf
       (article-begin .  "^\^A\^A\^A\^A\n")
       (body-end .  "^\^A\^A\^A\^A\n"))
 
    The definition is simply a unique “name” followed by a series of
 regexp pseudo-variable settings.  Below are the possible variables—don’t
 be daunted by the number of variables; most document types can be
 defined with very few settings:
 
 ‘first-article’
      If present, ‘nndoc’ will skip past all text until it finds
      something that matches this regexp.  All text before this will be
      totally ignored.
 
 ‘article-begin’
      This setting has to be present in all document type definitions.
      It says what the beginning of each article looks like.  For more
      complicated cases that a simple regexp cannot handle, you can use
      ‘article-begin-function’ instead.
 
 ‘article-begin-function’
      If present, this should be a function that moves point to the
      beginning of each article.  This setting overrides ‘article-begin’.
 
 ‘head-begin’
      If present, this should be a regexp that matches the beginning of
      the head of the article.  For more complicated cases that a simple
      regexp cannot handle, you can use ‘head-begin-function’ instead.
 
 ‘head-begin-function’
      If present, this should be a function that moves point to the head
      of the article.  This setting overrides ‘head-begin’.
 
 ‘head-end’
      This should match the end of the head of the article.  It defaults
      to ‘^$’—the empty line.
 
 ‘body-begin’
      This should match the beginning of the body of the article.  It
      defaults to ‘^\n’.  For more complicated cases that a simple regexp
      cannot handle, you can use ‘body-begin-function’ instead.
 
 ‘body-begin-function’
      If present, this function should move point to the beginning of the
      body of the article.  This setting overrides ‘body-begin’.
 
 ‘body-end’
      If present, this should match the end of the body of the article.
      For more complicated cases that a simple regexp cannot handle, you
      can use ‘body-end-function’ instead.
 
 ‘body-end-function’
      If present, this function should move point to the end of the body
      of the article.  This setting overrides ‘body-end’.
 
 ‘file-begin’
      If present, this should match the beginning of the file.  All text
      before this regexp will be totally ignored.
 
 ‘file-end’
      If present, this should match the end of the file.  All text after
      this regexp will be totally ignored.
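
    As a sketch of how these settings combine, here is a definition for
 a made-up archive format.  The ‘my-archive’ name and every marker line
 in it are hypothetical, chosen only for illustration; nothing here
 ships with ‘nndoc’:

      (my-archive
       ;; Ignore everything before and after the archive markers.
       (file-begin . "^BEGIN ARCHIVE\n")
       (file-end . "^END ARCHIVE\n")
       ;; Each article starts at a separator line.
       (article-begin . "^--- message ---\n")
       ;; The head ends at the first empty line (the default, spelled out).
       (head-end . "^$")
       ;; The body stops at the closing separator.
       (body-end . "^--- end ---\n"))

    Settings that are left out, such as ‘body-begin’ here, keep the
 defaults listed above.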
 
    So, using these variables ‘nndoc’ is able to dissect a document file
 into a series of articles, each with a head and a body.  However, a few
 more variables are needed since not all document types are all that
 news-like—variables needed to transform the head or the body into
 something that’s palatable for Gnus:
 
 ‘prepare-body-function’
      If present, this function will be called when requesting an
      article.  It will be called with point at the start of the body,
      and is useful if the document has encoded some parts of its
      contents.
 
 ‘article-transform-function’
      If present, this function is called when requesting an article.
      It’s meant to be used for more wide-ranging transformation of both
      head and body of the article.
 
 ‘generate-head-function’
      If present, this function is called to generate a head that Gnus
      can understand.  It is called with the article number as a
      parameter, and is expected to generate a nice head for the article
      in question.  It is called when requesting the headers of all
      articles.
 
 ‘generate-article-function’
      If present, this function is called to generate an entire article
      that Gnus can understand.  It is called with the article number as
      a parameter when requesting all articles.
 
 ‘dissection-function’
      If present, this function is called to dissect a document by
      itself, overriding ‘first-article’, ‘article-begin’,
      ‘article-begin-function’, ‘head-begin’, ‘head-begin-function’,
      ‘head-end’, ‘body-begin’, ‘body-begin-function’, ‘body-end’,
      ‘body-end-function’, ‘file-begin’, and ‘file-end’.
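
    For instance, a ‘prepare-body-function’ could be sketched as
 follows.  The function name is hypothetical, and the sketch assumes
 that the function is called with no arguments, with point at the start
 of the body as described above:

      ;; Hypothetical helper: undo ">From " quoting in the body.
      (defun my-archive-unquote-from ()
        "Turn \">From \" at the start of a line back into \"From \"."
        (while (re-search-forward "^>From " nil t)
          (replace-match "From " t t)))

      ;; Try it on a scratch buffer standing in for an article body.
      (with-temp-buffer
        (insert "First line\n>From the start\n")
        (goto-char (point-min))
        (my-archive-unquote-from)
        (buffer-string))

    A definition would then refer to it with
 ‘(prepare-body-function . my-archive-unquote-from)’.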
 
    Let’s look at the most complicated example I can come up
 with—standard digests:
 
      (standard-digest
       (first-article . ,(concat "^" (make-string 70 ?-) "\n\n+"))
       (article-begin . ,(concat "\n\n" (make-string 30 ?-) "\n\n+"))
       (prepare-body-function . nndoc-unquote-dashes)
       (body-end-function . nndoc-digest-body-end)
       (head-end . "^ ?$")
       (body-begin . "^ ?\n")
       (file-end . "^End of .*digest.*[0-9].*\n\\*\\*\\|^End of.*Digest *$")
       (subtype digest guess))
 
    We see that all text before a 70-width line of dashes is ignored; all
 text after a line that starts with ‘End of’ is also ignored; each
 article begins with a 30-width line of dashes; the line separating the
 head from the body may contain a single space; the end of each body is
 located by ‘nndoc-digest-body-end’; and the body is run through
 ‘nndoc-unquote-dashes’ before being delivered.
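
    The ‘,(concat ...)’ forms in the definition are unquotes, so the
 definition as printed only makes sense inside a backquoted list, where
 each one is replaced by the string it computes.  A small sketch of what
 the two separator regexps match (the sample text is made up):

      ;; Build the two separator regexps the same way the definition does.
      (let ((first-article (concat "^" (make-string 70 ?-) "\n\n+"))
            (article-begin (concat "\n\n" (make-string 30 ?-) "\n\n+"))
            (sample (concat "Preamble to be skipped\n"
                            (make-string 70 ?-) "\n\n"
                            "From: someone\n\nFirst body\n\n"
                            (make-string 30 ?-) "\n\n"
                            "From: someone else\n\nSecond body\n")))
        ;; Both elements are non-nil when the sample contains the separators.
        (list (string-match first-article sample)
              (string-match article-begin sample)))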
 
    To hook your own document definition into ‘nndoc’, use the
 ‘nndoc-add-type’ function.  It takes two parameters—the first is the
 definition itself and the second (optional) parameter says where in the
 document type definition alist to put this definition.  The alist is
 traversed sequentially, and ‘nndoc-TYPE-type-p’ is called for a given
 type TYPE.  So ‘nndoc-mmdf-type-p’ is called to see whether a document
 is of ‘mmdf’ type, and so on.  These type predicates should return ‘nil’
 if the document is not of the correct type; ‘t’ if it is of the correct
 type; and a number if the document might be of the correct type.  A high
 number means high probability; a low number means low probability with
 ‘0’ being the lowest valid number.
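
    Putting this together, a type predicate and its registration might
 look like the sketch below.  The ‘my-archive’ type and its marker lines
 are hypothetical, and the sketch assumes that the predicate is called
 with no arguments and with point at the start of the document; check
 the predicates that come with ‘nndoc’ before relying on this:

      (require 'nndoc)

      ;; The name must follow the nndoc-TYPE-type-p pattern.
      (defun nndoc-my-archive-type-p ()
        "Say whether the current buffer looks like a `my-archive' document."
        (cond
         ;; The marker is the very first thing: certainly our type.
         ((looking-at "BEGIN ARCHIVE$") t)
         ;; The marker appears later on: possibly our type.
         ((save-excursion
            (re-search-forward "^BEGIN ARCHIVE$" nil t))
          10)
         ;; Otherwise it is not our type.
         (t nil)))

      ;; Register the definition; the optional second parameter is omitted.
      (nndoc-add-type
       '(my-archive
         (file-begin . "^BEGIN ARCHIVE\n")
         (article-begin . "^--- message ---\n")
         (body-end . "^--- end ---\n")
         (file-end . "^END ARCHIVE\n")))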