elisp: Parsing HTML/XML

 
 31.26 Parsing HTML and XML
 ==========================
 
 When Emacs is compiled with libxml2 support, the following functions are
 available to parse HTML or XML text into Lisp object trees.
 
  -- Function: libxml-parse-html-region start end &optional base-url
           discard-comments
      This function parses the text between START and END as HTML, and
      returns a list representing the HTML “parse tree”.  It attempts to
      handle real-world HTML by robustly coping with syntax mistakes.
 
      The optional argument BASE-URL, if non-‘nil’, should be a string
      specifying the base URL for relative URLs occurring in links.
 
      If the optional argument DISCARD-COMMENTS is non-‘nil’, then the
      parse tree is created without any comments.
 
      In the parse tree, each HTML node is represented by a list in which
      the first element is a symbol representing the node name, the
      second element is an alist of node attributes, and the remaining
      elements are the subnodes.
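
      As a rough illustration of that layout, the sketch below takes a
      node apart with ordinary list functions.  The variable
      ‘my-example-node’ is a hypothetical name introduced only for this
      example; it is not part of Emacs.

```elisp
;; A hand-written node in the layout described above:
;; (NAME ATTRIBUTE-ALIST SUBNODE...)
(defvar my-example-node
  '(div ((class . "thing")) "Foo" (div nil "Yes"))
  "Hypothetical sample node, used only for illustration.")

(car my-example-node)                       ; node name ⇒ div
(cdr (assq 'class (cadr my-example-node)))  ; attribute value ⇒ "thing"
(cddr my-example-node)                      ; subnodes ⇒ ("Foo" (div nil "Yes"))
```

      Text subnodes are plain strings, while element subnodes are
      nested lists of the same shape.  The ‘dom’ library distributed
      with Emacs provides accessors such as ‘dom-tag’, ‘dom-attr’ and
      ‘dom-children’ that may be preferable to raw list operations.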
 
      The following example demonstrates this.  Given this (malformed)
      HTML document:
 
           <html><head></head><body width=101><div class=thing>Foo<div>Yes
 
      A call to ‘libxml-parse-html-region’ returns this DOM (document
      object model):
 
           (html nil
            (head nil)
            (body ((width . "101"))
             (div ((class . "thing"))
              "Foo"
              (div nil
               "Yes"))))
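
      Because the function operates on a buffer region, text held in a
      string has to be inserted into a buffer first.  A minimal sketch,
      assuming a temporary buffer is acceptable; the helper name
      ‘my-parse-html-string’ is hypothetical:

```elisp
;; `my-parse-html-string' is a hypothetical helper, not part of Emacs.
(defun my-parse-html-string (html)
  "Parse the string HTML and return its parse tree.
Return nil if this Emacs lacks `libxml-parse-html-region'."
  (when (fboundp 'libxml-parse-html-region)
    (with-temp-buffer
      (insert html)
      ;; Parse the whole temporary buffer as HTML.
      (libxml-parse-html-region (point-min) (point-max)))))

;; Parse the malformed document from the example above.
(my-parse-html-string
 "<html><head></head><body width=101><div class=thing>Foo<div>Yes")
```

      The ‘fboundp’ check guards against an Emacs built without
      libxml2, in which case the helper simply returns ‘nil’.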
 
  -- Function: shr-insert-document dom
      This function renders the parsed HTML in DOM into the current
      buffer.  The argument DOM should be a list as generated by
      ‘libxml-parse-html-region’.  This function is used, for example,
      by EWW (*note (eww)Top::).
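
      A minimal sketch of rendering a tree, assuming the ‘shr’ library
      is available.  The helper name ‘my-render-dom-to-string’ is
      hypothetical, and the DOM is written by hand in the layout that
      ‘libxml-parse-html-region’ produces:

```elisp
(require 'shr)

;; `my-render-dom-to-string' is a hypothetical helper, not part of Emacs.
(defun my-render-dom-to-string (dom)
  "Render DOM in a temporary buffer and return the resulting text."
  (with-temp-buffer
    ;; The rendered text is inserted into the current (temporary) buffer.
    (shr-insert-document dom)
    (buffer-string)))

(my-render-dom-to-string
 '(html nil (body nil (p nil "Hello, world"))))
```

      The returned string may carry text properties and surrounding
      newlines added during rendering, so callers that need plain text
      may want to post-process it.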
 
  -- Function: libxml-parse-xml-region start end &optional base-url
           discard-comments
      This function is the same as ‘libxml-parse-html-region’, except
      that it parses the text as XML rather than HTML (so it is stricter
      about syntax).
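
      The call pattern is the same as for HTML.  A small sketch,
      assuming well-formed input; the helper name ‘my-parse-xml-string’
      and the sample document are hypothetical:

```elisp
;; `my-parse-xml-string' is a hypothetical helper, not part of Emacs.
(defun my-parse-xml-string (xml)
  "Parse the string XML and return its parse tree.
Return nil if this Emacs lacks `libxml-parse-xml-region'."
  (when (fboundp 'libxml-parse-xml-region)
    (with-temp-buffer
      (insert xml)
      ;; Parse the whole temporary buffer as XML.
      (libxml-parse-xml-region (point-min) (point-max)))))

;; The result uses the same (NAME ATTRIBUTES SUBNODE...) layout as for HTML.
(my-parse-xml-string "<config><item id=\"1\">one</item></config>")
```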
 
