(chibi html-parser)

A permissive HTML parser that supports scalable streaming through a tree-folding interface. It follows the interface of Oleg Kiselyov's SSAX parser and also provides simple convenience utilities. It accepts all invalid HTML, inserting "virtual" start and end tags as needed to maintain the tree structure that the foldts down/up logic requires. A major goal of this parser is bug-for-bug compatibility with the way common web browsers parse HTML.
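
For readers unfamiliar with foldts, the following is a minimal sketch of a tree fold in the style of the one found in the SSAX/SXML tools. It is not part of this library; it only illustrates the down/up/here idea on an already-built SXML tree. The helper text-leaves is a hypothetical name introduced for the illustration, and the parser's own callbacks take their arguments in a different order (tag and attributes first, then seeds).

```scheme
(import (scheme base) (scheme write))

;; Generic tree fold in the style of SSAX's foldts (illustrative sketch).
;; fdown: called on entering a node; returns the seed for its children.
;; fup:   called on leaving a node with the parent's seed and the
;;        seed accumulated over the children.
;; fhere: called on each leaf (any non-pair).
(define (foldts fdown fup fhere seed tree)
  (cond
   ((null? tree) seed)
   ((not (pair? tree)) (fhere seed tree))
   (else
    (let loop ((kid-seed (fdown seed tree))
               (kids (cdr tree)))          ; skip the tag symbol
      (if (null? kids)
          (fup seed kid-seed tree)
          (loop (foldts fdown fup fhere kid-seed (car kids))
                (cdr kids)))))))

;; Hypothetical helper: collect the string leaves in document order.
(define (text-leaves tree)
  (reverse
   (foldts (lambda (seed node) seed)             ; down: pass seed through
           (lambda (parent-seed seed node) seed) ; up: keep children's result
           (lambda (seed leaf)                   ; leaf: keep strings only
             (if (string? leaf) (cons leaf seed) seed))
           '()
           tree)))

(write (text-leaves '(p "Hello " (b "world"))))  ; prints ("Hello " "world")
(newline)
```

The parser described below applies the same pattern while reading, so no intermediate tree has to be built unless the callbacks choose to build one.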

(make-html-parser . keys)

Returns a procedure of two arguments, an initial seed and an optional input port, which parses the HTML document from the port using the callbacks specified in the plist keys (normal quoted symbols are used, for portability and to avoid making this a macro). The recognized callbacks include start:, end:, and text:, whose signatures appear in the example below. In addition, the entity mappings may be overridden with the entities: keyword.

Example: the parser for html-strip could be defined as:
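 (make-html-parser
  ;; start tag: pass the seed through unchanged
  'start: (lambda (tag attrs seed virtual?) seed)
  ;; end tag: keep the seed built while inside the element
  'end:   (lambda (tag attrs parent-seed seed virtual?) seed)
  ;; text node: write it to the current output port
  'text:  (lambda (text seed) (display text)))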
Also see the parser code for html->sxml.
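
As a further illustration of how the seed is threaded through the callbacks, here is a minimal sketch that counts the start tags actually present in the input. It assumes that the procedure returned by make-html-parser returns the final seed (as a foldts-style fold would) and that virtual? is true only for tags the parser inserted itself. The names count-tags and count-tags-in-string are hypothetical.

```scheme
(import (scheme base) (scheme write) (chibi html-parser))

;; Hypothetical example: count start tags, ignoring "virtual" ones
;; that the parser inserted to repair the tree.
(define count-tags
  (make-html-parser
   ;; entering an element: bump the counter unless the tag is virtual
   'start: (lambda (tag attrs seed virtual?)
             (if virtual? seed (+ seed 1)))
   ;; leaving an element: keep the count accumulated inside it
   'end:   (lambda (tag attrs parent-seed seed virtual?) seed)
   ;; text does not affect the count
   'text:  (lambda (text seed) seed)))

;; Assumption: the parser procedure returns the final seed.
(define (count-tags-in-string str)
  (count-tags 0 (open-input-string str)))

(display (count-tags-in-string "<p>Hello <b>world</b></p>"))
(newline)
```

Returning the child seed from end: is what lets the count survive when an element closes; returning parent-seed instead would discard everything counted inside that element.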

(html->sxml [port])

Returns the SXML representation of the document from port, using the default parsing options.
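
A minimal usage sketch, assuming the library is imported as (chibi html-parser) and reading from a string port. The input deliberately leaves the p element unclosed to exercise the permissive parsing; the exact shape of the result is not shown here because it depends on the default parsing options.

```scheme
(import (scheme base) (scheme write) (chibi html-parser))

;; Parse a small fragment with an unclosed <p> element from a string port.
(define doc
  (html->sxml (open-input-string "<p>Hello <b>world</b>")))

;; Print the resulting SXML for inspection.
(write doc)
(newline)
```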

(html-strip [port])

Returns a string representation of the document from port with all tags removed. No whitespace reduction or other rendering is done.
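
A minimal usage sketch, again assuming the (chibi html-parser) import and a string port as input. The variable name plain is only for illustration.

```scheme
(import (scheme base) (scheme write) (chibi html-parser))

;; Strip the tags from a small fragment read from a string port.
(define plain
  (html-strip (open-input-string "<p>Hello <b>world</b></p>")))

;; The result is a string; whitespace is left exactly as in the input.
(write plain)
(newline)
```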