HTMLp: Delphi DOM HTML parser and converter
initialization
Today we see more and more email messages formatted as HTML. For me, email is a plain-text medium (with attachments), and I don't use a WebBrowser or Mozilla object as a message browser.
To-Do list:
- be less restrictive when parsing poorly formatted HTML
- increase speed
- smarter conversion to plain text
interface
TDocument, TNode, TElement, TAttr, etc. implement the core DOM interfaces.
THtmlParser produces a TDocument from an HTML string:
function parseString(const HtmlStr: TDomString): TDocument;
THtmlParser is built on THtmlReader, an event-driven lexical analyzer with a SAX-like interface.
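To illustrate what "event-driven, SAX-like" means, here is a minimal sketch that should compile as a console program with Free Pascal or Delphi. TMiniReader, TEventLog and the OnStartTag/OnEndTag/OnText events are hypothetical names, not THtmlReader's actual API. The reader scans the input once and calls back into a consumer for each tag or text run without building any tree; in HTMLp, THtmlParser plays roughly the consumer role and assembles the TDocument.

```pascal
program SaxSketch;

{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical event signatures; THtmlReader's real events differ.
  TTagEvent = procedure(const Name: string) of object;
  TTextEvent = procedure(const Value: string) of object;

  // Minimal SAX-like scanner: one left-to-right pass that fires an event per
  // start tag, end tag and text run. Attributes are skipped; comments,
  // entities and self-closing tags are not handled.
  TMiniReader = class
  public
    OnStartTag: TTagEvent;
    OnEndTag: TTagEvent;
    OnText: TTextEvent;
    procedure Parse(const S: string);
  end;

  // Hypothetical consumer that just records the events it receives.
  TEventLog = class
  public
    Log: string;
    procedure HandleStart(const Name: string);
    procedure HandleEnd(const Name: string);
    procedure HandleText(const Value: string);
  end;

procedure TMiniReader.Parse(const S: string);
var
  I, J, P: Integer;
  Tag: string;
begin
  I := 1;
  while I <= Length(S) do
  begin
    if S[I] = '<' then
    begin
      J := I + 1;
      while (J <= Length(S)) and (S[J] <> '>') do
        Inc(J);
      Tag := Copy(S, I + 1, J - I - 1);  // text between '<' and '>'
      P := Pos(' ', Tag);
      if P > 0 then
        Tag := Copy(Tag, 1, P - 1);      // drop attributes
      if (Tag <> '') and (Tag[1] = '/') then
      begin
        if Assigned(OnEndTag) then
          OnEndTag(LowerCase(Copy(Tag, 2, MaxInt)));
      end
      else if Assigned(OnStartTag) then
        OnStartTag(LowerCase(Tag));
      I := J + 1;
    end
    else
    begin
      J := I;
      while (J <= Length(S)) and (S[J] <> '<') do
        Inc(J);
      if Assigned(OnText) then
        OnText(Copy(S, I, J - I));       // text run up to the next '<'
      I := J;
    end;
  end;
end;

procedure TEventLog.HandleStart(const Name: string);
begin
  Log := Log + 'start:' + Name + ' ';
end;

procedure TEventLog.HandleEnd(const Name: string);
begin
  Log := Log + 'end:' + Name + ' ';
end;

procedure TEventLog.HandleText(const Value: string);
begin
  Log := Log + 'text:[' + Value + '] ';
end;

// Wires the reader's events to a consumer and returns the recorded trace.
function TraceEvents(const Html: string): string;
var
  Reader: TMiniReader;
  Events: TEventLog;
begin
  Reader := TMiniReader.Create;
  Events := TEventLog.Create;
  try
    Reader.OnStartTag := Events.HandleStart;
    Reader.OnEndTag := Events.HandleEnd;
    Reader.OnText := Events.HandleText;
    Reader.Parse(Html);
    Result := Trim(Events.Log);
  finally
    Events.Free;
    Reader.Free;
  end;
end;

begin
  WriteLn(TraceEvents('<p class="x">Hi <b>there</b></p>'));
end.
```

Running it prints `start:p text:[Hi ] start:b text:[there] end:b end:p`. Because the reader only emits events, the consumer decides what to build, which is what lets a parser turn the same stream into a DOM tree.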
To convert a DOM tree to plain text or HTML, use TTextFormatter or THtmlFormatter, respectively.
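Conceptually, converting a DOM tree to plain text is a depth-first walk over its nodes. The sketch below uses a hypothetical TMiniNode class (with hypothetical Txt/El builder helpers) instead of HTMLp's TNode/TElement, and its line-break rules (a break at `<br>` and after `<p>`, `<div>`, `<li>`) are an assumption for illustration, not TTextFormatter's actual behavior.

```pascal
program TextSketch;

{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical minimal node; HTMLp's own TNode/TElement classes are richer.
  TMiniNode = class
  public
    Name: string;                  // element name, or '' for a text node
    Text: string;                  // character data of a text node
    Children: array of TMiniNode;  // owned child nodes
    constructor Create(const AName, AText: string);
    destructor Destroy; override;
  end;

constructor TMiniNode.Create(const AName, AText: string);
begin
  inherited Create;
  Name := AName;
  Text := AText;
end;

destructor TMiniNode.Destroy;
var
  I: Integer;
begin
  for I := 0 to High(Children) do
    Children[I].Free;  // a node owns and frees its whole subtree
  inherited Destroy;
end;

// Hypothetical builder helpers for the demo tree.
function Txt(const S: string): TMiniNode;
begin
  Result := TMiniNode.Create('', S);
end;

function El(const AName: string; const Kids: array of TMiniNode): TMiniNode;
var
  I: Integer;
begin
  Result := TMiniNode.Create(AName, '');
  SetLength(Result.Children, Length(Kids));
  for I := 0 to High(Kids) do
    Result.Children[I] := Kids[I];
end;

// Depth-first walk: text nodes contribute their data, <br> becomes a line
// break, and block elements (<p>, <div>, <li>) are followed by a line break.
procedure AppendText(Node: TMiniNode; var Output: string);
var
  I: Integer;
begin
  if Node.Name = '' then
  begin
    Output := Output + Node.Text;
    Exit;
  end;
  if Node.Name = 'br' then
    Output := Output + sLineBreak;
  for I := 0 to High(Node.Children) do
    AppendText(Node.Children[I], Output);
  if (Node.Name = 'p') or (Node.Name = 'div') or (Node.Name = 'li') then
    Output := Output + sLineBreak;
end;

function ToPlainText(Root: TMiniNode): string;
begin
  Result := '';
  AppendText(Root, Result);
end;

// Builds <body><p>Hello, <b>world</b></p><p>one<br>two</p></body> and formats it.
function DemoText: string;
var
  Root: TMiniNode;
begin
  Root := El('body', [
    El('p', [Txt('Hello, '), El('b', [Txt('world')])]),
    El('p', [Txt('one'), El('br', []), Txt('two')])]);
  try
    Result := ToPlainText(Root);
  finally
    Root.Free;
  end;
end;

begin
  Write(DemoText);
end.
```

For the demo tree the program prints three lines: `Hello, world`, `one` and `two`. Inline elements such as `<b>` contribute only their text, which is the main difference from an HTML formatter that would re-emit the tags.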
implementation
The parser is implemented as several modules:
- WStrings.pas - like TStrings, but Unicode
- DomCore.pas - core DOM implementation
- Entities.pas - HTML character definitions
- HtmlTags.pas - HTML tags and attributes
- HtmlReader.pas - lexical analyzer
- HtmlParser.pas - HTML parser
- Formatter.pas - HTML DOM tree converters
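Entities.pas supplies the definitions of HTML character entities. As a rough illustration of how such a table is used, the sketch below decodes `&name;` and decimal `&#NNN;` references. The five-entry lookup and the LookupEntity/DecodeEntities names are hypothetical stand-ins for the full table, hexadecimal references are deliberately not handled, and the code assumes a compiler with UnicodeString (Free Pascal, or Delphi 2009 and later).

```pascal
program EntitySketch;

{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical lookup for the text between '&' and ';'. Only a few named
// entities plus decimal "#NNN" references (up to $FFFF) are recognised.
function LookupEntity(const Name: UnicodeString; out Ch: WideChar): Boolean;
var
  I, Code: Integer;
begin
  Result := True;
  Ch := #0;
  if Name = 'amp' then Ch := '&'
  else if Name = 'lt' then Ch := '<'
  else if Name = 'gt' then Ch := '>'
  else if Name = 'quot' then Ch := '"'
  else if Name = 'nbsp' then Ch := WideChar(160)
  else if (Length(Name) > 1) and (Name[1] = '#') then
  begin
    Code := 0;
    for I := 2 to Length(Name) do
    begin
      if (Name[I] < '0') or (Name[I] > '9') or (Code > $FFFF) then
      begin
        Result := False;  // not a plain decimal reference, or too large
        Exit;
      end;
      Code := Code * 10 + Ord(Name[I]) - Ord('0');
    end;
    Result := (Code > 0) and (Code <= $FFFF);
    if Result then
      Ch := WideChar(Code);
  end
  else
    Result := False;
end;

// Replaces every recognised reference with its character; anything
// unrecognised (including a bare '&') is copied through unchanged.
function DecodeEntities(const S: UnicodeString): UnicodeString;
var
  I, J: Integer;
  Ch: WideChar;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    if S[I] = '&' then
    begin
      J := I + 1;  // look for the closing ';' within a short window
      while (J <= Length(S)) and (S[J] <> ';') and (S[J] <> '&') and (J - I < 12) do
        Inc(J);
      if (J <= Length(S)) and (S[J] = ';')
        and LookupEntity(Copy(S, I + 1, J - I - 1), Ch) then
      begin
        Result := Result + Ch;
        I := J + 1;
        Continue;
      end;
    end;
    Result := Result + S[I];
    Inc(I);
  end;
end;

begin
  WriteLn(DecodeEntities('a &lt;b&gt; &amp; &#65; &bogus; &'));
end.
```

The demo prints `a <b> & A &bogus; &`. Passing unknown references through unchanged is one simple way to stay tolerant of poorly formatted input.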