---
- branch: MAIN
  date: Sun Feb 16 22:58:51 UTC 2014
  files:
  - new: '1.1'
    old: '0'
    path: pkgsrc/www/htmlcxx/DESCR
    pathrev: pkgsrc/www/htmlcxx/DESCR@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/www/htmlcxx/Makefile
    pathrev: pkgsrc/www/htmlcxx/Makefile@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/www/htmlcxx/PLIST
    pathrev: pkgsrc/www/htmlcxx/PLIST@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/www/htmlcxx/buildlink3.mk
    pathrev: pkgsrc/www/htmlcxx/buildlink3.mk@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/www/htmlcxx/distinfo
    pathrev: pkgsrc/www/htmlcxx/distinfo@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/www/htmlcxx/patches/patch-html_CharsetConverter.cc
    pathrev: pkgsrc/www/htmlcxx/patches/patch-html_CharsetConverter.cc@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/www/htmlcxx/patches/patch-html_ci__string.h
    pathrev: pkgsrc/www/htmlcxx/patches/patch-html_ci__string.h@1.1
    type: added
  id: 20140216T225851Z.1bb5911323a005883a1c7a946e087bfb86c11442
  log: |
    Import htmlcxx-0.85 as www/htmlcxx.

    htmlcxx is a simple non-validating CSS1 and HTML parser for C++.
    Although there are several other HTML parsers available, htmlcxx has
    some characteristics that make it unique:

    * STL-like navigation of the DOM tree, using the excellent tree.hh
      library from Kasper Peeters
    * It is possible to reproduce exactly, character by character, the
      original document from the parse tree
    * Bundled CSS parser
    * Optional parsing of attributes
    * C++ code that looks like C++ (not so true anymore)
    * Offsets of tags/elements in the original document are stored in
      the nodes of the DOM tree

    The parsing policies of htmlcxx were designed to mimic Mozilla
    Firefox behavior, so you should expect parse trees similar to those
    created by Firefox. However, unlike Firefox, htmlcxx does not insert
    non-existent elements into your HTML. Therefore, serializing the DOM
    tree gives exactly the same bytes contained in the original HTML
    document.
  module: pkgsrc
  subject: 'CVS commit: pkgsrc/www/htmlcxx'
  unixtime: '1392591531'
  user: wiz