Htmlparser
The first thing I noticed was the package downloaded from SourceForge. It is surprisingly large: a supposedly small HTML parser weighs in at 5 MB. After unpacking the download and its assorted extras, the source itself turns out to be not small either. The ant build generates two jars, htmlparser.jar (200 KB) and htmllexer.jar (56 KB). I am concerned with analyzing HTML files, so I only care about the parser. After a quick try, it seems that htmlparser.jar can be used on its own, without the dependent libraries in the lib directory. The class structure is clear and detailed. The source directory contains several samples, which are relatively simple and easy to understand. Usage is similar to an XML parser: there is an event-driven interface, it is easy to extend it to build a DOM tree, and it is easy to get started.

Jericho
A simple and small HTML parser. The download package is relatively small, and the jar it builds is 40 KB, much smaller than the preceding HTML Parser. In terms of usage, Jericho does not provide a SAX-like interface and does not focus on detailed document structure. Its core concept is the segment: a tag is a segment, and so is a piece of content. Below that level come StartTag, EndTag, and so on. The samples shipped with Jericho are also very simple. However, I think people who are used to the XML style of processing will not feel at home with it. The source code quality is average and does not read as well as HTML Parser's.

Nekohtml
This is built on the XNI interface of Apache Xerces-J and depends on Xerces-J. The thought of pulling in something as big as Xerces-J was enough to put me off, so I gave up on it.

Java HTML Parser
Apart from a download link, there is no information on the home page. It also looks quite messy, and I did not try it.

Tagsoup
The source download link on the home page was broken, so I wrote to the author. He replied quickly, saying that the link had been fixed. The compiled jar is 30 KB, small and compact.
Because the core code is generated from templates, it can only be compiled properly in an environment with Perl. There is no documentation and no simple sample, and reading the source left me somewhat dizzy; it feels better suited as demonstration material for the syntax analysis and state machine topics of a compiler course. BTW: judging from the home page, the handler interface of Tagsoup is very similar to that of SAX, but the page does not make clear whether it is fully compatible.
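
To make the SAX-style, event-driven handler interface mentioned above more concrete, here is a minimal sketch in Java. It is not taken from any of the parsers reviewed here: it uses the JDK's built-in SAX parser as a stand-in, so the input must be well-formed markup, whereas a real HTML parser also tolerates messy pages. The class names LinkCollector and LinkHandler and the method collectLinks are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LinkCollector {

    /** Hypothetical handler: records the href of every <a> start tag it is told about. */
    static class LinkHandler extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default JDK factory is not namespace-aware, so the tag name arrives in qName.
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    links.add(href);
                }
            }
        }
    }

    /** Parses well-formed markup and returns the collected href values in document order. */
    static List<String> collectLinks(String markup) throws Exception {
        // Assumption: the JDK's own SAX parser stands in for an HTML parser here.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LinkHandler handler = new LinkHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"a.html\">A</a>"
                + "<p><a href=\"b.html\">B</a></p></body></html>";
        System.out.println(collectLinks(page));
    }
}
```

The point of the sketch is the programming model: the parser drives, and the application only reacts to callbacks such as startElement. If a parser's handler interface really is SAX-compatible, only the line that creates the XMLReader should need to change, but that is an assumption to verify against the parser's own documentation.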