NekoHTML clicks: 2603
NekoHTML is a simple HTML scanner and tag balancer that lets a program parse HTML documents and access their content through standard XML interfaces. The parser scans HTML files and "corrects" many of the mistakes that authors (human or machine) commonly make when writing HTML. NekoHTML adds missing parent elements, automatically closes elements that lack end tags, and handles mismatched inline element tags. NekoHTML is built on the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation.
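To make the idea of "tag balancing" concrete, here is a minimal sketch in plain Java. It is not NekoHTML's algorithm and does not use its API; the class name `TagBalancerSketch` and the method `balance` are hypothetical, and the sketch ignores void elements such as `<br>`, comments, and scripts. It only shows the two repairs described above: closing elements left open, and dropping end tags that match nothing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy tag balancer for illustration only; NOT how NekoHTML is implemented.
class TagBalancerSketch {
    // Group 1 is "/" for an end tag, group 2 is the element name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open elements
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // copy text between tags unchanged
            last = m.end();
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {
                out.append(m.group());
                if (!m.group().endsWith("/>")) {
                    open.push(name); // start tag: remember it until it is closed
                }
            } else if (open.contains(name)) {
                // End tag: first close any elements opened after the matching start tag.
                while (!open.peek().equals(name)) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.pop();
                out.append("</").append(name).append('>');
            }
            // An end tag with no matching start tag is silently dropped.
        }
        out.append(html, last, html.length());
        while (!open.isEmpty()) { // close whatever is still open at end of input
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }
}
```

For example, `TagBalancerSketch.balance("<p><b>bold<i>both</b>")` returns `<p><b>bold<i>both</i></b></p>`. A real tag balancer also needs to know which elements may contain which others, which this sketch does not attempt.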
JTidy clicks: 1916
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java counterpart, JTidy can be used to clean up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document it processes, so programmers can use JTidy as a DOM parser for HTML files.
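As an illustration of what "using a DOM parser" looks like in practice, the sketch below reads a document's title through the standard `org.w3c.dom` interfaces. It deliberately uses the JDK's built-in XML parser rather than JTidy, so it only accepts well-formed XHTML; the class name `DomTitleSketch` and its `title` method are hypothetical. With a parser that tolerates real-world HTML, the DOM navigation part would look much the same.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// DOM access sketch using the JDK's XML parser (an assumption; not JTidy itself).
class DomTitleSketch {
    // Returns the text of the first <title> element, or null if there is none.
    static String title(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
        } catch (Exception e) {
            // The JDK parser rejects input that is not well-formed XML.
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }
}
```

Calling `DomTitleSketch.title("<html><head><title>Hello</title></head><body/></html>")` returns `Hello`.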
HotSAX clicks: 1025
HotSAX is a small, fast, non-validating SAX2 parser for HTML, XML, and XHTML. It can be used in simple web proxies, page scrapers, and crawlers. It is used in much the same way as the Apache Xerces parser.
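The crawler use case can be sketched with the SAX2 handler interface from the JDK. The example below uses the JDK's own SAX parser as a stand-in, not HotSAX, so the input must be well-formed XHTML; `SaxLinkSketch` and `extractLinks` are hypothetical names. It collects the `href` attribute of every `a` element as start-element events arrive, without building a tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Event-driven link extraction with the JDK's SAX parser (an assumption; not HotSAX).
class SaxLinkSketch {
    static List<String> extractLinks(String xhtml) {
        List<String> links = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xhtml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    // The factory is not namespace-aware by default, so match on qName.
                    if ("a".equalsIgnoreCase(qName)) {
                        String href = attrs.getValue("href");
                        if (href != null) {
                            links.add(href);
                        }
                    }
                }
            });
        } catch (Exception e) {
            // The JDK parser rejects input that is not well-formed XML.
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
        return links;
    }
}
```

Because the handler only reacts to events, memory use stays roughly constant regardless of document size, which is the usual reason to pick a SAX-style parser for proxies and crawlers.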
Jericho HTML Parser clicks: 1480
Jericho HTML Parser is a simple but powerful Java library for analyzing and manipulating parts of an HTML document, including some common server-side tags, while reproducing any unrecognized or invalid HTML verbatim. It also provides useful functions for analyzing HTML forms.
HTML Parser clicks: 3045
HTML Parser is a library for parsing HTML in real time.
Java HTML Parser clicks: 1896
Java HTML Parser provides a set of tag objects and can deep-parse HTML into a searchable tree structure.
TagSoup clicks: 715
TagSoup is a SAX-compliant HTML parser written in Java.
Htmlripper clicks: 1963
Htmlripper is a Java package that extracts dynamic data from web pages according to predefined rules.
Cobra clicks: 372
Cobra is an HTML toolkit. It contains a pure Java HTML DOM parser and a page rendering engine. Cobra supports HTML 4, JavaScript, and CSS 2.