Open-source Java components for HTML processing

Source: Internet
Author: User
NekoHTML

NekoHTML is a simple HTML scanner and tag balancer that lets an application parse HTML documents and access their content through standard XML interfaces. The parser scans HTML files and "fixes up" many of the common mistakes that authors (human or machine) make when writing HTML. NekoHTML adds missing parent elements, automatically closes elements whose end tags are optional, and handles mismatched inline element tags. NekoHTML is written against the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation.
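To make the idea of tag balancing concrete, here is a minimal sketch in plain Java. It is a toy and not NekoHTML's actual algorithm: the class name `ToyTagBalancer` and its `balance` method are made up for illustration, it closes elements that are still open, closes inner elements when an outer end tag arrives early, and drops end tags that have no matching start tag. It knows nothing about void elements such as `<br>`, comments, or optional parents.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy tag balancer for illustration only; not how NekoHTML is implemented.
class ToyTagBalancer {
    // Matches a simple start or end tag such as <b>, </b>, or <a href="x">.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open elements
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // copy the text before this tag
            pos = m.end();
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {
                // Start tag: remember it and emit it unchanged.
                open.push(name);
                out.append(m.group());
            } else if (open.contains(name)) {
                // End tag: first close everything opened after its start tag.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // An end tag with no matching start tag is silently dropped.
        }
        out.append(html.substring(pos));
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p><b>bold <i>both</b> text"));
    }
}
```

For the input in `main`, the sketch closes `<i>` before `</b>` and appends the missing `</p>`, giving `<p><b>bold <i>both</i></b> text</p>`. A real balancer also needs the HTML content model to decide which elements may nest.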

JTidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java counterpart, JTidy can be used to clean up malformed and faulty HTML. In addition, JTidy exposes a DOM interface to the document being processed, so programmers can use it as a DOM parser for HTML files.
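Because the document is exposed through the standard `org.w3c.dom` interfaces, code that consumes it does not depend on which parser built the tree. The sketch below shows that consuming side only. As an assumption made to keep it runnable without extra libraries, it uses the JDK's built-in XML parser on markup that is already well formed; with an HTML-aware parser the `Document` would come from that library instead. The class and method names (`DomHeadingLister`, `headings`, `parseWellFormed`) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomHeadingLister {
    // Returns the trimmed text of every element with the given tag name.
    static List<String> headings(Document doc, String tag) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates the text of all descendants.
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    // Stand-in for an HTML parser: the JDK XML parser, which needs well-formed input.
    static Document parseWellFormed(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h2>Intro</h2><p>x</p>"
                + "<h2>Usage <em>notes</em></h2></body></html>";
        System.out.println(headings(parseWellFormed(page), "h2"));
    }
}
```

A DOM tree is convenient when the whole page is needed at once, at the cost of holding the entire document in memory.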

HotSAX

HotSAX is a small, fast, non-validating SAX2 parser for HTML, XML, and XHTML. It can be used in simple web agents, page scrapers, and crawlers. It is used in much the same way as the Apache Xerces parser.
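A SAX2 parser does not build a tree. It calls back into a handler as it meets each tag, which suits crawlers that only need a few values from each page. The following is a minimal sketch of such a handler that collects link targets. It assumes well-formed input and uses the JDK's built-in SAX parser as a stand-in so that it runs without extra libraries; an HTML-tolerant SAX2 parser would drive the same handler. The class name `SaxLinkCollector` and its `collect` method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxLinkCollector extends DefaultHandler {
    final List<String> hrefs = new ArrayList<>();

    // Called by the parser for each start tag, in document order.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("a".equalsIgnoreCase(qName)) {
            String href = attrs.getValue("href");
            if (href != null) {
                hrefs.add(href);
            }
        }
    }

    // Parses the markup and returns every href found on an <a> element.
    static List<String> collect(String xhtml) {
        SaxLinkCollector handler = new SaxLinkCollector();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return handler.hrefs;
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"/a\">A</a>"
                + "<a name=\"x\">anchor only</a><a href=\"/b\">B</a></body></html>";
        System.out.println(collect(page));
    }
}
```

Since nothing is retained except the collected values, memory use stays flat regardless of page size.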

Jericho HTML Parser

Jericho HTML Parser is a simple but powerful Java library for analyzing and manipulating parts of an HTML document, including some common server-side tags, while reproducing any unrecognized or invalid HTML verbatim. It also provides high-level functions for working with HTML forms.

HTML Parser

HTML Parser is a Java library that parses HTML in real time.

Java HTML Parser

Java HTML Parser parses a document into a set of tag objects arranged in a deeply nested, searchable tree.

TagSoup

TagSoup is a SAX-compliant HTML parser written in Java.

Htmlripper

Htmlripper is a Java package that extracts dynamic data from web pages according to predefined rules.

Cobra

Cobra is an HTML toolkit made up of a pure-Java HTML DOM parser and a page rendering engine. It supports HTML 4, JavaScript, and CSS 2.
