jsoup 1.5.1 adds single-pass selector evaluation for complex queries, which noticeably improves the performance of extracting elements from the DOM with CSS selectors. It also fixes a bug in Scala support, provides new HTML manipulation features, and includes other bug fixes.
jsoup is a Java HTML parser that can directly parse a URL or HTML text content. It provides a very convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.
The main features of jsoup are: parsing HTML from a URL, file, or string; finding and extracting data using DOM traversal or CSS selectors; and manipulating HTML elements, attributes, and text.
jsoup is released under the MIT license and can be safely used in commercial projects.
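As a rough illustration of the string-parsing and CSS-selector features listed above, the following minimal sketch parses an HTML string and collects absolute link targets. The class name SelectorSketch, the method name linkTargets, and the selector "div#content a[href]" are assumptions made for this example and do not come from the release notes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class SelectorSketch {

    // Parses the given HTML string and returns the absolute URL of every
    // link found inside the element with id "content".
    static List<String> linkTargets(String html, String baseUri) {
        // The base URI lets jsoup resolve relative href values.
        Document doc = Jsoup.parse(html, baseUri);

        // CSS selector: <a> elements with an href attribute, under div#content.
        Elements links = doc.select("div#content a[href]");

        List<String> targets = new ArrayList<>();
        for (Element link : links) {
            // The "abs:" prefix asks for the attribute resolved against the base URI.
            targets.add(link.attr("abs:href"));
        }
        return targets;
    }
}
```

Calling linkTargets with a page fragment and "http://example.com/" as the base URI should return only the links inside the content div, with relative paths expanded to full URLs.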
Sample code:
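import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse a local file as UTF-8; the base URI is used to resolve relative links.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
    String linkHref = link.attr("href");
    String linkText = link.text();
}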
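The manipulation side of the API can be sketched in a similar way. The example below is only an assumption-laden illustration: the class ManipulationSketch, the method markLinks, and the choice of rel="nofollow" and the "external" class are hypothetical, and the calls used (attr with two arguments, addClass) are general jsoup methods, not necessarily the ones added in this release.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
class ManipulationSketch {

    // Parses the HTML, then marks every link that has an href attribute
    // with rel="nofollow" and the CSS class "external".
    static Document markLinks(String html) {
        Document doc = Jsoup.parse(html);
        for (Element link : doc.select("a[href]")) {
            link.attr("rel", "nofollow"); // set (or overwrite) an attribute
            link.addClass("external");    // add a class name to the element
        }
        return doc;
    }
}
```

The returned Document can be queried again with select, or serialized back to HTML with doc.body().html().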
Article reproduced from: Open Source China Community [http://www.oschina.net]
Article title: jsoup 1.5.1 released, an excellent HTML parser
Article address: http://www.oschina.net/news/15627/jsoup-1-5-1-html-parser
This article was synchronized and published with B3log Solo from The Art of Simple Design. Original address: http://88250.b3log.org/jsoup-1-5-1-html-parser.html