Overview
Jsoup is a Java HTML parser. It parses HTML into a DOM tree, supports selecting elements with CSS selectors, supports HTML filtering, and ships with its own HTTP downloader. The Jsoup code is concise: 53 classes in total, about 9,000 lines, with no third-party dependencies. The code structure is as follows:
Jsoup
├──examples # Examples, including one that converts HTML to plain text and one that extracts all link URLs
├──helper # Utility classes for reading data, handling connections, and converting strings
├──nodes # DOM node definitions
├──parser # Parses HTML and converts it into a DOM tree
├──safety # Security-related code, including the whitelist and HTML filtering
└──select # Selectors, supporting CSS selectors and NodeVisitor-style traversal
Usage
The entry point to Jsoup is the Jsoup class. First parse the HTML into a DOM tree, then use CSS selectors or a NodeVisitor to manipulate the DOM elements.
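The parse-then-select flow described above might look like the following minimal sketch. It assumes the jsoup library is on the classpath; the class name JsoupUsageSketch, the helper methods, the sample HTML, and the base URI http://example.com/ are illustrative and do not come from the Jsoup source or the referenced post.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of jsoup itself.
public class JsoupUsageSketch {

    // Parse HTML into a DOM tree, then pick out links with a CSS selector.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Elements links = doc.select("a[href]");
        List<String> urls = new ArrayList<>();
        for (Element link : links) {
            // The "abs:" prefix resolves the href against the base URI.
            urls.add(link.attr("abs:href"));
        }
        return urls;
    }

    // Walk the DOM tree with a NodeVisitor and concatenate all text nodes.
    static String collectText(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder sb = new StringBuilder();
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    sb.append(((TextNode) node).getWholeText());
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <a href=\"/docs\">docs</a> and "
                + "<a href=\"http://jsoup.org/\">jsoup</a></p>";
        System.out.println(extractLinks(html, "http://example.com/"));
        System.out.println(collectText(html));
    }
}
```

The CSS selector suits targeted lookups such as "all anchors with an href", while the NodeVisitor suits whole-tree passes such as converting a document to plain text.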
Reference: http://my.oschina.net/flashsword/blog/156748
http://jsoup.org/