How to parse an HTML document:
String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
(For more details, see Parse an HTML string.)
The parser makes every attempt to produce a clean parse from the HTML you provide, whether or not the HTML is well-formed. It handles:
- Unclosed tags (for example, <p>Lorem <p>Ipsum parses to <p>Lorem</p> <p>Ipsum</p>)
- Implicit tags (for example, a naked <td>Table data</td> is wrapped into a <table><tr><td>...)
- Reliable creation of the document structure (the html element contains a head and a body, and only appropriate elements appear in the head)
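The sketch below checks two of the behaviors listed above: unclosed tags and automatic creation of the html/head/body structure. It assumes jsoup is on the classpath; the class name MalformedParseDemo and its helper methods are hypothetical, and only Jsoup.parse, select, head, body and text are jsoup API.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; assumes jsoup is on the classpath.
class MalformedParseDemo {

    // Returns the text of every <p> element jsoup builds from the given HTML.
    static List<String> paragraphTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element p : doc.select("p")) {
            texts.add(p.text()); // text() is whitespace-normalized and trimmed
        }
        return texts;
    }

    // Counts elements matching cssQuery inside <head> and inside <body>.
    static int[] headBodyCounts(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return new int[] {
            doc.head().select(cssQuery).size(),
            doc.body().select(cssQuery).size()
        };
    }

    public static void main(String[] args) {
        // Unclosed tags: the second <p> implicitly closes the first one.
        System.out.println(paragraphTexts("<p>Lorem <p>Ipsum")); // prints [Lorem, Ipsum]

        // No <html>, <head> or <body> in the input: jsoup still creates them,
        // placing <title> in the head and <p> in the body.
        String fragment = "<title>Demo</title><p>Body text";
        int[] titles = headBodyCounts(fragment, "title");
        int[] paragraphs = headBodyCounts(fragment, "p");
        System.out.println("title in head/body: " + titles[0] + "/" + titles[1]);
        System.out.println("p in head/body: " + paragraphs[0] + "/" + paragraphs[1]);
    }
}
```

Printing the head/body counts, rather than the pretty-printed HTML, keeps the check independent of jsoup's output formatting settings.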
- A document consists of Elements and TextNodes (and a few other miscellaneous nodes; for details, see the nodes package tree).
- The inheritance chain is: Document extends Element extends Node; TextNode extends Node.
- An Element contains a list of child Nodes and has one parent Element. It also provides a filtered list of its child Elements only.
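To make the object model concrete, the sketch below walks the children of a single <p>. It contrasts childNodes() (every Node, TextNodes included) with children() (Elements only) and reads parent(). It assumes jsoup is on the classpath and uses only standard jsoup calls; the class NodeTreeDemo and its helpers are hypothetical names for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical demo class; assumes jsoup is on the classpath.
class NodeTreeDemo {

    // All child Nodes (elements, text, comments, ...) of the first <p>.
    static int childNodeCount(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        return p.childNodeSize();
    }

    // Only the child Elements of the first <p>.
    static int childElementCount(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        return p.children().size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<p>Hello <b>world</b>!</p>");

        // Document extends Element extends Node, so these assignments compile.
        Element docAsElement = doc;
        Node docAsNode = doc;

        Element p = doc.selectFirst("p");
        // childNodes(): "Hello ", <b>, "!" (two TextNodes and one Element)
        for (Node child : p.childNodes()) {
            String kind = (child instanceof TextNode) ? "TextNode" : child.getClass().getSimpleName();
            System.out.println(kind + ": " + child.outerHtml());
        }
        // children(): the filtered list of child Elements only
        System.out.println(p.children().size()); // 1
        // parent(): each Element has one parent Element
        System.out.println(p.parent().tagName()); // body
    }
}
```

Using childNodes() when text matters and children() when only tags matter avoids manually skipping TextNodes during traversal.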