Parse and traverse an HTML document
How to parse an HTML document:
The code is as follows:
String html = "<html><head><title>First parse</title></head>"
    + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
Its parser is capable of
Document object properties and methods
The following properties and methods can be used in HTML documents:
Property/Method: Description
document.activeElement: Returns the element that currently has focus
This article describes the HTML Document object and how it is used to parse HTML documents. Let's take a look.
Let's start by introducing the Document object in HTML:
The Document object
5.1 HTML heading tags
① Headings are defined with the <h1> to <h6> tags. <h1> defines the largest heading and <h6> defines the smallest.
② Use HTML heading tags only for headings. Don't use headings just to produce bold or large text.
HTML documents are defined by HTML elements. An HTML element is everything from the start tag to the end tag.
Start tag | Element content | End tag
<p> | This is a paragraph | </p>
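The start tag, content, and end tag structure can be sketched in Java. This is a minimal illustration only: the class name ElementParts and its split method are hypothetical helpers, the sample element <p>This is a paragraph</p> is assumed, and the regular expression handles a single simple element, not arbitrary HTML (a real parser such as jsoup should be used for that).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class ElementParts {
    // Group 1: start tag, group 2: tag name, group 3: content,
    // group 4: end tag whose name must match the start tag's name.
    private static final Pattern ELEMENT = Pattern.compile(
            "^(<([a-zA-Z][a-zA-Z0-9]*)[^>]*>)(.*)(</\\2>)$", Pattern.DOTALL);

    // Returns {start tag, element content, end tag} for one simple element.
    static String[] split(String element) {
        Matcher m = ELEMENT.matcher(element.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a simple element: " + element);
        }
        return new String[] { m.group(1), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = split("<p>This is a paragraph</p>");
        System.out.println("Start tag:       " + parts[0]);
        System.out.println("Element content: " + parts[1]);
        System.out.println("End tag:         " + parts[2]);
    }
}
```

A regular expression is enough here because the input is one known element. Nested or malformed markup needs an actual HTML parser.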
For a developer, documentation is always one of the most vexing things. You are also likely to take two different attitudes towards it:
When you use someone else's code base, what you want most is its technical documentation,
1. Introduction to jsoup
In the past, when we used Java to parse HTML documents or fragments, we usually used the open source class library htmlparser (http://htmlparser.sourceforge.net). Now that we have jsoup, it is enough to use jsoup to process
I. Introduction to jsoup
In the past, when parsing HTML documents or fragments with Java, we usually used the open source class library HtmlParser (http://htmlparser.sourceforge.net/). Now that we have jsoup, future processing of HTML content only need
The Document Object Model (DOM) is an application programming interface (API) that represents the elements of a document (such as HTML or XML) and provides ways to access and manipulate them. Generally, all browsers that support JavaScript support the DOM. The DOM involved in
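To make the idea of the DOM as an API concrete, here is a small sketch that builds a DOM tree and walks it, using only the W3C DOM classes shipped with the JDK. Several things are assumptions: the class name DomWalk and its methods are hypothetical, the sample markup is made up for illustration, and the JDK parser used here is an XML parser, so it only accepts well-formed markup. Real-world HTML generally needs a tolerant parser such as jsoup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library.
final class DomWalk {
    // Parses well-formed markup and returns element names in document order.
    static List<String> elementNames(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed", e);
        }
    }

    // Depth-first traversal: record this node if it is an element, then visit its children.
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    public static void main(String[] args) {
        String markup = "<html><head><title>First parse</title></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        System.out.println(elementNames(markup)); // prints [html, head, title, body, p]
    }
}
```

The same Node and NodeList interfaces are what browser DOM implementations expose to JavaScript, which is why the traversal logic carries over between environments.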
First, the basic format of an HTML document:
<!DOCTYPE html> // Document type declaration
<html> // Start of the HTML document
<head> // Start of the document metadata
<meta charset="..."> // Character encoding declaration
<title>basic</title> // Document title
</head> // End of the document metadata
// representation