Using jsoup to parse web pages on Android

Source: Internet
Author: User

Problem:

An Android course project required a free-classroom lookup feature. The classroom usage information can be obtained by parsing the HTML of the Office of Academic Affairs website. I looked into htmlparser, an open-source library, but found that it conflicts with the libraries bundled with Android, although it works fine in plain Java applications.

htmlparser: http://htmlparser.sourceforge.net/

 


Solution:

Another open-source library, jsoup, solved the problem.

Jsoup: http://jsoup.org/

jsoup is a Java HTML parser that can parse a URL or a string of HTML directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods. The version used here includes a parser branch that supports HTML5, so it parses HTML the same way current browsers do, and it reduces parsing time and memory usage.

You can use jsoup's DOM methods to traverse a web page. For example, suppose a page has been downloaded and saved as /tmp/input.html. The following code gets all the hyperlinks inside the element whose id is content. For each link, the linkHref string holds the link address and the linkText string holds the link's text.

 

 

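import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Jsoup.parse(File, ...) throws IOException; run this inside a method that handles it.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

// Find the element with id "content", then every <a> tag beneath it.
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
    String linkHref = link.attr("href"); // link address
    String linkText = link.text();       // visible link text
}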
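The third argument to Jsoup.parse in the code above is a base URI. jsoup uses it to turn relative links into absolute ones when you ask for link.attr("abs:href") instead of link.attr("href"). The resolution itself is ordinary URL resolution, so it can be illustrated with the standard library alone. The following is only a sketch of that idea, not jsoup's own implementation; the class name BaseUriDemo and the paths are made up for illustration.

```java
import java.net.URI;

public class BaseUriDemo {
    // Resolve an href against the page's base URI, as a browser would.
    // The base should end with "/" (or name a file) so that relative
    // paths are appended to the right directory.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/";                   // same base as in the article's code
        System.out.println(resolve(base, "/rooms/free.html")); // root-relative path (hypothetical)
        System.out.println(resolve(base, "news/today.html"));  // relative path (hypothetical)
        System.out.println(resolve(base, "http://jsoup.org/")); // already absolute, returned unchanged
    }
}
```

If the saved page contains only relative links, passing the address it was downloaded from as the base URI is what makes the absolute form recoverable later.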


jsoup offers many other ways to parse and query HTML for different situations. They are not listed here, but they are worth exploring on your own.
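One such alternative is jsoup's CSS selector syntax, which can replace the getElementById and getElementsByTag calls with a single query. The sketch below assumes the jsoup jar is on the classpath and parses an HTML string rather than a file. The class name SelectorDemo, the helper contentLinks, and the sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class SelectorDemo {
    // Returns "text -> absolute URL" for every link that has an href
    // and sits inside the element whose id is "content".
    static List<String> contentLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        // "#content a[href]": <a> elements with an href attribute under id="content".
        Elements links = doc.select("#content a[href]");
        List<String> result = new ArrayList<>();
        for (Element link : links) {
            // "abs:href" asks jsoup to resolve the href against the base URI.
            result.add(link.text() + " -> " + link.attr("abs:href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup.
        String html = "<div id=\"content\">"
                + "<a href=\"/rooms/free.html\">Free classrooms</a>"
                + "<a name=\"top\">anchor without href</a>"
                + "</div>"
                + "<a href=\"/about.html\">outside content</a>";
        for (String line : contentLinks(html, "http://example.com/")) {
            System.out.println(line);
        }
    }
}
```

Compared with the DOM-method version, the selector also skips anchors that have no href and never dereferences a null element when the page has no id="content" block; select simply returns an empty list.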

 


From Peking University-Google Android lab
