Use jsoup to parse HTML pages

Source: Internet
Author: User
When writing Android apps, you sometimes need to parse HTML pages, especially in apps that scrape data from websites, such as weather forecasts. The powerful HTMLParser tool works in desktop applications, but errors may occur when it is used on the Android platform. Other options are to extract data with regular expressions or to search for and locate strings directly. This article introduces the open-source jsoup parser. jsoup can take a URL, an HTML file, or a string containing HTML as its data source, and then uses DOM methods or CSS selectors to find and extract data. Example:

// URL as the input source
Document doc = Jsoup.connect("http://www.example.com").timeout(60000).get();

// File as the input source
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://www.example.com/");

// String as the input source
Document doc = Jsoup.parse(htmlStr);

Similar to JavaScript, jsoup provides the following methods for finding elements:
getElementById(String id): obtain an element by ID
getElementsByTag(String tag): obtain elements by tag
getElementsByClass(String className): obtain elements by class
getElementsByAttribute(String key): obtain elements by attribute
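
As a minimal sketch of the lookup methods listed above, the following parses a small HTML string and queries it by ID, tag, class, and attribute. The class name FindElementsExample, the summarize helper, and the sample markup are hypothetical and were invented for this illustration; only the jsoup calls themselves come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FindElementsExample {
    // Hypothetical sample markup, not taken from the article.
    static final String HTML =
        "<div id=\"main\" class=\"box\">"
      + "<p class=\"note\">first</p>"
      + "<p title=\"tip\">second</p>"
      + "</div>";

    // Runs each lookup method once and joins the results into one string.
    static String summarize(String html) {
        Document doc = Jsoup.parse(html);

        Element main = doc.getElementById("main");              // single element by ID
        Elements paragraphs = doc.getElementsByTag("p");         // all <p> elements
        Elements notes = doc.getElementsByClass("note");         // elements with class "note"
        Elements titled = doc.getElementsByAttribute("title");   // elements that have a title attribute

        return main.tagName() + "," + paragraphs.size() + ","
             + notes.first().text() + "," + titled.first().text();
    }

    public static void main(String[] args) {
        System.out.println(summarize(HTML)); // div,2,first,second
    }
}
```

Note that getElementById returns a single Element, while the other three return an Elements collection that may be empty.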

The following methods are also provided to obtain sibling nodes:
siblingElements(), firstElementSibling(), lastElementSibling(), nextElementSibling(), previousElementSibling()
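
The sibling methods can be sketched on a three-item list, starting from the middle item. The class name SiblingExample, the describeSiblings helper, and the sample list are assumptions made for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingExample {
    // Hypothetical sample markup, not taken from the article.
    static final String HTML = "<ul><li>one</li><li>two</li><li>three</li></ul>";

    // Starts at the second <li> and reports what each sibling method returns.
    static String describeSiblings(String html) {
        Document doc = Jsoup.parse(html);
        Element second = doc.getElementsByTag("li").get(1);

        return second.previousElementSibling().text() + "|"
             + second.nextElementSibling().text() + "|"
             + second.firstElementSibling().text() + "|"
             + second.lastElementSibling().text() + "|"
             + second.siblingElements().size(); // siblings exclude the element itself
    }

    public static void main(String[] args) {
        System.out.println(describeSiblings(HTML)); // one|three|one|three|2
    }
}
```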

Use the following methods to get and set element data:
attr(String key): get an attribute value
attr(String key, String value): set an attribute value
attributes(): get all attributes
id(), className(), classNames(): get the ID and class values
text(): get the text value
text(String value): set the text value
html(): get the inner HTML
html(String value): set the inner HTML
outerHtml(): get the outer HTML
data(): get the data content
tag(): get the tag
tagName(): get the tag name

The following methods are provided for manipulating HTML:
append(String html), prepend(String html)
appendText(String text), prependText(String text)
appendElement(String tagName), prependElement(String tagName)
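
A short sketch of reading and then changing element data with the methods above. The class name ElementDataExample, the readAndModify helper, and the sample markup are hypothetical; the example only checks attribute and text values, since the exact formatting of html() output depends on jsoup's output settings.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ElementDataExample {
    // Hypothetical sample markup, not taken from the article.
    static final String HTML =
        "<div id=\"box\" class=\"a b\"><a href=\"/x\">go</a></div>";

    // Reads some values, modifies the tree, then reads the changed values back.
    static String readAndModify(String html) {
        Document doc = Jsoup.parse(html);
        Element box = doc.getElementById("box");
        Element link = box.getElementsByTag("a").first();

        // Read the original data.
        String id = box.id();                     // "box"
        String classes = box.className();         // "a b"
        int classCount = box.classNames().size(); // 2
        String href = link.attr("href");          // "/x"
        String text = link.text();                // "go"

        // Modify attributes, text, and structure.
        link.attr("href", "/y");
        link.text("gone");
        box.appendElement("span").text("tail");   // new <span> added as the last child
        box.prependText("head ");                 // text node added as the first child

        String newSpan = box.getElementsByTag("span").first().text();
        return String.join(",", id, classes, String.valueOf(classCount),
                           href, text, link.attr("href"), link.text(), newSpan);
    }

    public static void main(String[] args) {
        System.out.println(readAndModify(HTML)); // box,a b,2,/x,go,/y,gone,tail
    }
}
```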

Use jsoup to extract all the link addresses in a div block. The HTML text is as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>test</title>
</head>
<body>
Test connection
<div class="mydiv">
<a href="page1.html">link 1</a><br>
<a href="http://www.example.com/page2.html">link address 2</a><br>
</div>
</body>
</html>

Android Java code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://www.example.com").timeout(60000).get();
Elements divs = doc.select("div.mydiv");
StringBuilder linkBuffer = new StringBuilder();
if (divs != null) {
    for (Element div : divs) {
        Elements links = div.select("a[href]");
        if (null != links) {
            for (Element link : links) {
                // abs:href automatically converts a relative address to an absolute URL.
                linkBuffer.append(link.attr("abs:href"));
                linkBuffer.append(" ");
                linkBuffer.append(link.text());
            }
        }
    }
}

For more details about jsoup, refer to the documentation on the official website. The code above was tested successfully on phones running Android 1.6 and later. Note: if the phone connects through WAP, you may need to set an HTTP proxy as follows (place this code before the call to Jsoup.connect):

String host = android.net.Proxy.getDefaultHost();
int port = android.net.Proxy.getDefaultPort();
if (host != null && port != -1) {
    System.getProperties().setProperty("proxySet", "true");
    System.setProperty("http.proxyHost", host);
    System.setProperty("http.proxyPort", Integer.toString(port));
}
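
To try the link extraction from this article without a network connection, one option is to parse the sample HTML from a string and pass a base URI so that abs:href can still resolve relative links. The sketch below assumes the div's class is mydiv; the class name LinkExtractor and the extractLinks helper are hypothetical and not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkExtractor {
    // The div block from the sample page, with the class name assumed to be "mydiv".
    static final String HTML =
        "<div class=\"mydiv\">"
      + "<a href=\"page1.html\">link 1</a><br>"
      + "<a href=\"http://www.example.com/page2.html\">link address 2</a><br>"
      + "</div>";

    // Returns one "absolute-url link-text" entry per link inside div.mydiv.
    static List<String> extractLinks(String html, String baseUri) {
        // The base URI lets jsoup resolve relative hrefs when there is no real connection.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element link : doc.select("div.mydiv a[href]")) {
            links.add(link.attr("abs:href") + " " + link.text());
        }
        return links;
    }

    public static void main(String[] args) {
        for (String entry : extractLinks(HTML, "http://www.example.com/")) {
            System.out.println(entry);
        }
        // http://www.example.com/page1.html link 1
        // http://www.example.com/page2.html link address 2
    }
}
```

Combining the two selectors into "div.mydiv a[href]" avoids the nested loops, and the null checks are dropped because select returns an empty Elements collection rather than null when nothing matches.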

This article is from the "technical life" blog. Please keep this source link when reposting: http://zhaohaiyang.blog.51cto.com/2056753/735346
