Using Jsoup on Android to crawl web page data

Source: Internet
Author: User

Jsoup is a Java HTML parser that can directly parse a URL or a piece of HTML text. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.

Jsoup's Chinese-language documentation: http://www.open-open.com/jsoup/parse-document-from-string.htm
On this site you can find usage instructions, .jar file downloads, API documentation, and so on.

The main functions of Jsoup are as follows:

    1. Parse HTML from a URL, file, or string;
    2. Find and extract data using DOM traversal or CSS selectors;
    3. Manipulate HTML elements, attributes, and text.
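
The three functions above can be sketched offline against an HTML string. This is only an illustration: the class name `JsoupBasics`, the sample markup, and the `rel` value are made up here, and it assumes the jsoup jar is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupBasics {
    // Hypothetical sample markup, used only for this illustration.
    static final String SAMPLE_HTML =
            "<html><head><title>Demo</title></head><body>"
            + "<div class=\"postTitle\"><a href=\"/p/1.html\">First post</a></div>"
            + "<div class=\"postTitle\"><a href=\"/p/2.html\">Second post</a></div>"
            + "</body></html>";

    // Functions 1 and 2: parse HTML from a string, then find data with a CSS selector.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("div.postTitle a[href]");
        List<String> texts = new ArrayList<>();
        for (Element link : links) {
            texts.add(link.text());
        }
        return texts;
    }

    // Function 3: manipulate elements by setting an attribute on every matched link,
    // then read it back from the first link.
    static String addRel(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").attr("rel", "nofollow");
        return doc.select("a[href]").first().attr("rel");
    }

    public static void main(String[] args) {
        System.out.println(linkTexts(SAMPLE_HTML));
        System.out.println(addRel(SAMPLE_HTML));
    }
}
```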

Jsoup is released under the MIT license and can be safely used in commercial projects.

The following methods of the Jsoup class are static and can be called directly. A few of them are described here.

The connect() method obtains a Connection. Calling get() on the Connection object then returns a Document object, which can be parsed. Connection also provides setup methods such as timeout() and url().
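
As a rough sketch of that call chain, the following builds a configured Connection without sending a request, so it can be run offline. The class name `ConnectionSetup` and the User-Agent string are placeholders chosen for this example, and the jsoup jar is assumed to be on the classpath.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ConnectionSetup {
    // Builds a configured Connection; no network request is made here.
    static Connection build(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (jsoup demo)") // hypothetical User-Agent value
                .timeout(3000);                        // timeout in milliseconds
    }

    public static void main(String[] args) {
        Connection conn = build("http://www.cnblogs.com/zyw-205520/");
        System.out.println(conn.request().url());
        System.out.println(conn.request().timeout());
        // Calling conn.get() would perform the HTTP GET and return a Document.
        // It throws IOException on failure, so it is left out of this offline sketch.
    }
}
```

On Android, get() must not be called on the UI thread, which is why the Activity later in this article moves it into a background task.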

Here is a little test code from a Java project I used.

package com.javen.Jsoup;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTest {
    static String url = "http://www.cnblogs.com/zyw-205520/archive/2012/12/20/2826402.html";

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        bolgBody();
        test();
        blog();

        /*
         * Document doc = Jsoup.connect("http://www.jb51.net/")
         *         .data("query", "Java")   // request parameters
         *         .userAgent("I'm jsoup")  // set the User-Agent
         *         .cookie("auth", "token") // set a cookie
         *         .timeout(3000)           // set the connection timeout
         *         .post();                 // use the POST method to access the URL
         */

        /*
         * // Load an HTML document from a file
         * File input = new File("D:/test.html");
         * Document doc = Jsoup.parse(input, "UTF-8", "http://www.jb51.net/");
         */
    }

    /**
     * Gets the body of the specified HTML document
     *
     * @throws IOException
     */
    private static void bolgBody() throws IOException {
        // Parse an HTML document directly from a string
        String html = "
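
The commented-out file-loading call in the listing above can be tried in isolation. The sketch below is an assumption-laden illustration, not the author's code: a temporary file stands in for a path such as D:/test.html, the class name `ParseFromFile` and the one-line markup are invented, and the jsoup jar is assumed to be on the classpath. It shows that the third argument (the base URI) is what lets relative links be resolved to absolute ones.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseFromFile {
    // Parses a local HTML file; baseUri is used to resolve relative links.
    static String firstAbsoluteLink(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        return doc.select("a[href]").first().absUrl("href");
    }

    // Writes a tiny hypothetical page to a temporary file and parses it.
    static String demo() {
        try {
            File tmp = File.createTempFile("jsoup-demo", ".html");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(),
                    "<a href=\"/index.html\">home</a>".getBytes(StandardCharsets.UTF_8));
            return firstAbsoluteLink(tmp, "http://www.jb51.net/");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```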

The following shows how to use Jsoup to parse web pages asynchronously on Android. Note: it is easy to run into garbled text (character-encoding problems) here.
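
Garbled text usually means the page bytes were decoded with a charset other than the one they were written in. The following plain-JDK sketch (no jsoup or Android involved) illustrates the effect. The class name `CharsetDemo` and the two-character sample are invented for the example, and it assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;

public class CharsetDemo {
    // Decodes raw bytes with the named charset, much as an InputStreamReader would.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Two Chinese characters, written as Unicode escapes so the source file stays ASCII.
    static final String ORIGINAL = "\u4e2d\u6587";

    public static void main(String[] args) {
        // Pretend these bytes came from a GBK-encoded web page.
        byte[] gbkBytes = ORIGINAL.getBytes(Charset.forName("GBK"));

        // Decoding with the matching charset recovers the text.
        System.out.println(decode(gbkBytes, "GBK").equals(ORIGINAL));
        // Decoding with a mismatched charset yields garbled text.
        System.out.println(decode(gbkBytes, "UTF-8").equals(ORIGINAL));
    }
}
```

If the text comes out garbled, the charset passed to the reader or to Jsoup.parse is the first thing to check against the encoding the page actually uses.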

Configuration file: permission in AndroidManifest.xml

<uses-permission android:name="android.permission.INTERNET"></uses-permission>

Layout file

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
  xmlns:tools="http://schemas.android.com/tools"
  android:layout_width="match_parent"
  android:layout_height="match_parent"
  android:orientation="vertical" >

  <WebView
    android:id="@+id/webview"
    android:layout_width="fill_parent"
    android:layout_height="200dp" />

  <ScrollView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content" >

    <TextView
      android:id="@+id/textview"
      android:layout_width="wrap_content"
      android:layout_height="wrap_content"
      android:text="@string/hello_world" />
  </ScrollView>

</LinearLayout>

The main code, which loads the data asynchronously

package com.javen.aaa;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.app.Dialog;
import android.app.ProgressDialog;
import android.os.AsyncTask;
import android.os.Bundle;
import android.util.Log;
import android.webkit.WebView;
import android.widget.TextView;

public class MainActivity extends Activity {
    private WebView webview;
    private TextView textview;
    private static final int DIALOG_KEY = 0;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        webview = (WebView) findViewById(R.id.webview);
        textview = (TextView) findViewById(R.id.textview);
        try {
            ProgressAsyncTask asyncTask = new ProgressAsyncTask(webview, textview);
            asyncTask.execute(10000);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public String test() {
        StringBuffer buffer = new StringBuffer();
        Document doc;
        try {
            doc = Jsoup.connect("http://www.cnblogs.com/zyw-205520/").get();
            Elements listDiv = doc.getElementsByAttributeValue("class", "postTitle");
            for (Element element : listDiv) {
                Elements links = element.getElementsByTag("a");
                for (Element link : links) {
                    String linkHref = link.attr("href");
                    String linkText = link.text().trim();
                    buffer.append("linkHref==" + linkHref);
                    buffer.append("linkText==" + linkText);
                    System.out.println(linkHref);
                    System.out.println(linkText);
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return buffer.toString();
    }

    // Pop up the progress dialog
    @Override
    protected Dialog onCreateDialog(int id) {
        switch (id) {
        case DIALOG_KEY: {
            ProgressDialog dialog = new ProgressDialog(this);
            dialog.setMessage("Please wait for the data ...");
            dialog.setIndeterminate(true);
            dialog.setCancelable(true);
            return dialog;
        }
        }
        return null;
    }

    public static String readHtml(String myurl) {
        StringBuffer sb = new StringBuffer("");
        URL url;
        try {
            url = new URL(myurl);
            BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), "GBK"));
            String s = "";
            while ((s = br.readLine()) != null) {
                sb.append(s + "\r\n");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return sb.toString();
    }

    class ProgressAsyncTask extends AsyncTask<Integer, Integer, String> {
        private WebView webview;
        private TextView textview;

        public ProgressAsyncTask(WebView webview, TextView textview) {
            super();
            this.webview = webview;
            this.textview = textview;
        }

        /**
         * The Integer parameter here corresponds to the first type parameter of AsyncTask;
         * the String return value corresponds to the third type parameter of AsyncTask.
         * This method does not run on the UI thread. It is meant for asynchronous work,
         * and UI widgets must not be set or modified from it,
         * but publishProgress can be called to trigger onProgressUpdate, which may update the UI.
         */
        @Override
        protected String doInBackground(Integer... params) {
            // Start with an empty string so a failed fetch does not cause a NullPointerException
            String str = "";
            Document doc = null;
            try {
                // String url = "http://www.cnblogs.com/zyw-205520/p/3355681.html";
                // doc = Jsoup.parse(new URL(url).openStream(), "utf-8", url);
                // doc = Jsoup.parse(readHtml(url));
                // doc = Jsoup.connect(url).get();
                // str = doc.body().toString();
                doc = Jsoup.connect("http://www.cnblogs.com/zyw-205520/archive/2012/12/20/2826402.html").get();
                Elements listDiv = doc.getElementsByAttributeValue("class", "postBody");
                for (Element element : listDiv) {
                    str = element.html();
                    System.out.println(element.html());
                }
                Log.d("doInBackground", str);
                System.out.println(str);
                // You can try GBK or UTF-8
            } catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            return str;
            // return test();
        }

        /**
         * The String parameter here corresponds to the third type parameter of AsyncTask
         * (it receives the return value of doInBackground).
         * Runs on the UI thread after doInBackground finishes, so it can update UI widgets.
         */
        @Override
        protected void onPostExecute(String result) {
            webview.loadData(result, "text/html; charset=utf-8", null);
            textview.setText(result);
            removeDialog(DIALOG_KEY);
        }

        // Runs on the UI thread before the background work starts, so it can set up UI widgets
        @Override
        protected void onPreExecute() {
            showDialog(DIALOG_KEY);
        }

        /**
         * The Integer parameter here corresponds to the second type parameter of AsyncTask.
         * Each call to publishProgress in doInBackground triggers onProgressUpdate,
         * which runs on the UI thread and can therefore update UI widgets.
         */
        @Override
        protected void onProgressUpdate(Integer... values) {

        }
    }

}
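
The division of labour described in the comments above (do the slow work off the UI thread, then hand the result to a callback) can be tried outside Android. The sketch below is a plain-JDK analogue built on ExecutorService, not the Android AsyncTask API. The names `BackgroundTaskSketch`, `runInBackground`, and `fetchAndWait` are invented for illustration, and a string supplier stands in for the Jsoup fetch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class BackgroundTaskSketch {
    // Runs 'work' on a background thread, then passes its result to 'onResult'.
    // In an Android app, onResult would additionally have to be posted to the UI thread.
    static <T> void runInBackground(ExecutorService executor, Supplier<T> work, Consumer<T> onResult) {
        executor.execute(() -> {
            T result = work.get();   // plays the role of doInBackground
            onResult.accept(result); // plays the role of onPostExecute
        });
    }

    // Starts the work, waits for it to finish, and returns what the callback received.
    static String fetchAndWait(Supplier<String> work) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        StringBuilder received = new StringBuilder();
        runInBackground(executor, work, s -> received.append(s));
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received.toString();
    }

    public static void main(String[] args) {
        // The supplier stands in for a network call such as Jsoup.connect(url).get().
        System.out.println(fetchAndWait(() -> "<p>page body</p>"));
    }
}
```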

That is the entire content of this article. I hope it helps with your learning.
