How to crawl web page content with a Java crawler

Source: Internet
Author: User
Tags: StringBuffer

This article shows, by example, how to crawl web page content with Java. It is shared here for your reference. The details are as follows:

I have recently been looking into crawling techniques in Java and have only just gotten started, so I would like to share what I have learned so far.
Two approaches are given here. The first uses packages provided by Apache; the second uses classes that ship with Java.

The code is as follows:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

public class CreateHttpClient {

    // First method.
    // This method uses the Apache packages. It is simple and convenient, but it needs these jars:
    // commons-codec-1.4.jar, commons-httpclient-3.1.jar, commons-logging-1.0.4.jar
    public static String createHttpClient(String url, String param) {
        HttpClient client = new HttpClient();
        String response = null;
        String keyword = null;
        PostMethod postMethod = new PostMethod(url);
        // try {
        //     if (param != null)
        //         keyword = new String(param.getBytes("gb2312"), "iso-8859-1");
        // } catch (UnsupportedEncodingException e1) {
        //     // TODO Auto-generated catch block
        //     e1.printStackTrace();
        // }
        // NameValuePair[] data = { new NameValuePair("keyword", keyword) };
        // // Put the form values into the PostMethod
        // postMethod.setRequestBody(data);
        // The part above crawls with parameters. I commented it out; uncomment it to experiment.
        try {
            int statusCode = client.executeMethod(postMethod);
            // Note: gb2312 here must match the encoding of the page you are crawling.
            response = new String(postMethod.getResponseBodyAsString().getBytes("ISO-8859-1"), "gb2312");
            // Remove HTML entities and tags from the page
            String p = response.replaceAll("\\&[a-zA-Z]{1,10};", "").replaceAll("<[^>]*>", "");
            System.out.println(p);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return response;
    }

    // Second method.
    // This method uses Java's built-in URL class to fetch site content.
    public String getPageContent(String strUrl, String strPostRequest, int maxLength) {
        // Buffer that holds the page as it is read
        StringBuffer buffer = new StringBuffer();
        System.setProperty("sun.net.client.defaultConnectTimeout", "5000");
        System.setProperty("sun.net.client.defaultReadTimeout", "5000");
        try {
            URL newUrl = new URL(strUrl);
            HttpURLConnection hConnect = (HttpURLConnection) newUrl.openConnection();
            // Extra data for POST mode
            if (strPostRequest.length() > 0) {
                hConnect.setDoOutput(true);
                OutputStreamWriter out = new OutputStreamWriter(hConnect.getOutputStream());
                out.write(strPostRequest);
                out.flush();
                out.close();
            }
            // Read the content, stopping at maxLength characters when maxLength > 0
            BufferedReader rd = new BufferedReader(new InputStreamReader(hConnect.getInputStream()));
            int ch;
            for (int length = 0; (ch = rd.read()) > -1 && (maxLength <= 0 || length < maxLength); length++)
                buffer.append((char) ch);
            String s = buffer.toString();
            // Strings are immutable, so the result of replaceAll must be assigned back
            s = s.replaceAll("\\&[a-zA-Z]{1,10};", "").replaceAll("<[^>]*>", "");
            System.out.println(s);
            rd.close();
            hConnect.disconnect();
            return buffer.toString().trim();
        } catch (Exception e) {
            // return "Error: failed to read the web page!";
            return null;
        }
    }
}
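Both methods strip markup with the same pair of regular expressions. To make that step easier to try without a network connection, here is a minimal sketch that isolates it. The class name HtmlTextSketch and the method name stripMarkup are hypothetical and do not appear in the original code; the two patterns are the ones used above.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not from the original article) that isolates the tag-stripping step. */
final class HtmlTextSketch {
    // Named character entities such as &nbsp; or &amp; (1 to 10 letters).
    private static final Pattern ENTITY = Pattern.compile("&[a-zA-Z]{1,10};");
    // Anything between '<' and the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Removes named entities first, then tags, and returns the remaining text. */
    static String stripMarkup(String html) {
        String withoutEntities = ENTITY.matcher(html).replaceAll("");
        return TAG.matcher(withoutEntities).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Hello,&nbsp;<b>world</b>!</p>";
        System.out.println(stripMarkup(html)); // prints Hello,world!
    }
}
```

This is only a rough filter: numeric entities such as &#160; are left in place, and the text inside script or style elements is kept. For anything beyond a quick look at the page text, a real HTML parser is likely the safer choice.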

Then write a test class:

public static void main(String[] args) {
    String url = "http://www.jb51.net";
    String keyword = "cloud-dwelling community";
    CreateHttpClient p = new CreateHttpClient();
    String response = CreateHttpClient.createHttpClient(url, keyword); // The first method
    // p.getPageContent(url, "post", 100500); // The second method
}

Now look at the console: the content of the web page should have been fetched.
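If the console shows garbled characters, the most likely cause is that the hard-coded gb2312 does not match the page's real encoding. One way to avoid guessing is to read the charset from the response's Content-Type header. The sketch below only parses a header value; the class name CharsetSketch and the method name charsetFromContentType are hypothetical and are not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper (not from the original article) for picking a decoding charset. */
final class CharsetSketch {
    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=UTF-8", or the fallback when none is usable.
     */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: fall back
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = charsetFromContentType("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1);
        System.out.println(cs.name()); // prints UTF-8
    }
}
```

In the second method, the header value would come from hConnect.getContentType(), and the resulting charset could be passed as the second argument of the InputStreamReader constructor. Pages that declare their encoding only in a meta tag are not covered by this sketch.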

I hope this article will help you with your Java programming.
