Simulating Login to the Jiaotong University Library with HttpClient

Source: Internet
Author: User

I recently wanted to build a client for the school library. Because the school does not provide an API, the only option is to simulate a login and then scrape the data.

The first problem to solve is logging in. It turns out not to be difficult, but it still took me two days, mostly because of small details.

When simulating requests with HttpClient, keep the following in mind:

    1. An HttpClient (DefaultHttpClient) instance represents one session, and within that session HttpClient manages cookies automatically (you can also control them from your program).
    2. Within the same session, before issuing a new POST or GET request you generally need to call abort() on the previous request; otherwise an exception is thrown.
    3. Some websites redirect after a successful login (302, 303). If the request was a POST, you need to read the Location value from the response headers and send another request to that address to get the final data.
    4. Do not run the crawler too frequently; most sites have anti-scraping mechanisms.
    5. On Android, use Jsoup to parse the resulting HTML.
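
To make note 3 concrete, here is a minimal sketch of how the Location value could be turned into the URL for the follow-up request, using only the JDK. The class name `RedirectHelper` and the sample path in `main` are my own illustrations, not something the library site or HttpClient defines. Resolving against the request URL has the advantage that it works whether the server sends an absolute URL or only a path.

```java
import java.net.URI;

/** Hypothetical helper (not part of HttpClient) for following a redirect by hand. */
final class RedirectHelper {

    /** True for the redirect codes discussed in this post (302, 303) plus 301 and 307. */
    static boolean isRedirect(int statusCode) {
        return statusCode == 301 || statusCode == 302
                || statusCode == 303 || statusCode == 307;
    }

    /**
     * Resolves a Location header value against the URL that was requested.
     * Handles absolute URLs, server-relative paths ("/x/y") and relative paths ("y").
     * URI.create throws IllegalArgumentException if the value is not a valid URI.
     */
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location.trim()).toString();
    }

    public static void main(String[] args) {
        String postUrl = "http://innopac.lib.xjtu.edu.cn/patroninfo*chx";
        // The path below is a made-up example of what a Location header might contain.
        System.out.println(resolveLocation(postUrl, "/patroninfo*chx/0000000/top"));
    }
}
```

The resolved string can then be passed to a new GET request in the same session.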
The library's login process is as follows:
    1. Send a POST request with the appropriate request headers set. The form needs four fields: the username (code), the password (pin), submit.x, and submit.y.
    2. Read the Location value from the response headers of the POST request to get the redirect URL.
    3. Send a GET request to fetch the actual information. If you use a new HttpClient object for this second request, you need to take the cookie from the POST response and set it in the request headers of the GET request; otherwise the site reports that the session has expired. If you reuse the same HttpClient, you do not need to set the cookie, because HttpClient manages cookies for you.
    4. A note from my research: HttpClient supports automatic redirect handling, but for request methods such as POST and PUT it does not currently follow redirects automatically, so if a POST returns 301 or 302 you need to handle the redirect yourself.
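
For the case in step 3 where a second, separate client is used, the cookie has to be carried over by hand. The following is a rough sketch of that idea using the JDK's `java.net.HttpCookie`; the class name `CookieCarrier` and the cookie names in `main` are invented for illustration, and the real site's cookie names may differ. It converts the `Set-Cookie` values of the POST response into one value for the `Cookie` header of the GET request.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds a Cookie request header from Set-Cookie response values. */
final class CookieCarrier {

    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            // HttpCookie.parse drops attributes such as path and keeps name and value.
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                pairs.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        // Multiple cookies are sent in one header, separated by "; ".
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Made-up Set-Cookie values standing in for the POST response headers.
        List<String> fromPost = Arrays.asList("SESSIONID=abc123; path=/", "lang=chx");
        System.out.println(toCookieHeader(fromPost));
    }
}
```

With Apache HttpClient, the returned string would be set on the GET request with something like `httpGet.setHeader("Cookie", value)`. Reusing the same HttpClient instance avoids all of this, which is why the code in this post does so.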
The code is as follows:
    package com.ali.login;

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.http.HttpResponse;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.params.HttpConnectionParams;
    import org.apache.http.params.HttpParams;
    import org.apache.http.protocol.HTTP;
    import org.apache.http.util.EntityUtils;

    public class LibraryUtil {
        private static final String BASEURL = "http://innopac.lib.xjtu.edu.cn";
        private static String POST_URL = "http://innopac.lib.xjtu.edu.cn/patroninfo*chx";
        private static String KEY_CODE = "code";
        private static String KEY_PIN = "pin";
        private static String KEY_SUBMIT_X = "submit.x"; // 26
        private static String KEY_SUBMIT_Y = "submit.y"; // 20
        // The HttpClient is used in one session
        private static HttpResponse response;
        private static HttpClient httpClient = null;
        private static String resultHtml = null;

        /**
         * @param args
         */
        public static void main(String[] args) {
            boolean isConn = login("2111******", "********");
            if (isConn) {
                System.out.println(resultHtml);
            }
        }

        /**
         * Login: POST the form, then follow the 302 redirect with a GET.
         */
        public static boolean login(String userName, String password) {
            HttpGet httpGet = null;
            HttpPost httpPost = null;
            try {
                httpClient = new DefaultHttpClient(); // acts as the browser
                // Apply the timeouts to the client's own parameters so they take effect
                HttpParams httpParams = httpClient.getParams();
                HttpConnectionParams.setConnectionTimeout(httpParams, 10000);
                HttpConnectionParams.setSoTimeout(httpParams, 10000);

                // The four form fields expected by the login page
                List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>();
                nameValuePairs.add(new BasicNameValuePair(KEY_CODE, userName));
                nameValuePairs.add(new BasicNameValuePair(KEY_PIN, password));
                nameValuePairs.add(new BasicNameValuePair(KEY_SUBMIT_X, Integer.toString(26)));
                nameValuePairs.add(new BasicNameValuePair(KEY_SUBMIT_Y, Integer.toString(20)));

                httpPost = new HttpPost(POST_URL);
                httpPost.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
                httpPost.setHeader("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
                httpPost.setHeader("Accept-Encoding", "gzip, deflate");
                httpPost.setHeader("Referer", "http://innopac.lib.xjtu.edu.cn/patroninfo*chx");
                httpPost.setHeader("Connection", "keep-alive");
                httpPost.setHeader("Content-Type", "application/x-www-form-urlencoded");
                httpPost.setHeader("Host", "innopac.lib.xjtu.edu.cn");
                httpPost.setEntity(new UrlEncodedFormEntity(nameValuePairs));

                response = httpClient.execute(httpPost); // send POST request
                int code = response.getStatusLine().getStatusCode();
                System.out.println(response.getStatusLine());

                // 200 indicates a wrong username or password, 302 indicates a successful login
                if (code == 200) {
                    System.out.println("User name or password is wrong, please log in again");
                    return false;
                } else if (code == 302) {
                    System.out.println("Login successful, redirecting ...");
                    String location = response.getHeaders("Location")[0].getValue();
                    System.out.println(location);
                    // Release the POST before reusing the session for the GET
                    httpPost.abort();

                    httpGet = new HttpGet(BASEURL + location);
                    response = httpClient.execute(httpGet); // send GET request
                    code = response.getStatusLine().getStatusCode();
                    System.out.println(response.getStatusLine());
                    if (code == 200) {
                        if (response != null) {
                            resultHtml = EntityUtils.toString(response.getEntity(), HTTP.UTF_8);
                        }
                        return true;
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // httpGet stays null when the login fails, so guard against a NullPointerException
                if (httpGet != null) {
                    httpGet.abort();
                }
                if (httpClient != null) {
                    httpClient.getConnectionManager().shutdown();
                }
            }
            return false;
        }
    }
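
Since most sites have anti-scraping mechanisms, a client that logs in and fetches pages repeatedly should probably space its requests out. Below is one possible way to do that, a small minimum-interval throttle written with the JDK only. The class `RequestThrottle` and the interval values are assumptions of mine; I do not know what rate the library site actually tolerates.

```java
/** Hypothetical throttle: enforces a minimum gap between consecutive requests. */
final class RequestThrottle {
    private final long minIntervalMillis;
    private long nextAllowedAt = Long.MIN_VALUE; // no request scheduled yet

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Reserves the next request slot for a caller arriving at nowMillis
     * and returns how many milliseconds that caller has to wait first.
     */
    synchronized long reserve(long nowMillis) {
        long startAt = Math.max(nowMillis, nextAllowedAt);
        nextAllowedAt = startAt + minIntervalMillis;
        return startAt - nowMillis;
    }

    /** Blocks the calling thread until its reserved slot begins. */
    void await() throws InterruptedException {
        long delay = reserve(System.currentTimeMillis());
        if (delay > 0) {
            Thread.sleep(delay);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RequestThrottle throttle = new RequestThrottle(2000); // 2 s is an arbitrary choice
        for (int i = 0; i < 3; i++) {
            throttle.await();
            System.out.println("request " + i + " may be sent now");
        }
    }
}
```

Calling `await()` right before each `httpClient.execute(...)` would be enough to keep the crawler from hammering the server.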

Reference: http://www.iteye.com/problems/65312
